---
title: Multimodal RAG Kaggle Based
emoji: 👁
colorFrom: red
colorTo: pink
sdk: gradio
sdk_version: 5.25.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Multimodal RAG to augment English recipe searches
---

Multimodal Retrieval System with FAISS

This repository contains a prototype system for multimodal information retrieval built on FAISS, capable of searching across both text and images via vector similarity.

Structure

  • notebook/ (or .ipynb): Contains the logic to generate the vector indexes for both text and images.
  • app.py: Gradio-based interface for interacting with the system.
  • search_ocean.py: Core logic for performing FAISS-based similarity search using precomputed indexes.
  • text_index.faiss, image_index.faiss: The FAISS index files generated by the notebook (already included in the app).
  • metadata_text.json, metadata_image.json: Associated metadata for mapping index results back to source information.

What it does

  • Loads precomputed FAISS indexes (for text and image).
  • Performs retrieval based on a text or image query.
  • Returns the top-k matching results ranked by cosine similarity.

What it doesn't (yet) do

  • No generation step (e.g., using LLMs) is implemented in this app.
  • While the code for image retrieval is ready, image indexes must be built in the notebook beforehand.
  • No context overlap is applied when chunking the data for indexing: each chunk is indexed independently, which may hurt retrieval quality for queries that span chunk boundaries.
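A minimal sketch of the missing overlap step (the function name and sizes are hypothetical, not part of this app):

```python
def chunk_with_overlap(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Hypothetical helper: split text so that consecutive chunks share
    `overlap` characters, preserving context across chunk boundaries."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]
```

Each chunk would then be embedded and indexed exactly as before; the shared characters give the retriever context that would otherwise be cut at chunk boundaries.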

Dependencies

  • faiss-cpu
  • sentence-transformers
  • openai-clip
  • torch
  • torchvision
  • gradio
  • Pillow

Notes

  • The app is designed to separate concerns between indexing (offline, notebook) and retrieval (live, Gradio app).
  • You can easily extend this to include LLM generation or contextual QA once relevant results are retrieved.
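For example, the generation extension mentioned above could start from something as simple as a prompt builder over the retrieved hits (the function and field names here are hypothetical):

```python
def build_rag_prompt(query: str, retrieved: list[dict]) -> str:
    # Hypothetical sketch of the generation step this app leaves out:
    # format the top retrieval hits as context for an LLM call.
    context = "\n\n".join(
        f"[{i + 1}] {hit['text']}" for i, hit in enumerate(retrieved)
    )
    return (
        "Answer the question using only the recipes below.\n\n"
        f"{context}\n\nQuestion: {query}\nAnswer:"
    )
```

The resulting prompt would be passed to whichever LLM you choose; the retrieval side stays unchanged.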