metadata

title: Multimodal RAG Kaggle Based
emoji: 👁
colorFrom: red
colorTo: pink
sdk: gradio
sdk_version: 5.25.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Multimodal RAG to augment English recipe searches

Multimodal Retrieval System with FAISS

This repository contains a prototype multimodal information retrieval system built on FAISS. It searches across text and images by vector similarity.

Structure

notebook/ (or .ipynb): Contains the logic to generate the vector indexes for both text and images.
app.py: Gradio-based interface for interacting with the system.
search_ocean.py: Core logic for performing FAISS-based similarity search using precomputed indexes.
text_index.faiss, image_index.faiss: The FAISS index files generated by the notebook (already included in the app).
metadata_text.json, metadata_image.json: Associated metadata for mapping index results back to source information.

What it does

Loads precomputed FAISS indexes (for text and image).
Performs retrieval based on a text or image query.
Returns top matching results using cosine similarity.
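
As a rough illustration of the ranking step above, the sketch below scores stored vectors against a query by cosine similarity in plain Python. It is not the code in search_ocean.py: the function names (cosine_similarity, top_k), the toy vectors, and the "source" metadata key are assumptions made for this example. With FAISS, a common way to obtain the same ranking is to L2-normalize the vectors (faiss.normalize_L2) and search an inner-product index such as faiss.IndexFlatIP; whether this app builds its indexes that way is not stated here.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: a zero vector matches nothing
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=3):
    """Return (position, score) pairs for the k most similar stored vectors."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(i, score) for score, i in scored[:k]]

# Toy stand-ins for an index and its metadata file (hypothetical layout).
stored = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
metadata = [{"source": "doc_a"}, {"source": "doc_b"}, {"source": "doc_c"}]

results = top_k([2.0, 0.1], stored, k=2)
# Positions returned by the search are mapped back to source information.
hits = [metadata[i]["source"] for i, _ in results]
```

The positions returned by the search play the same role as the ids FAISS returns: they are only meaningful once looked up in the matching metadata.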

What it doesn't (yet) do

No generation step (e.g., using LLMs) is implemented in this app.
The code for image retrieval is ready, but it only works once the image indexes have been built in the notebook.
Chunks are created without overlap when the data is split for indexing. Each chunk is indexed independently, so context that spans a chunk boundary can be lost, which may affect the quality of retrieval in some use cases.
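
For reference, overlapping chunking could look like the minimal sketch below. It assumes word-level chunks, which may not match how the notebook splits the data; chunk_with_overlap and its parameters are hypothetical names, not part of this repository.

```python
def chunk_with_overlap(words, chunk_size, overlap):
    """Split a list of words into chunks that share `overlap` words with the previous chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

words = "mix flour and water then knead the dough well".split()
chunks = chunk_with_overlap(words, chunk_size=4, overlap=2)
```

With overlap=0 the function reduces to the independent chunking described above. A larger overlap keeps more boundary context at the cost of a bigger index.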

Dependencies

faiss-cpu
sentence-transformers
openai-clip
torch
torchvision
gradio
Pillow

Notes

The app is designed to separate concerns between indexing (offline, notebook) and retrieval (live, Gradio app).
Because retrieval is self-contained, the app can be extended with LLM generation or contextual QA that takes the retrieved results as input.
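
One possible starting point for such an extension is sketched below: it only assembles retrieved results into a prompt and leaves the LLM call out. The build_prompt function, the "text" and "score" keys, and the prompt wording are all assumptions for illustration; the actual structure of the metadata files may differ.

```python
def build_prompt(question, retrieved, max_chars=2000):
    """Join retrieved passages into a numbered context block followed by the question."""
    parts = []
    used = 0
    for n, item in enumerate(retrieved, start=1):
        entry = f"[{n}] {item['text']}"
        if used + len(entry) > max_chars:
            break  # stop before the context budget is exceeded
        parts.append(entry)
        used += len(entry)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

# Hypothetical retrieval output, ordered by similarity score.
retrieved = [
    {"text": "Knead the dough for ten minutes.", "score": 0.91},
    {"text": "Let the dough rest for one hour.", "score": 0.87},
]
prompt = build_prompt("How long should the dough rest?", retrieved)
```

The resulting string can then be passed to whichever generation model is chosen.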