--- title: Visual Question Answering (VQA) System emoji: 🏞️ colorFrom: blue colorTo: purple sdk: streamlit sdk_version: 1.43.1 app_file: app.py pinned: false --- # Visual Question Answering (VQA) System A multi-modal AI application that allows users to upload images and ask questions about them. This project uses pre-trained models from Hugging Face to analyze images and answer natural language questions. ## Features - Upload images in common formats (jpg, png, etc.) - Ask questions about image content in natural language - Get AI-generated answers based on image content - User-friendly Streamlit interface - Support for various types of questions (objects, attributes, counting, etc.) ## Technical Stack - **Python**: Main programming language - **PyTorch & Transformers**: Deep learning frameworks for running the models - **Streamlit**: Interactive web application framework - **HuggingFace Models**: Pre-trained visual question answering models - **PIL**: Image processing ## Setup Instructions 1. Clone this repository: ``` git clone cd visual-question-answering ``` 2. Create a virtual environment (recommended): ``` python -m venv venv # On Windows venv\Scripts\activate # On macOS/Linux source venv/bin/activate ``` 3. Install dependencies: ``` pip install -r requirements.txt ``` 4. Run the application: ``` python app.py ``` Or directly with Streamlit: ``` streamlit run app.py ``` 5. Open a web browser and go to `http://localhost:8501` ## Usage 1. Upload an image using the file upload area 2. Type your question about the image in the text field 3. Select a model from the sidebar (BLIP or ViLT) 4. Click "Get Answer" to get an AI-generated response 5. View the answer displayed on the right side of the screen ## Models Used This application uses the following pre-trained models from Hugging Face: - **BLIP**: For general visual question answering with free-form answers - **ViLT**: For detailed understanding of image content and yes/no questions ## Project Structure - `models/`: Contains model handling code - `utils/`: Utility functions for image processing and more - `static/`: Static files including uploaded images - `app.py`: Script to run the application - ## Acknowledgments - Hugging Face for their excellent pre-trained models - The open-source community for various libraries used in this project