---
title: Visual QNA
emoji: π
colorFrom: green
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Streamlit app for Visual QA using VILT model to answer image
---
# Image-Based Question Answering System
## Overview
This repository contains two projects:
1. **Complete Web Application:** A full-stack web app built using Streamlit for both frontend and backend.
2. **Flask API Backend:** A standalone Flask-based backend API.

Both implementations allow users to upload an image and ask questions about it. The system uses the **dandelin/vilt-b32-finetuned-vqa** model to analyze and respond to queries based on the provided image.
## Features
- Users can upload an image.
- Users can ask questions related to the uploaded image.
- The model processes the image and answers questions based on its content.
- Two implementations:
  - **Streamlit Web App:** A complete frontend and backend application.
  - **Flask API:** A RESTful API for backend processing.
## Technology Stack
- **Frontend:** Streamlit (for the web app UI)
- **Backend:** Flask (for the API)
- **Model:** `dandelin/vilt-b32-finetuned-vqa`
- **Libraries:** PyTorch, Transformers, Pillow, OpenCV, Requests

---
## Live Demo
You can test the application live at:
[Visual QNA with image](https://huggingface.co/spaces/Tahir5/Visual-QNA)
## Installation & Setup
### 1. Clone the Repository
```bash
git clone https://github.com/your-repo/image-vqa.git
cd image-vqa
```
### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
### 3. Run the Streamlit Web App
```bash
streamlit run stream.py
```
### 4. Run the Flask API
```bash
python flask_app.py
```
---
## API Endpoints (For Flask Backend)
### 1. Visual Question Answering (VQA)
**Endpoint:** `POST /vqa`
- **Description:** Processes an image and a question, returning an answer.
- **Request Format:** Multipart form-data
  - `image`: The uploaded image file.
  - `question`: The question related to the image.
- **Response Format:** JSON

**Example Request (cURL):**
```bash
curl -X POST "http://127.0.0.1:5000/vqa" \
  -F "image=@path/to/image.jpg" \
  -F "question=What is in the image?"
```
**Example Response:**
```json
{
  "question": "What is in the image?",
  "answer": "A cat sitting on a table."
}
```
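A handler matching the request and response formats above might look like the following sketch. It is not taken from the repository's `flask_app.py`: `create_app`, `check_vqa_input`, and the injected `answer_fn` callable are hypothetical names, and the 400 error payload is an assumption because error responses are not documented here. Under those assumptions, `create_app(my_answer_fn).run(port=5000)` would serve the route at the URL used in the cURL example.

```python
def check_vqa_input(has_image, question):
    """Return an error message for an unusable request, or None if it is fine."""
    if not has_image:
        return "Missing form field: image"
    if not question or not question.strip():
        return "Missing form field: question"
    return None


def create_app(answer_fn):
    """Build a Flask app exposing POST /vqa.

    answer_fn(image_file, question) -> str is a hypothetical callable that
    wraps the model; it is injected so the route stays model-agnostic.
    """
    # Imported lazily so the validation helper works without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/vqa", methods=["POST"])
    def vqa():
        # Multipart form-data: the file arrives in request.files,
        # the text field in request.form.
        image = request.files.get("image")
        question = request.form.get("question", "")

        error = check_vqa_input(image is not None, question)
        if error:
            # Assumed error shape; the source does not document one.
            return jsonify({"error": error}), 400

        answer = answer_fn(image, question)
        return jsonify({"question": question, "answer": answer})

    return app
```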
---
## Testing with Postman
### Steps to Test the Flask API in Postman
1. Open **Postman**.
2. Select the **POST** request method.
3. Enter the request URL: `http://127.0.0.1:5000/vqa`.
4. Navigate to the **Body** tab and select **form-data**.
5. Add two key-value pairs:
   - **Key:** `image`, **Value:** select an image file.
   - **Key:** `question`, **Value:** enter a text question related to the image.
6. Click **Send**.
7. View the response containing the model's answer in JSON format.
---
## Example Usage
### Streamlit Web App
1. Open the app in the browser.
2. Upload an image.
3. Enter a question.
4. View the model's response.
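
The four steps above map onto a handful of Streamlit widgets. The sketch below is an illustration, not the contents of the repository's app script: `main`, `ready_to_ask`, and the `answer_fn` callable standing in for the model call are hypothetical names, and the widget labels are made up.

```python
def ready_to_ask(image, question):
    """True once both an image and a non-blank question are present."""
    return image is not None and bool(question.strip())


def main(answer_fn):
    """Render the upload / question / answer flow.

    answer_fn(image, question) -> str is a hypothetical stand-in
    for the model call.
    """
    # Lazy imports keep the helper above usable without these packages.
    import streamlit as st
    from PIL import Image

    st.title("Visual QNA")

    # Step 2: upload an image.
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    image = None
    if uploaded is not None:
        image = Image.open(uploaded).convert("RGB")
        st.image(image, caption="Uploaded image")

    # Step 3: enter a question.
    question = st.text_input("Ask a question about the image")

    # Step 4: show the model's response.
    if st.button("Get answer"):
        if ready_to_ask(image, question):
            st.write("Answer:", answer_fn(image, question))
        else:
            st.warning("Please upload an image and enter a question.")
```

To try it, a script would call `main(...)` with a real answer function at module level and be launched with `streamlit run`.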
### Flask API
1. Send a `POST` request to `/vqa` with an image and a question.
2. Receive the model-generated answer in JSON format.
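
The same two steps can be scripted in Python with Requests, which is already listed in the technology stack. This is a minimal sketch assuming the API runs locally on port 5000 as in the cURL example; `ask_vqa` and `format_vqa_result` are hypothetical helper names, and the 60-second timeout is an arbitrary choice.

```python
def format_vqa_result(payload):
    """Turn the JSON body returned by /vqa into a one-line summary."""
    return f"Q: {payload['question']} | A: {payload['answer']}"


def ask_vqa(image_path, question, base_url="http://127.0.0.1:5000"):
    """POST an image and a question to /vqa and return the parsed JSON."""
    import requests  # imported lazily so the formatter works without it

    with open(image_path, "rb") as image_file:
        response = requests.post(
            f"{base_url}/vqa",
            files={"image": image_file},   # multipart file field
            data={"question": question},   # multipart text field
            timeout=60,
        )
    # Raise on 4xx/5xx instead of silently parsing an error body.
    response.raise_for_status()
    return response.json()


# Example (requires the Flask API to be running):
# print(format_vqa_result(ask_vqa("path/to/image.jpg", "What is in the image?")))
```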
---
## Model Information
- **Name:** `dandelin/vilt-b32-finetuned-vqa`
- **Functionality:** Vision-and-Language Transformer (ViLT) model fine-tuned for Visual Question Answering (VQA).
- **Source:** [Hugging Face Model Hub](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
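
For readers who want to see the model call itself, the sketch below follows the usage pattern shown on the model's Hugging Face page: encode the image and question with `ViltProcessor`, run `ViltForQuestionAnswering`, and map the highest-scoring logit to a label through `model.config.id2label`. The function names `answer_question` and `top_answer` are hypothetical, and the repository's own code may differ. Because the model classifies over a fixed answer vocabulary, answers come back as short labels. The sketch reloads the model on every call for brevity; an app would normally load it once at startup.

```python
def top_answer(scores, id2label):
    """Return the label whose score is highest."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return id2label[best]


def answer_question(image_path, question,
                    model_name="dandelin/vilt-b32-finetuned-vqa"):
    """Answer a question about an image with the ViLT VQA checkpoint."""
    # Heavy dependencies are imported lazily; the first call also
    # downloads the model weights from the Hugging Face Hub.
    import torch
    from PIL import Image
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(model_name)
    model = ViltForQuestionAnswering.from_pretrained(model_name)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, question, return_tensors="pt")

    with torch.no_grad():
        logits = model(**encoding).logits  # shape: (1, number of answer labels)

    return top_answer(logits[0].tolist(), model.config.id2label)
```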
---
## Contributing
Feel free to contribute by opening issues or submitting pull requests.

---
## License
This project is licensed under the MIT License.