metadata

title: Visual QNA
emoji: 🐠
colorFrom: green
colorTo: indigo
sdk: streamlit
sdk_version: 1.43.2
app_file: app.py
pinned: false
license: mit
short_description: Streamlit app for Visual QA using VILT model to answer image

Image-Based Question Answering System

Overview

This repository contains two projects:

Complete Web Application – A full-stack web app built using Streamlit for both frontend and backend.
Flask API Backend – A standalone Flask-based backend API.

Both implementations allow users to upload an image and ask questions about it. The system uses the dandelin/vilt-b32-finetuned-vqa model to analyze and respond to queries based on the provided image.
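As a rough illustration of how an image and a question can be passed to this model, the sketch below uses the Transformers classes ViltProcessor and ViltForQuestionAnswering. The helper names load_vilt, pick_answer, and answer_question are hypothetical and not taken from this repository, and the sigmoid score is an added convenience rather than something the apps are known to return.

```python
import math


def load_vilt(model_name="dandelin/vilt-b32-finetuned-vqa"):
    """Load the ViLT processor and VQA model from the Hugging Face Hub."""
    # Imported lazily because transformers is slow to import.
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(model_name)
    model = ViltForQuestionAnswering.from_pretrained(model_name)
    model.eval()  # inference only, disables dropout
    return processor, model


def pick_answer(logits, id2label):
    """Return (label, score) for the highest-scoring answer class."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    # The VQA head scores each candidate answer independently,
    # so a sigmoid of the winning logit is used as its score.
    score = 1.0 / (1.0 + math.exp(-logits[best]))
    return id2label[best], score


def answer_question(image, question, processor, model):
    """Run one (PIL image, question) pair through ViLT."""
    import torch

    inputs = processor(image.convert("RGB"), question, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        outputs = model(**inputs)
    # outputs.logits has shape (1, num_answers); take the single batch row.
    return pick_answer(outputs.logits[0].tolist(), model.config.id2label)
```

A typical call would load the model once with load_vilt() and reuse the returned processor and model for every question, since loading is by far the slowest step.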

Features

Users can upload an image.
Users can ask questions related to the uploaded image.
The model processes the image and answers questions based on its content.
Two implementations:
- Streamlit Web App: A complete frontend and backend application.
- Flask API: A RESTful API for backend processing.

Technology Stack

Frontend: Streamlit (for the web app UI)
Backend: Flask (for the API)
Model: dandelin/vilt-b32-finetuned-vqa
Libraries: PyTorch, Transformers, Pillow, OpenCV, Requests

Live Demo

You can test the application live at:
Visual QNA with image

Installation & Setup

1. Clone the Repository

git clone https://github.com/your-repo/image-vqa.git
cd image-vqa

2. Install Dependencies

pip install -r requirements.txt

3. Run the Streamlit Web App

streamlit run stream.py

4. Run the Flask API

python flask_app.py

API Endpoints (For Flask Backend)

1. Visual Question Answering (VQA)

Endpoint: POST /vqa

Description: Processes an image and a question, returning an answer.
Request Format: Multipart form-data
- image: The uploaded image file.
- question: The question related to the image.
Response Format: JSON
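For orientation, an endpoint with this request and response shape could be sketched in Flask roughly as follows. This is not the code in flask_app.py: create_app, validate_vqa_request, and answer_fn are hypothetical names, and the 400 status with an "error" field is an assumption about how bad requests might be reported.

```python
def validate_vqa_request(has_image, question):
    """Return an error message for an unusable request, or None if it is fine."""
    if not has_image:
        return "Missing form field 'image'."
    if not question or not question.strip():
        return "Missing form field 'question'."
    return None


def create_app(answer_fn):
    """Build a Flask app exposing POST /vqa.

    answer_fn is a hypothetical callable: (PIL image, question) -> answer string.
    """
    # Imported lazily so the validation helper can be used without Flask.
    from flask import Flask, jsonify, request
    from PIL import Image

    app = Flask(__name__)

    @app.route("/vqa", methods=["POST"])
    def vqa():
        upload = request.files.get("image")          # multipart file part
        question = request.form.get("question", "")  # multipart text part
        error = validate_vqa_request(upload is not None, question)
        if error:
            return jsonify({"error": error}), 400
        image = Image.open(upload.stream).convert("RGB")
        answer = answer_fn(image, question)
        return jsonify({"question": question, "answer": answer})

    return app
```

Under these assumptions, create_app(my_answer_fn).run(host="127.0.0.1", port=5000) would serve the endpoint at the address used in the request example.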

Example Request (cURL):

curl -X POST "http://127.0.0.1:5000/vqa" \
     -F "image=@path/to/image.jpg" \
     -F "question=What is in the image?"

Example Response:

{
  "question": "What is in the image?",
  "answer": "cat"
}

Testing with Postman

Steps to Test the Flask API in Postman

Open Postman.
Set the request method to POST.
Enter the request URL: http://127.0.0.1:5000/vqa
Navigate to the Body tab and select form-data.
Add two key-value pairs:
- Key: image → Set the key type to File and select an image file.
- Key: question → Enter a text question related to the image.
Click Send.
View the response containing the model's answer in JSON format.

Example Usage

Streamlit Web App

Open the app in the browser.
Upload an image.
Enter a question.
View the model's response.
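The flow above maps onto a handful of Streamlit widgets. The following is a minimal sketch of such a page, not the repository's actual script: main and answer_fn are hypothetical names, and the widget labels are invented for illustration.

```python
def main(answer_fn):
    """Render the upload / question / answer flow.

    answer_fn is a hypothetical callable: (PIL image, question) -> answer string.
    """
    import streamlit as st

    st.title("Visual QNA")
    upload = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    question = st.text_input("Ask a question about the image")

    if upload is None:
        st.info("Upload an image to get started.")
        return

    from PIL import Image

    # The uploaded file is a file-like object, so Pillow can open it directly.
    image = Image.open(upload).convert("RGB")
    st.image(image, caption="Uploaded image")

    if question.strip():
        with st.spinner("Thinking..."):
            answer = answer_fn(image, question)
        st.write(f"Answer: {answer}")
```

Because Streamlit reruns the script on every interaction, the function that loads the model would normally be wrapped in st.cache_resource so the weights are loaded only once per process.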

Flask API

Send a POST request to /vqa with an image and a question.
Receive the model-generated answer in JSON format.
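The same request can be scripted from Python with the Requests library. The helper below is a small sketch: ask_vqa is a hypothetical name, the base URL assumes the API is running locally on Flask's default port, and the post parameter exists only so the function can be exercised with a stub instead of a live server.

```python
def ask_vqa(image_path, question, base_url="http://127.0.0.1:5000", post=None):
    """POST an image and a question to /vqa and return the decoded JSON body."""
    if post is None:
        import requests  # imported lazily so a stub can be injected instead

        post = requests.post

    with open(image_path, "rb") as fh:
        response = post(
            f"{base_url}/vqa",
            files={"image": fh},          # sent as the multipart file part
            data={"question": question},  # sent as a multipart text field
            timeout=60,                   # model inference can take a while
        )
    response.raise_for_status()  # surface 4xx/5xx responses as exceptions
    return response.json()
```

For example, ask_vqa("path/to/image.jpg", "What is in the image?")["answer"] would return the answer string from the JSON response.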

Model Information

Name: dandelin/vilt-b32-finetuned-vqa
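Functionality: Vision-and-Language Transformer (ViLT) model fine-tuned on VQAv2 for Visual Question Answering (VQA). It treats VQA as classification over a fixed answer vocabulary, so answers are typically short words or phrases rather than free-form sentences.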
Source: Hugging Face Model Hub

Contributing

Feel free to contribute by opening issues or submitting pull requests.

License

This project is licensed under the MIT License.