Commit `2934ee7` by Tahir5 (verified), parent `061f436`: **Update README.md**

Files changed (1): `README.md` (+123, -1)
`@@ -11,4 +11,126 @@ license: mit`

Unchanged front matter context: `short_description: Streamlit app for Visual QA using VILT model to answer image`, followed by the closing `---`.

Removed: "Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference"

Added: the README content that follows.

# Image-Based Question Answering System

## Overview
This repository contains two projects:
1. **Complete Web Application:** a full-stack web app built with Streamlit, which serves as both frontend and backend.
2. **Flask API Backend:** a standalone Flask-based backend API.

Both implementations let users upload an image and ask a question about it. The system uses the **dandelin/vilt-b32-finetuned-vqa** model to analyze the image and answer the question.

## Features
- Users can upload an image.
- Users can ask questions related to the uploaded image.
- The model processes the image and answers questions based on its content.
- Two implementations:
  - **Streamlit Web App:** A complete frontend and backend application.
  - **Flask API:** A RESTful API for backend processing.

## Technology Stack
- **Frontend:** Streamlit (for the web app UI)
- **Backend:** Flask (for the API)
- **Model:** `dandelin/vilt-b32-finetuned-vqa`
- **Libraries:** PyTorch, Transformers, Pillow, OpenCV, Requests

---

## Live Demo
You can try the application live on Hugging Face Spaces:
[Visual-QNA](https://huggingface.co/spaces/Tahir5/Visual-QNA)

## Installation & Setup
### 1. Clone the Repository
```bash
git clone https://github.com/your-repo/image-vqa.git
cd image-vqa
```
`your-repo` is a placeholder; replace it with the repository's actual owner or path.

### 2. Install Dependencies
```bash
pip install -r requirements.txt
```
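
After installing, a quick way to confirm that the libraries from the technology stack are importable is a small check script. This is a hypothetical helper, not part of the repository, and the contents of `requirements.txt` are not shown here. The import names (`torch`, `transformers`, `PIL`, `cv2`, `requests`, `flask`, `streamlit`) are the standard module names for the listed packages.

```python
import importlib.util

# Import names for the packages listed under "Technology Stack".
# Assumption: requirements.txt installs these; the file itself is not shown here.
REQUIRED_MODULES = {
    "PyTorch": "torch",
    "Transformers": "transformers",
    "Pillow": "PIL",
    "OpenCV": "cv2",
    "Requests": "requests",
    "Flask": "flask",
    "Streamlit": "streamlit",
}


def check_dependencies(modules=REQUIRED_MODULES):
    """Return {package name: True/False}; find_spec locates top-level modules without importing them."""
    return {
        package: importlib.util.find_spec(module) is not None
        for package, module in modules.items()
    }


if __name__ == "__main__":
    missing = [name for name, ok in check_dependencies().items() if not ok]
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies are importable.")
```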

### 3. Run the Streamlit Web App
```bash
streamlit run stream.py
```
Streamlit serves the app at `http://localhost:8501` by default.
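
### 4. Run the Flask API
```bash
python flask_app.py
```
The API examples below assume the server is listening on `http://127.0.0.1:5000`, the default address of Flask's development server.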

---

## API Endpoints (Flask Backend)
### 1. Visual Question Answering (VQA)
**Endpoint:** `POST /vqa`
- **Description:** Accepts an image and a question and returns the model's answer.
- **Request Format:** `multipart/form-data` with two fields:
  - `image`: the image file to analyze.
  - `question`: the question about the image, as text.
- **Response Format:** JSON containing the `question` and the `answer`.
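
The endpoint contract can be sketched on the server side as follows. This is not the repository's `flask_app.py`, which is not shown here. It is a minimal sketch that assumes a hypothetical `answer_fn(image_bytes, question) -> str` wraps the model. The request handling lives in a plain function, `handle_vqa`, so it can be exercised without a running server. Flask is imported only inside `create_app`.

```python
from typing import Callable, Optional, Tuple

# Hypothetical signature for the model wrapper: (image bytes, question) -> answer.
AnswerFn = Callable[[bytes, str], str]


def handle_vqa(image_bytes: Optional[bytes], question: Optional[str],
               answer_fn: AnswerFn) -> Tuple[dict, int]:
    """Validate the two form fields and return (JSON payload, HTTP status)."""
    if not image_bytes:
        return {"error": "Missing 'image' file field."}, 400
    question = (question or "").strip()
    if not question:
        return {"error": "Missing 'question' form field."}, 400
    answer = answer_fn(image_bytes, question)
    # Same shape as the example response: {"question": ..., "answer": ...}
    return {"question": question, "answer": answer}, 200


def create_app(answer_fn: AnswerFn):
    """Build a Flask app exposing POST /vqa (Flask is imported lazily)."""
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/vqa", methods=["POST"])
    def vqa():
        upload = request.files.get("image")          # multipart file field
        image_bytes = upload.read() if upload else None
        question = request.form.get("question")      # multipart text field
        payload, status = handle_vqa(image_bytes, question, answer_fn)
        return jsonify(payload), status

    return app
```

With a model wrapper in hand, `create_app(my_answer_fn).run()` would start Flask's development server on `127.0.0.1:5000`. Here `my_answer_fn` is a hypothetical name.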

**Example Request (cURL):**
```bash
curl -X POST "http://127.0.0.1:5000/vqa" \
  -F "image=@path/to/image.jpg" \
  -F "question=What is in the image?"
```
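
**Example Response:**
```json
{
  "question": "What is in the image?",
  "answer": "cat"
}
```
The ViLT model selects its answer from a fixed set of short answers learned during fine-tuning. Responses are therefore typically a word or short phrase (such as `cat`, `yes`, or `2`) rather than a full sentence.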
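
The same request can be sent from Python using only the standard library. This is a sketch, not part of the repository. `build_multipart` and `ask_vqa` are hypothetical helpers, the boundary string is arbitrary, and the server is assumed to be running at `http://127.0.0.1:5000`. Building the `multipart/form-data` body by hand shows exactly what the two fields look like on the wire. With the Requests library listed in the stack, `requests.post(url, files={"image": f}, data={"question": q})` performs the same encoding.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(fields, files, boundary=None):
    """Encode text fields and {name: (filename, bytes)} files as multipart/form-data.

    Returns (body bytes, Content-Type header value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        part = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
        parts.append(part.encode("utf-8"))
    for name, (filename, data) in files.items():
        ctype = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            f"Content-Type: {ctype}\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing boundary
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def ask_vqa(image_path, question, url="http://127.0.0.1:5000/vqa"):
    """POST an image and a question to the /vqa endpoint and return the parsed JSON."""
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    body, content_type = build_multipart(
        {"question": question},
        {"image": (os.path.basename(image_path), image_bytes)},
    )
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `ask_vqa("path/to/image.jpg", "What is in the image?")` returns a dict with `question` and `answer` keys.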

---

## Testing with Postman
### Steps to Test the Flask API in Postman
1. Open **Postman**.
2. Create a new request and set the method to **POST**.
3. Enter the request URL: `http://127.0.0.1:5000/vqa`.
4. Open the **Body** tab and select **form-data**.
5. Add two key-value pairs:
   - **Key:** `image` → set the key type to **File**, then select an image file.
   - **Key:** `question` → enter a text question about the image.
6. Click **Send**.
7. The response body shows the model's answer in JSON format.

---

## Example Usage
### Streamlit Web App
1. Open the app in the browser.
2. Upload an image.
3. Enter a question.
4. View the model's response.
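
The four steps above map onto a handful of Streamlit widgets. The sketch below is not the repository's `stream.py`, which is not shown here. It assumes a hypothetical `answer_fn(image_bytes, question) -> str` model wrapper. The input checks live in a plain function, `vqa_message`, so they can be tested without Streamlit. `st.file_uploader`, `st.image`, `st.text_input`, `st.button` and `st.write` are standard Streamlit calls.

```python
from typing import Callable, Optional


def vqa_message(image_bytes: Optional[bytes], question: str,
                answer_fn: Callable[[bytes, str], str]) -> str:
    """Return the text to show the user for the current inputs."""
    if not image_bytes:
        return "Please upload an image first."
    if not question.strip():
        return "Please enter a question about the image."
    return f"Answer: {answer_fn(image_bytes, question.strip())}"


def main(answer_fn: Callable[[bytes, str], str]) -> None:
    import streamlit as st  # imported lazily so vqa_message() is testable without it

    st.title("Image-Based Question Answering")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    image_bytes = uploaded.getvalue() if uploaded is not None else None
    if image_bytes:
        st.image(image_bytes, caption="Uploaded image")   # step 2: show the upload
    question = st.text_input("Ask a question about the image")  # step 3
    if st.button("Get answer"):
        st.write(vqa_message(image_bytes, question, answer_fn))  # step 4
```

In a script launched with `streamlit run`, `main(answer_fn)` would be called at module level. Streamlit re-runs the whole script on every interaction, so an expensive model load belongs in a cached function.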

### Flask API
1. Send a `POST` request to `/vqa` with an image and a question.
2. Receive the model-generated answer in JSON format.

---

## Model Information
- **Name:** `dandelin/vilt-b32-finetuned-vqa`
- **Functionality:** Vision-and-Language Transformer (ViLT) model fine-tuned on the VQAv2 dataset for Visual Question Answering (VQA).
- **Source:** [Hugging Face Model Hub](https://huggingface.co/dandelin/vilt-b32-finetuned-vqa)
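
Answering a question with this model follows the usage pattern shown on its model card:

1. The processor encodes the image and question together.
2. The model returns one logit per candidate answer.
3. The highest-scoring label is the answer.

The sketch below assumes `transformers`, `torch` and `Pillow` are installed, and it downloads the model weights on first use. `answer_question` and `top_answer` are illustrative helper names, not functions from this repository. The label-decoding step is split out as `top_answer` so it can be checked without loading the model.

```python
from typing import Dict, Sequence

MODEL_ID = "dandelin/vilt-b32-finetuned-vqa"


def top_answer(logits: Sequence[float], id2label: Dict[int, str]) -> str:
    """Pick the answer label with the highest logit (ViLT VQA is a classifier)."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return id2label[best]


def answer_question(image_path: str, question: str) -> str:
    # Heavy imports kept local so top_answer() works without them.
    import torch
    from PIL import Image
    from transformers import ViltForQuestionAnswering, ViltProcessor

    processor = ViltProcessor.from_pretrained(MODEL_ID)
    model = ViltForQuestionAnswering.from_pretrained(MODEL_ID)

    image = Image.open(image_path).convert("RGB")
    encoding = processor(image, question, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**encoding)
    # outputs.logits has shape (batch=1, num_labels); take the single row.
    return top_answer(outputs.logits[0].tolist(), model.config.id2label)
```

This short sketch loads the processor and model on every call. In the Streamlit or Flask app, loading them once at startup avoids repeating that cost for each question.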

---

## Contributing
Feel free to contribute by opening issues or submitting pull requests.

---

## License
This project is licensed under the MIT License.