Update README.md
README.md
CHANGED
@@ -1,150 +1,10 @@
-
-
-
-
-
-
-
-
-
-
-
-## Technologies Used
-
-- **Python**: Backend logic.
-- **Streamlit**: For building the web interface.
-- **Huggingface Transformers**: For integrating OCR models (Qwen2-VL or GOT).
-- **PyTorch**: For deep learning inference.
-- **Pytesseract**: Optional OCR engine.
-- **OpenCV**: For image preprocessing.
-
-## Project Structure
-
-```
-DualTextOCRFusion/
-│
-├── app.py # Main Streamlit application
-├── ocr.py # Handles OCR extraction using the selected model
-├── .gitignore # Files and directories to ignore in Git
-├── .streamlit/
-│   └── config.toml # Streamlit theme configuration
-├── requirements.txt # Dependencies for the project
-└── README.md # This file
-```
-
-## How to Run Locally
-
-### Prerequisites
-
-- Python 3.8 or above installed on your machine.
-- Tesseract installed for using `pytesseract` (optional if using Huggingface models). You can download Tesseract from [here](https://github.com/tesseract-ocr/tesseract).
-
-### Steps
-
-1. **Clone the Repository**:
-
-```bash
-git clone https://github.com/yourusername/dual-text-ocr-fusion.git
-cd dual-text-ocr-fusion
-```
-
-2. **Install Dependencies**:
-
-Make sure you have the required dependencies by running the following:
-
-```bash
-pip install -r requirements.txt
-```
-
-3. **Run the Application**:
-
-Start the Streamlit app by running the following command:
-
-```bash
-streamlit run app.py
-```
-
-4. **Open the App**:
-
-Once the server starts, the app will be available in your browser at:
-
-```
-http://localhost:8501
-```
-
-### Usage
-
-1. **Upload an Image**: Upload an image containing Hindi and English text in formats like JPG, JPEG, or PNG.
-2. **View Extracted Text**: The app will extract and display the text from the image.
-3. **Search for Keywords**: Enter any keyword to search within the extracted text.
-
-## Deployment
-
-The app is deployed on **Streamlit Sharing** and can be accessed via the live URL:
-
-**[Live Application](https://your-app-link.streamlit.app)**
-
-## Customization
-
-### Changing the OCR Model
-
-By default, the app uses the **Qwen2-VL** model, but you can switch to the **General OCR Theory (GOT)** model by editing the `ocr.py` file.
-
-- **For Qwen2-VL**:
-
-```python
-from ocr import extract_text_byaldi
-```
-
-- **For General OCR Theory (GOT)**:
-
-```python
-from ocr import extract_text_got
-```
-
-### Custom UI Theme
-
-You can customize the look and feel of the application by modifying the `.streamlit/config.toml` file. Adjust colors, fonts, and layout options to suit your preferences.
-
-## Example Images
-
-Here are some sample images you can use to test the OCR functionality:
-
-1. **Sample 1**: A document with mixed Hindi and English text.
-2. **Sample 2**: An image with only Hindi text for multilingual OCR testing.
-
-## Contributing
-
-If you'd like to contribute to this project, feel free to fork the repository and submit a pull request. Follow these steps:
-
-1. Fork the project.
-2. Create a feature branch:
-
-```bash
-git checkout -b feature-branch
-```
-
-3. Commit your changes:
-
-```bash
-git commit -am 'Add new feature'
-```
-
-4. Push to the branch:
-
-```bash
-git push origin feature-branch
-```
-
-5. Open a pull request.
-
-## License
-
-This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
-
-## Credits
-
-- **Streamlit**: For the easy-to-use web interface.
-- **Huggingface Transformers**: For the powerful OCR models.
-- **Tesseract**: For optional OCR functionality.
-- **ColPali & GOT Models**: For the multilingual OCR support.
+---
+title: "DualTextOCRFusion"
+emoji: "π"
+colorFrom: gray
+colorTo: blue
+sdk: streamlit
+sdk_version: "{{2.1.1}}"
+app_file: app.py
+pinned: false
+---
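
Note on the removed Usage section: it describes searching the extracted text for a keyword but shows no code for that step. The following is only a minimal sketch of how such a search might look, not the project's actual implementation. The function name `find_keyword` and the sample text are hypothetical; the sketch assumes the OCR step returns plain text with newline-separated lines.

```python
import re
from typing import List, Tuple


def find_keyword(text: str, keyword: str) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing the keyword.

    Matching is case-insensitive and treats the keyword literally,
    so characters such as '.' or '*' are not interpreted as regex syntax.
    """
    keyword = keyword.strip()
    if not keyword:
        # An empty query matches nothing rather than every line.
        return []
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.search(line)
    ]


# Hypothetical OCR output mixing English and Hindi text.
sample = "Invoice number 42\nनमस्ते दुनिया\nTotal: 100 INR"
matches = find_keyword(sample, "total")
```

In a Streamlit app, the returned pairs could be rendered as a list of matching lines under the search box. Escaping the keyword keeps user input from being read as a regular expression.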