Working version 1: GOT OCR works with latex output
README.md
CHANGED
@@ -31,10 +31,11 @@ Markit is a powerful tool that converts various document formats (PDF, DOCX, ima
|
|
31 |
- **PyPdfium**: Fast PDF parsing using the PDFium engine
|
32 |
- **Docling**: Advanced document structure analysis
|
33 |
- **Gemini Flash**: AI-powered conversion using Google's Gemini API
|
34 |
-
- **GOT-OCR**: State-of-the-art OCR model for images (JPG/PNG only)
|
35 |
- **OCR Integration**: Extract text from images and scanned documents using Tesseract OCR
|
36 |
- **Interactive UI**: User-friendly Gradio interface with page navigation for large documents
|
37 |
- **AI-Powered Chat**: Interact with your documents using AI to ask questions about content
|
|
|
38 |
|
39 |
## System Architecture
|
40 |
The application is built with a modular architecture:
|
@@ -85,14 +86,16 @@ The GOT-OCR parser requires:
|
|
85 |
1. CUDA-capable GPU with sufficient memory
|
86 |
2. The following dependencies will be installed automatically:
|
87 |
```bash
|
88 |
-
torch
|
89 |
-
torchvision
|
90 |
-
transformers
|
91 |
-
|
92 |
-
verovio
|
93 |
-
|
|
|
94 |
```
|
95 |
3. Note that GOT-OCR only supports JPG and PNG image formats
|
|
|
96 |
|
97 |
## Deploying to Hugging Face Spaces
|
98 |
|
@@ -126,6 +129,8 @@ build:
|
|
126 |
- **None**: No OCR processing (for documents with selectable text)
|
127 |
- **Tesseract**: Basic OCR using Tesseract
|
128 |
- **Advanced**: Enhanced OCR with layout preservation (available with specific parsers)
|
|
|
|
|
129 |
4. Select your desired output format:
|
130 |
- **Markdown**: Clean, readable markdown format
|
131 |
- **JSON**: Structured data representation
|
@@ -152,8 +157,11 @@ build:
|
|
152 |
- Verify that all required dependencies are installed correctly
|
153 |
- Remember that GOT-OCR only supports JPG and PNG image formats
|
154 |
- If you encounter CUDA out-of-memory errors, try using a smaller image
|
155 |
-
-
|
156 |
-
- If you see errors about
|
|
|
|
|
|
|
157 |
|
158 |
### General Issues
|
159 |
- Check the console logs for error messages
|
@@ -186,6 +194,7 @@ markit/
|
|
186 |
│   │   ├── parser_interface.py # Parser interface
|
187 |
│   │   ├── parser_registry.py # Parser registry
|
188 |
│   │   ├── docling_parser.py # Docling parser
|
|
|
189 |
│   │   └── pypdfium_parser.py # PyPDFium parser
|
190 |
│   ├── ui/ # User interface
|
191 |
│   │   ├── __init__.py # Package initialization
|
@@ -194,4 +203,14 @@ markit/
|
|
194 |
│   └── __init__.py # Package initialization
|
195 |
└── tests/ # Tests
|
196 |
    └── __init__.py # Package initialization
|
197 |
-
```
31 |
- **PyPdfium**: Fast PDF parsing using the PDFium engine
|
32 |
- **Docling**: Advanced document structure analysis
|
33 |
- **Gemini Flash**: AI-powered conversion using Google's Gemini API
|
34 |
+
- **GOT-OCR**: State-of-the-art OCR model for images (JPG/PNG only) with plain text and formatted text options
|
35 |
- **OCR Integration**: Extract text from images and scanned documents using Tesseract OCR
|
36 |
- **Interactive UI**: User-friendly Gradio interface with page navigation for large documents
|
37 |
- **AI-Powered Chat**: Interact with your documents using AI to ask questions about content
|
38 |
+
- **ZeroGPU Support**: Optimized for Hugging Face Spaces with Stateless GPU environments
|
39 |
|
40 |
## System Architecture
|
41 |
The application is built with a modular architecture:
|
|
|
86 |
1. CUDA-capable GPU with sufficient memory
|
87 |
2. The following dependencies will be installed automatically:
|
88 |
```bash
|
89 |
+
torch
|
90 |
+
torchvision
|
91 |
+
git+https://github.com/huggingface/transformers.git@main # Latest transformers from GitHub
|
92 |
+
accelerate
|
93 |
+
verovio
|
94 |
+
numpy==1.26.3 # Specific version required
|
95 |
+
opencv-python
|
96 |
```
|
97 |
3. Note that GOT-OCR only supports JPG and PNG image formats
|
98 |
+
4. In HF Spaces, the integration with ZeroGPU is automatic and optimized for Stateless GPU environments
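
The JPG/PNG restriction in item 3 can be checked before an image reaches the parser. The following is a minimal sketch, not the project's actual validation code; the names `SUPPORTED_SUFFIXES` and `is_supported_image` are hypothetical, and checking by file extension alone is an assumption.

```python
from pathlib import Path

# Assumption: the format is judged by file extension only; the real parser
# may inspect the file contents instead.
SUPPORTED_SUFFIXES = {".jpg", ".jpeg", ".png"}


def is_supported_image(path: str) -> bool:
    """Return True if the file extension looks like a JPG or PNG image."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES


if __name__ == "__main__":
    for name in ("scan.PNG", "photo.jpeg", "paper.pdf"):
        print(name, is_supported_image(name))
```

A check like this lets the UI reject a PDF or DOCX early with a clear message instead of failing inside the model.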
|
99 |
|
100 |
## Deploying to Hugging Face Spaces
|
101 |
|
|
|
129 |
- **None**: No OCR processing (for documents with selectable text)
|
130 |
- **Tesseract**: Basic OCR using Tesseract
|
131 |
- **Advanced**: Enhanced OCR with layout preservation (available with specific parsers)
|
132 |
+
- **Plain Text**: For GOT-OCR, extracts raw text without formatting
|
133 |
+
- **Formatted Text**: For GOT-OCR, preserves formatting and converts to Markdown
|
134 |
4. Select your desired output format:
|
135 |
- **Markdown**: Clean, readable markdown format
|
136 |
- **JSON**: Structured data representation
|
|
|
157 |
- Verify that all required dependencies are installed correctly
|
158 |
- Remember that GOT-OCR only supports JPG and PNG image formats
|
159 |
- If you encounter CUDA out-of-memory errors, try using a smaller image
|
160 |
+
- In Hugging Face Spaces with Stateless GPU, ensure the `spaces` module is imported before any CUDA initialization
|
161 |
+
- If you see errors about "CUDA must not be initialized in the main process", verify the import order in your app.py
|
162 |
+
- If you encounter "cannot pickle '_thread.lock' object" errors, this indicates thread locks are being passed to the GPU function
|
163 |
+
- The GOT-OCR parser has been optimized for ZeroGPU in Stateless GPU environments with proper serialization handling
|
164 |
+
- For local development, the parser will fall back to CPU processing if GPU is not available
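
To track down a "cannot pickle '_thread.lock' object" error, it can help to test each argument separately before calling the GPU function. This is a small diagnostic sketch using only the standard library; `find_unpicklable` is a hypothetical helper and is not part of the project.

```python
import pickle
import threading


def find_unpicklable(kwargs: dict) -> list:
    """Return the names of keyword arguments that cannot be pickled."""
    bad = []
    for name, value in kwargs.items():
        try:
            pickle.dumps(value)
        except Exception:
            # Thread locks, open sockets and similar objects end up here.
            bad.append(name)
    return bad


if __name__ == "__main__":
    args = {"image_path": "page.png", "lock": threading.Lock()}
    print(find_unpicklable(args))
```

Any name it reports is a candidate for removal or replacement with plain data (strings, numbers, dicts) before the call crosses into the GPU worker.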
|
165 |
|
166 |
### General Issues
|
167 |
- Check the console logs for error messages
|
|
|
194 |
│   │   ├── parser_interface.py # Parser interface
|
195 |
│   │   ├── parser_registry.py # Parser registry
|
196 |
│   │   ├── docling_parser.py # Docling parser
|
197 |
+
│   │   ├── got_ocr_parser.py # GOT-OCR parser for images
|
198 |
│   │   └── pypdfium_parser.py # PyPDFium parser
|
199 |
│   ├── ui/ # User interface
|
200 |
│   │   ├── __init__.py # Package initialization
|
|
|
203 |
│   └── __init__.py # Package initialization
|
204 |
└── tests/ # Tests
|
205 |
    └── __init__.py # Package initialization
|
206 |
+
```
|
207 |
+
|
208 |
+
### ZeroGPU Integration Notes
|
209 |
+
|
210 |
+
When developing for Hugging Face Spaces with Stateless GPU:
|
211 |
+
|
212 |
+
1. Always import the `spaces` module before any CUDA initialization
|
213 |
+
2. Place all CUDA operations inside functions decorated with `@spaces.GPU()`
|
214 |
+
3. Ensure only picklable objects are passed to GPU-decorated functions
|
215 |
+
4. Use wrapper functions to filter out unpicklable objects like thread locks
|
216 |
+
5. For advanced use cases, consider implementing fallback mechanisms for serialization errors
|
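
The notes above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from `got_ocr_parser.py`: the function names `run_ocr`, `convert`, and `picklable_only` are hypothetical, the body of `run_ocr` is a placeholder for real model inference, and the no-op decorator is an assumed local fallback for machines where the `spaces` package is not installed.

```python
import pickle

try:
    # Import `spaces` before torch or anything else that may initialize CUDA.
    import spaces

    gpu = spaces.GPU()
except ImportError:
    # Assumed local fallback: run the function in-process without ZeroGPU.
    def gpu(fn):
        return fn


def picklable_only(options: dict) -> dict:
    """Drop entries (e.g. thread locks) that cannot be pickled."""
    clean = {}
    for key, value in options.items():
        try:
            pickle.dumps(value)
        except Exception:
            continue
        clean[key] = value
    return clean


@gpu
def run_ocr(image_path: str, options: dict) -> str:
    # Placeholder: model loading and all CUDA work would live here.
    return f"processed {image_path} with {sorted(options)}"


def convert(image_path: str, **options) -> str:
    """Wrapper that filters the arguments before the GPU call."""
    return run_ocr(image_path, picklable_only(options))
```

Keeping the filtering in `convert` means the decorated function only ever receives plain, serializable values, which is the condition items 3 and 4 describe.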