Update README.md
README.md
CHANGED
@@ -11,3 +11,100 @@ short_description: Technical Assessment
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# OCR LLM Classifier

This project provides a simple interface for Optical Character Recognition (OCR) and spam classification using deep learning models. It supports three OCR methods (PaddleOCR, EasyOCR, and KerasOCR) and uses a DistilBERT model to classify the extracted text as "Spam" or "Not Spam".

## Features
- Extract text from images using OCR.
- Classify extracted text as either "Spam" or "Not Spam".
- Save the extracted text and classification results to local JSON and CSV files.

## How It Works
1. **OCR**: The app uses one of three OCR methods to extract text from the uploaded image:
   - **PaddleOCR**
   - **EasyOCR**
   - **KerasOCR**

2. **Classification**: The extracted text is passed to a pre-trained DistilBERT model, which classifies it as either "Spam" or "Not Spam".

3. **Save Results**: The extracted text and classification result are saved locally in both JSON and CSV formats for later retrieval and review.
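
The save step above can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `save_results`, the file names `ocr_results.json` and `ocr_results.csv`, and the record fields `text` and `label` are all assumptions made for the example.

```python
import csv
import json
from pathlib import Path


def save_results(text: str, label: str, out_dir: str = ".") -> dict:
    """Append one OCR/classification record to a JSON file and a CSV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    record = {"text": text, "label": label}

    # JSON: keep a list of records and rewrite the file on every save.
    json_path = out / "ocr_results.json"  # hypothetical file name
    if json_path.exists():
        records = json.loads(json_path.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    json_path.write_text(
        json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
    )

    # CSV: append one row; write the header only when the file is new.
    csv_path = out / "ocr_results.csv"  # hypothetical file name
    is_new = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["text", "label"])
        if is_new:
            writer.writeheader()
        writer.writerow(record)

    return record
```

Appending to the CSV while rewriting the JSON list keeps both files readable after every call; for large volumes a line-delimited JSON file would likely be a better fit.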

## Installation

To get started with this project, follow these steps:

### 1. Clone the Repository
```bash
git clone https://github.com/yourusername/ocr-llm-test.git
cd ocr-llm-test
```

### 2. Install Dependencies
Install the required dependencies with pip:

```bash
pip install -r requirements.txt
```

### 3. Run the App
To run the Gradio interface locally, execute:

```bash
python app.py
```

Once the app is running, open [http://localhost:7860](http://localhost:7860) in your web browser.

## API Documentation

### 1. API Endpoint

The main endpoint for this API is `/predict`.

### 2. API Call Example

#### Install the Python Client
If you don't already have it installed, run the following command:

```bash
pip install gradio_client
```

#### Make an API Call

```python
from gradio_client import Client, handle_file

# Connect to the hosted Space
client = Client("winamnd/ocr-llm-test")

# Run OCR with the chosen method, then classify the extracted text
result = client.predict(
    method="PaddleOCR",
    img=handle_file('https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png'),
    api_name="/predict"
)
print(result)
```
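
To compare the three OCR methods on the same image, the call above can be wrapped in a small loop. The sketch below is an assumption-laden illustration: `compare_ocr_methods` is a hypothetical helper, and `client` is any object exposing a `predict(method=..., img=..., api_name=...)` method with the keyword arguments used in the example above.

```python
from typing import Any, Dict, Iterable

# Method names as listed in this README.
OCR_METHODS = ("PaddleOCR", "EasyOCR", "KerasOCR")


def compare_ocr_methods(
    client: Any, image: Any, methods: Iterable[str] = OCR_METHODS
) -> Dict[str, Any]:
    """Call /predict once per OCR method and collect each outcome."""
    results: Dict[str, Any] = {}
    for method in methods:
        try:
            results[method] = client.predict(
                method=method, img=image, api_name="/predict"
            )
        except Exception as exc:
            # Record the failure and continue with the remaining methods.
            results[method] = exc
    return results
```

With the real client this would be called as `compare_ocr_methods(Client("winamnd/ocr-llm-test"), handle_file(image_path))`, where `image_path` is a placeholder for your own image path or URL.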

### 3. Parameters

| Parameter | Type | Description |
|-----------|------|-------------|
| `method`  | `Literal['PaddleOCR', 'EasyOCR', 'KerasOCR']` | The OCR method used for text extraction. Defaults to "PaddleOCR". |
| `img`     | `dict` | The image input, which can be provided as a URL, a path, or a base64-encoded image. |

#### Image Input Details
The `img` dict contains the following keys:

- **path**: Path to a local file.
- **url**: Publicly available URL for the image.
- **size**: The size of the image (in bytes).
- **orig_name**: Original filename.
- **mime_type**: MIME type of the image.
- **is_stream**: Always set to `False`.
- **meta**: Metadata.
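
To make the shape of the `img` dictionary concrete, here is a rough sketch that fills in the keys listed above for a local file. `describe_image` is a hypothetical helper written for illustration; in normal use `handle_file` from `gradio_client` prepares the input for you. The value placed under `meta` is an assumption about what Gradio expects, not something stated in this README.

```python
import mimetypes
import os


def describe_image(path: str) -> dict:
    """Build a dict with the documented image-input keys for a local file."""
    mime_type, _ = mimetypes.guess_type(path)  # guessed from the file extension
    return {
        "path": path,
        "url": None,  # no public URL for a purely local file
        "size": os.path.getsize(path) if os.path.exists(path) else None,
        "orig_name": os.path.basename(path),
        "mime_type": mime_type,
        "is_stream": False,  # documented as always False
        "meta": {"_type": "gradio.FileData"},  # assumed marker value
    }
```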

### 4. Returns
The API returns a tuple with two elements:

- **Extracted Text (`str`)**: The text extracted from the image.
- **Spam Classification (`str`)**: The classification result ("Spam" or "Not Spam").
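
The two-element result can be unpacked and checked before use. The sketch below assumes the return shape described above; `Prediction` and `parse_result` are hypothetical names introduced for this example.

```python
from typing import NamedTuple, Sequence


class Prediction(NamedTuple):
    text: str   # text extracted from the image
    label: str  # "Spam" or "Not Spam"

    @property
    def is_spam(self) -> bool:
        return self.label == "Spam"


def parse_result(result: Sequence[str]) -> Prediction:
    """Validate the (text, label) pair returned by /predict."""
    if len(result) != 2:
        raise ValueError(f"expected 2 elements, got {len(result)}")
    text, label = result
    if label not in ("Spam", "Not Spam"):
        raise ValueError(f"unexpected label: {label!r}")
    return Prediction(text, label)
```

For example, `parse_result(result).is_spam` gives a boolean that downstream code can branch on without comparing strings.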