winamnd committed on
Commit
37d5823
·
verified ·
1 Parent(s): 032c5a4

Update README.md

Files changed (1)
  1. README.md +97 -0
README.md CHANGED
@@ -11,3 +11,100 @@ short_description: Technical Assessment
  ---
 
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+
+ # OCR LLM Classifier
+
+ This project provides a simple interface for Optical Character Recognition (OCR) and spam classification using deep learning models. It supports three OCR methods (PaddleOCR, EasyOCR, and KerasOCR) and uses a DistilBERT model to classify the extracted text as "Spam" or "Not Spam".
+
+ ## Features
+ - Extract text from images using OCR.
+ - Classify extracted text as either "Spam" or "Not Spam".
+ - Save the extracted text and classification results to local JSON and CSV files.
+
+ ## How It Works
+ 1. **OCR**: The app uses one of the three OCR methods to extract text from the uploaded image:
+    - **PaddleOCR**
+    - **EasyOCR**
+    - **KerasOCR**
+
+ 2. **Classification**: The extracted text is passed to a pre-trained DistilBERT model that classifies it as either "Spam" or "Not Spam".
+
+ 3. **Save Results**: The extracted text and classification results are saved locally in both JSON and CSV formats, allowing easy retrieval and review.
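The save step above can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: the function name `save_results`, the default file names, and the `text`/`label` field names are all assumptions made for the example.

```python
import csv
import json
from pathlib import Path


def save_results(text, label, json_path="ocr_results.json", csv_path="ocr_results.csv"):
    """Append one OCR/classification record to a JSON file and a CSV file.

    Function name, file names, and field names are hypothetical.
    """
    record = {"text": text, "label": label}

    # JSON: keep a list of records and rewrite the file on each save.
    json_file = Path(json_path)
    if json_file.exists():
        records = json.loads(json_file.read_text(encoding="utf-8"))
    else:
        records = []
    records.append(record)
    json_file.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV: append a row, writing the header only when the file is new.
    csv_file = Path(csv_path)
    write_header = not csv_file.exists()
    with csv_file.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=["text", "label"])
        if write_header:
            writer.writeheader()
        writer.writerow(record)

    return record
```

Calling `save_results("Win a free prize", "Spam")` twice would leave two entries in the JSON list and two data rows in the CSV file. Rewriting the whole JSON file keeps it valid after every save, at the cost of rereading it each time.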
+
+ ## Installation
+
+ To get started with this project, follow these steps:
+
+ ### 1. Clone the Repository
+ ```bash
+ git clone https://github.com/yourusername/ocr-llm-test.git
+ cd ocr-llm-test
+ ```
+
+ ### 2. Install Dependencies
+ You can install the required dependencies using pip:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+ ### 3. Run the App
+ To run the Gradio interface locally, execute:
+
+ ```bash
+ python app.py
+ ```
+
+ Once the app is running, open it in your web browser at [http://localhost:7860](http://localhost:7860).
+
+ ## API Documentation
+
+ ### 1. API Endpoint
+
+ The main endpoint for this API is `/predict`.
+
+ ### 2. API Call Example
+
+ #### Install the Python Client
+ If you don't already have it installed, run the following command:
+
+ ```bash
+ pip install gradio_client
+ ```
+
+ #### Make an API Call
+
+ ```python
+ from gradio_client import Client, handle_file
+
+ # Connect to the hosted Space
+ client = Client("winamnd/ocr-llm-test")
+
+ # Run OCR and spam classification on a sample image
+ result = client.predict(
+     method="PaddleOCR",
+     img=handle_file('https://raw.githubusercontent.com/gradio-app/gradio/main/test/test_files/bus.png'),
+     api_name="/predict"
+ )
+ print(result)
+ ```
+
+ ### 3. Parameters
+
+ | Parameter | Type | Description |
+ |-----------|------|-------------|
+ | `method` | `Literal['PaddleOCR', 'EasyOCR', 'KerasOCR']` | The OCR method used for text extraction. Defaults to `"PaddleOCR"`. |
+ | `img` | `dict` | The image input, which can be provided as a URL, a path, or a base64-encoded image. |
+
+ #### Image Input Details
+ - **path**: Path to a local file.
+ - **url**: Publicly available URL for the image.
+ - **size**: The size of the image (in bytes).
+ - **orig_name**: Original filename.
+ - **mime_type**: MIME type of the image.
+ - **is_stream**: Always set to `False`.
+ - **meta**: Metadata.
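To make the fields listed above concrete, here is a rough sketch that fills them in for a local file using only the standard library. The helper name `describe_local_file` is hypothetical, and the `meta` value shown is an assumption about what the Gradio client sends. In practice the `handle_file` helper from the call example is likely the simpler route, since it prepares the file input for you.

```python
import mimetypes
from pathlib import Path


def describe_local_file(path):
    """Build a dict with the image-input fields for a local file (illustrative only)."""
    file = Path(path)
    mime_type, _ = mimetypes.guess_type(file.name)
    return {
        "path": str(file),
        "url": None,                # a local file has no public URL
        "size": file.stat().st_size,
        "orig_name": file.name,
        "mime_type": mime_type,     # None when the extension is not recognized
        "is_stream": False,         # always False for this API
        "meta": {"_type": "gradio.FileData"},  # assumed marker value
    }
```

Only one of `path` or `url` needs to point at the image, so the sketch leaves `url` empty for the local case.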
+
+ ### 4. Returns
+ The API returns a tuple with two elements:
+
+ - **Extracted Text (`str`)**: The text extracted from the image.
+ - **Spam Classification (`str`)**: The classification result ("Spam" or "Not Spam").
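A caller will usually want to unpack that two-element result before using it. The sketch below shows one way to do so. The helper `parse_prediction` and the `is_spam` flag are hypothetical additions for illustration, and the label check assumes the API returns exactly the two strings documented above.

```python
def parse_prediction(result):
    """Split the (text, label) result into named fields plus a boolean flag."""
    # Accept a list as well as a tuple, in case the result was deserialized from JSON.
    if not isinstance(result, (tuple, list)) or len(result) != 2:
        raise ValueError(f"expected two elements, got {result!r}")

    text, label = result
    if label not in ("Spam", "Not Spam"):
        raise ValueError(f"unexpected classification label: {label!r}")

    return {"text": text, "label": label, "is_spam": label == "Spam"}
```

With the `result` from the call example, `parse_prediction(result)["is_spam"]` gives a boolean that is easier to branch on than the raw label string.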
+