hellorahulk committed
Commit 97c779b · 1 parent: 3985ce1

Files changed (10)
  1. .cursorrules +1 -17
  2. .gitignore +0 -43
  3. .vercel/README.txt +11 -0
  4. .vercel/project.json +1 -0
  5. README.md +1 -155
  6. api.py +0 -230
  7. api/index.py +0 -201
  8. requirements-prod.txt +0 -11
  9. requirements.txt +13 -4
  10. vercel.json +0 -25
.cursorrules CHANGED
@@ -98,21 +98,5 @@ If needed, you can further use the `web_scraper.py` file to scrape the web page
 - Add debug information to stderr while keeping the main output clean in stdout for better pipeline integration
 - When using seaborn styles in matplotlib, use 'seaborn-v0_8' instead of 'seaborn' as the style name due to recent seaborn version changes
 - Use 'gpt-4o' as the model name for OpenAI's GPT-4 with vision capabilities
-- For Vercel deployments with FastAPI and Mangum, use older stable versions (FastAPI 0.88.0, Mangum 0.15.0, Pydantic 1.10.2) to avoid compatibility issues
-- Keep Vercel configuration simple and avoid unnecessary configuration options that might cause conflicts
 
-# Scratchpad
-
-Current Task: Fix Vercel deployment issues with FastAPI and Mangum
-
-Progress:
-[X] Identified issue with newer versions of FastAPI and Mangum
-[X] Updated dependencies to use older, stable versions
-[X] Simplified FastAPI configuration
-[X] Simplified Vercel configuration
-[X] Successfully deployed to production
-
-Next Steps:
-[ ] Test all API endpoints
-[ ] Add more functionality if needed
-[ ] Consider adding monitoring and logging
+# Scratchpad
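The removed `.cursorrules` lesson above recorded a specific pin set for Vercel deployments with FastAPI and Mangum. As a sketch only, the pins it names would look like this in a requirements file (versions taken verbatim from the rule, not verified against current Vercel runtimes):

```
fastapi==0.88.0
mangum==0.15.0
pydantic==1.10.2
```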
.gitignore DELETED
@@ -1,43 +0,0 @@
-# Python
-__pycache__/
-*.py[cod]
-*$py.class
-*.so
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-
-# Virtual Environment
-venv/
-env/
-ENV/
-
-# IDE
-.idea/
-.vscode/
-*.swp
-*.swo
-
-# Vercel
-.vercel/
-.env
-.env.local
-
-# Temporary files
-*.tmp
-tmp/
-temp/
-.vercel
.vercel/README.txt ADDED
@@ -0,0 +1,11 @@
+> Why do I have a folder named ".vercel" in my project?
+The ".vercel" folder is created when you link a directory to a Vercel project.
+
+> What does the "project.json" file contain?
+The "project.json" file contains:
+- The ID of the Vercel project that you linked ("projectId")
+- The ID of the user or team your Vercel project is owned by ("orgId")
+
+> Should I commit the ".vercel" folder?
+No, you should not share the ".vercel" folder with anyone.
+Upon creation, it will be automatically added to your ".gitignore" file.
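The new `.vercel/README.txt` above describes the two fields `project.json` holds. A minimal sketch of reading them, using placeholder IDs rather than the real identifiers committed in this repo:

```python
import json

# Placeholder metadata mirroring the documented ".vercel/project.json" shape;
# the IDs are made up for illustration, not real Vercel identifiers.
raw = '{"projectId": "prj_example123", "orgId": "team_example456"}'
meta = json.loads(raw)
print(meta["projectId"])  # which Vercel project this directory is linked to
print(meta["orgId"])      # which user or team owns that project
```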
.vercel/project.json ADDED
@@ -0,0 +1 @@
 
 
1
+ {"projectId":"prj_kf78qZ09e6hVKbBJmTbRQxdvMb23","orgId":"team_FMqHvtRPYOsRJleJqSK4Itlg"}
README.md CHANGED
@@ -54,158 +54,4 @@ Built with:
 
 ## 📝 License
 
-MIT License
-
-# Document Parser API
-
-A scalable FastAPI service for parsing various document formats (PDF, DOCX, TXT, HTML, Markdown) with automatic information extraction.
-
-## Features
-
-- 📄 Multi-format support (PDF, DOCX, TXT, HTML, Markdown)
-- 🔄 Asynchronous processing with background tasks
-- 🌐 Support for both file uploads and URL inputs
-- 📊 Structured information extraction
-- 🔗 Webhook support for processing notifications
-- 🚀 Highly scalable architecture
-- 🛡️ Comprehensive error handling
-- 📝 Detailed logging
-
-## Quick Start
-
-### Prerequisites
-
-- Python 3.8+
-- pip (Python package manager)
-
-### Installation
-
-1. Clone the repository:
-```bash
-git clone https://github.com/yourusername/document-parser-api.git
-cd document-parser-api
-```
-
-2. Install dependencies:
-```bash
-pip install -r requirements.txt
-```
-
-3. Run the API server:
-```bash
-python api.py
-```
-
-The API will be available at `http://localhost:8000`
-
-## API Documentation
-
-### Endpoints
-
-#### 1. Parse Document from File Upload
-```http
-POST /parse/file
-```
-- Upload a document file for parsing
-- Optional callback URL for processing notification
-- Returns a job ID for status tracking
-
-#### 2. Parse Document from URL
-```http
-POST /parse/url
-```
-- Submit a document URL for parsing
-- Optional callback URL for processing notification
-- Returns a job ID for status tracking
-
-#### 3. Check Processing Status
-```http
-GET /status/{job_id}
-```
-- Get the current status of a parsing job
-- Returns processing status and results if completed
-
-#### 4. Health Check
-```http
-GET /health
-```
-- Check if the API is running and healthy
-
-### Example Usage
-
-#### Parse File
-```python
-import requests
-
-url = "http://localhost:8000/parse/file"
-files = {"file": open("document.pdf", "rb")}
-response = requests.post(url, files=files)
-print(response.json())
-```
-
-#### Parse URL
-```python
-import requests
-
-url = "http://localhost:8000/parse/url"
-data = {
-    "url": "https://example.com/document.pdf",
-    "callback_url": "https://your-callback-url.com/webhook"
-}
-response = requests.post(url, json=data)
-print(response.json())
-```
-
-## Error Handling
-
-The API implements comprehensive error handling:
-
-- Invalid file formats
-- Failed URL downloads
-- Processing errors
-- Invalid requests
-- Server errors
-
-All errors return appropriate HTTP status codes and detailed error messages.
-
-## Scaling Considerations
-
-For production deployment, consider:
-
-1. **Job Storage**: Replace in-memory storage with Redis or a database
-2. **Load Balancing**: Deploy behind a load balancer
-3. **Worker Processes**: Adjust number of workers based on load
-4. **Rate Limiting**: Implement rate limiting for API endpoints
-5. **Monitoring**: Add metrics collection and monitoring
-
-## Development
-
-### Running Tests
-```bash
-pytest tests/
-```
-
-### Local Development
-```bash
-uvicorn api:app --reload --port 8000
-```
-
-### API Documentation
-- Swagger UI: `http://localhost:8000/docs`
-- ReDoc: `http://localhost:8000/redoc`
-
-## Contributing
-
-1. Fork the repository
-2. Create a feature branch
-3. Commit your changes
-4. Push to the branch
-5. Create a Pull Request
-
-## License
-
-This project is licensed under the MIT License - see the LICENSE file for details.
-
-## Support
-
-For support, please open an issue in the GitHub repository or contact the maintainers.
+MIT License
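The deleted README describes a queue-then-poll workflow: `POST /parse/*` returns a job ID and `GET /status/{job_id}` reports progress until the job completes. A client-side polling loop for that pattern might be sketched like this; the `fetch` callable stands in for the HTTP request, so nothing here assumes a running server:

```python
import time

def poll_status(fetch, job_id, interval=0.0, max_attempts=10):
    """Poll the status endpoint (via `fetch`) until the job leaves the
    'queued'/'processing' states, or give up after max_attempts."""
    for _ in range(max_attempts):
        job = fetch(job_id)
        if job["status"] not in ("queued", "processing"):
            return job
        time.sleep(interval)
    return {"job_id": job_id, "status": "timeout"}

# Simulated responses standing in for successive GET /status/{job_id} calls
responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "result": {"text": "..."}},
])
final = poll_status(lambda job_id: next(responses), "job_1")
print(final["status"])  # completed
```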
api.py DELETED
@@ -1,230 +0,0 @@
-import os
-from fastapi import FastAPI, HTTPException, UploadFile, File, BackgroundTasks
-from fastapi.middleware.cors import CORSMiddleware
-from pydantic import BaseModel, HttpUrl
-import tempfile
-import requests
-from typing import Optional, List, Dict, Any
-from dockling_parser import DocumentParser
-from dockling_parser.exceptions import ParserError, UnsupportedFormatError
-from dockling_parser.types import ParsedDocument
-import logging
-import aiofiles
-import asyncio
-from urllib.parse import urlparse
-from mangum import Mangum
-import httpx
-
-# Configure logging
-logging.basicConfig(level=logging.INFO)
-logger = logging.getLogger(__name__)
-
-# Initialize FastAPI app
-app = FastAPI(
-    title="Document Parser API",
-    description="A scalable API for parsing various document formats",
-    version="1.0.0"
-)
-
-# Add CORS middleware
-app.add_middleware(
-    CORSMiddleware,
-    allow_origins=["*"],
-    allow_credentials=True,
-    allow_methods=["*"],
-    allow_headers=["*"],
-)
-
-# Initialize document parser
-parser = DocumentParser()
-
-class URLInput(BaseModel):
-    url: HttpUrl
-    callback_url: Optional[HttpUrl] = None
-
-class ErrorResponse(BaseModel):
-    error: str
-    detail: Optional[str] = None
-    code: str
-
-class ParseResponse(BaseModel):
-    job_id: str
-    status: str
-    result: Optional[ParsedDocument] = None
-    error: Optional[str] = None
-
-# In-memory job storage (replace with Redis/DB in production)
-jobs = {}
-
-async def process_document_async(job_id: str, file_path: str, callback_url: Optional[str] = None):
-    """Process document asynchronously"""
-    try:
-        # Update job status
-        jobs[job_id] = {"status": "processing"}
-
-        # Parse document
-        result = parser.parse(file_path)
-
-        # Update job with result
-        jobs[job_id] = {
-            "status": "completed",
-            "result": result
-        }
-
-        # Call callback URL if provided
-        if callback_url:
-            try:
-                await notify_callback(callback_url, job_id, result)
-            except Exception as e:
-                logger.error(f"Failed to notify callback URL: {str(e)}")
-
-    except Exception as e:
-        logger.error(f"Error processing document: {str(e)}")
-        jobs[job_id] = {
-            "status": "failed",
-            "error": str(e)
-        }
-    finally:
-        # Cleanup temporary file
-        try:
-            if os.path.exists(file_path):
-                os.unlink(file_path)
-        except Exception as e:
-            logger.error(f"Error cleaning up file: {str(e)}")
-
-async def notify_callback(callback_url: str, job_id: str, result: ParsedDocument):
-    """Notify callback URL with results"""
-    try:
-        async with httpx.AsyncClient() as client:
-            await client.post(
-                callback_url,
-                json={
-                    "job_id": job_id,
-                    "result": result.dict()
-                }
-            )
-    except Exception as e:
-        logger.error(f"Failed to send callback: {str(e)}")
-
-@app.post("/parse/file", response_model=ParseResponse)
-async def parse_file(
-    background_tasks: BackgroundTasks,
-    file: UploadFile = File(...),
-    callback_url: Optional[HttpUrl] = None
-):
-    """
-    Parse a document from file upload
-    """
-    try:
-        # Create temporary file in /tmp for Vercel
-        suffix = os.path.splitext(file.filename)[1]
-        tmp_dir = "/tmp" if os.path.exists("/tmp") else tempfile.gettempdir()
-        tmp_path = os.path.join(tmp_dir, f"upload_{os.urandom(8).hex()}{suffix}")
-
-        content = await file.read()
-        with open(tmp_path, "wb") as f:
-            f.write(content)
-
-        # Generate job ID
-        job_id = f"job_{len(jobs) + 1}"
-
-        # Start background processing
-        background_tasks.add_task(
-            process_document_async,
-            job_id,
-            tmp_path,
-            str(callback_url) if callback_url else None
-        )
-
-        return ParseResponse(
-            job_id=job_id,
-            status="queued"
-        )
-
-    except Exception as e:
-        logger.error(f"Error handling file upload: {str(e)}")
-        raise HTTPException(
-            status_code=500,
-            detail=str(e)
-        )
-
-@app.post("/parse/url", response_model=ParseResponse)
-async def parse_url(input_data: URLInput, background_tasks: BackgroundTasks):
-    """
-    Parse a document from URL
-    """
-    try:
-        # Download file
-        async with httpx.AsyncClient() as client:
-            response = await client.get(str(input_data.url), follow_redirects=True)
-            response.raise_for_status()
-
-        # Get filename from URL or use default
-        filename = os.path.basename(urlparse(str(input_data.url)).path)
-        if not filename:
-            filename = "document.pdf"
-
-        # Save to temporary file in /tmp for Vercel
-        tmp_dir = "/tmp" if os.path.exists("/tmp") else tempfile.gettempdir()
-        tmp_path = os.path.join(tmp_dir, f"download_{os.urandom(8).hex()}{os.path.splitext(filename)[1]}")
-
-        with open(tmp_path, "wb") as f:
-            f.write(response.content)
-
-        # Generate job ID
-        job_id = f"job_{len(jobs) + 1}"
-
-        # Start background processing
-        background_tasks.add_task(
-            process_document_async,
-            job_id,
-            tmp_path,
-            str(input_data.callback_url) if input_data.callback_url else None
-        )
-
-        return ParseResponse(
-            job_id=job_id,
-            status="queued"
-        )
-
-    except httpx.RequestError as e:
-        logger.error(f"Error downloading file: {str(e)}")
-        raise HTTPException(
-            status_code=400,
-            detail=f"Error downloading file: {str(e)}"
-        )
-    except Exception as e:
-        logger.error(f"Error processing URL: {str(e)}")
-        raise HTTPException(
-            status_code=500,
-            detail=str(e)
-        )
-
-@app.get("/status/{job_id}", response_model=ParseResponse)
-async def get_status(job_id: str):
-    """
-    Get the status of a parsing job
-    """
-    if job_id not in jobs:
-        raise HTTPException(
-            status_code=404,
-            detail="Job not found"
-        )
-
-    job = jobs[job_id]
-    return ParseResponse(
-        job_id=job_id,
-        status=job["status"],
-        result=job.get("result"),
-        error=job.get("error")
-    )
-
-@app.get("/health")
-async def health_check():
-    """
-    Health check endpoint
-    """
-    return {"status": "healthy"}
-
-# Handler for Vercel
-handler = Mangum(app, lifespan="off")
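The deleted `api.py` tracks each job in an in-memory `jobs` dict, moving it from queued through processing to completed or failed. A stripped-down sketch of that lifecycle, with the actual document parsing replaced by an arbitrary callable so the state transitions can be seen in isolation:

```python
# Minimal sketch of the in-memory job lifecycle used by the deleted api.py:
# queued -> processing -> completed/failed, tracked in a plain dict.
jobs = {}

def start_job(job_id):
    jobs[job_id] = {"status": "queued"}

def run_job(job_id, work):
    """Run `work` (standing in for the parser) and record the outcome."""
    jobs[job_id] = {"status": "processing"}
    try:
        jobs[job_id] = {"status": "completed", "result": work()}
    except Exception as e:
        jobs[job_id] = {"status": "failed", "error": str(e)}

start_job("job_1")
run_job("job_1", lambda: {"pages": 3})
print(jobs["job_1"]["status"])  # completed

def fail():
    raise ValueError("bad file")

start_job("job_2")
run_job("job_2", fail)
print(jobs["job_2"]["status"])  # failed
```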
api/index.py DELETED
@@ -1,201 +0,0 @@
1
- import json
2
- import os
3
- import tempfile
4
- import magic
5
- import requests
6
- import datetime
7
-
8
- def is_valid_file(file_data):
9
- """Check if file type is allowed using python-magic"""
10
- try:
11
- mime = magic.from_buffer(file_data, mime=True)
12
- allowed_mimes = [
13
- 'application/pdf',
14
- 'text/plain',
15
- 'text/html',
16
- 'text/markdown',
17
- 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
18
- ]
19
- return mime in allowed_mimes
20
- except Exception:
21
- return False
22
-
23
- def download_file(url):
24
- """Download file from URL and save to temp file"""
25
- try:
26
- response = requests.get(url, stream=True, timeout=10)
27
- response.raise_for_status()
28
-
29
- # Get content type
30
- content_type = response.headers.get('content-type', '').split(';')[0]
31
- if content_type not in [
32
- 'application/pdf',
33
- 'text/plain',
34
- 'text/html',
35
- 'text/markdown',
36
- 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
37
- ]:
38
- raise ValueError(f"Unsupported content type: {content_type}")
39
-
40
- # Create temp file with proper extension
41
- ext = {
42
- 'application/pdf': '.pdf',
43
- 'text/plain': '.txt',
44
- 'text/html': '.html',
45
- 'text/markdown': '.md',
46
- 'application/vnd.openxmlformats-officedocument.wordprocessingml.document': '.docx'
47
- }.get(content_type, '')
48
-
49
- fd, temp_path = tempfile.mkstemp(suffix=ext)
50
- os.close(fd)
51
-
52
- # Download file
53
- with open(temp_path, 'wb') as f:
54
- for chunk in response.iter_content(chunk_size=8192):
55
- f.write(chunk)
56
-
57
- return temp_path, content_type
58
- except Exception as e:
59
- raise ValueError(f"Failed to download file: {str(e)}")
60
-
61
- def handle_root():
62
- return {
63
- "status": "ok",
64
- "message": "Document Processing API",
65
- "version": "1.0.0"
66
- }
67
-
68
- def handle_health():
69
- return {
70
- "status": "healthy",
71
- "timestamp": str(datetime.datetime.now(datetime.UTC))
72
- }
73
-
74
- def handle_parse_file(file_data):
75
- if not file_data:
76
- raise ValueError("No file provided")
77
-
78
- if not is_valid_file(file_data):
79
- raise ValueError("Invalid file type")
80
-
81
- fd, temp_path = tempfile.mkstemp()
82
- os.close(fd)
83
-
84
- try:
85
- with open(temp_path, 'wb') as f:
86
- f.write(file_data)
87
-
88
- return {
89
- "status": "success",
90
- "message": "File processed successfully",
91
- "metadata": {
92
- "size": os.path.getsize(temp_path),
93
- "mime_type": magic.from_file(temp_path, mime=True)
94
- }
95
- }
96
- finally:
97
- try:
98
- os.unlink(temp_path)
99
- except:
100
- pass
101
-
102
- def handle_parse_url(url):
103
- if not url:
104
- raise ValueError("No URL provided")
105
-
106
- if not url.startswith(('http://', 'https://')):
107
- raise ValueError("Invalid URL")
108
-
109
- temp_path, content_type = download_file(url)
110
- try:
111
- return {
112
- "status": "success",
113
- "message": "URL processed successfully",
114
- "metadata": {
115
- "url": url,
116
- "content_type": content_type,
117
- "size": os.path.getsize(temp_path)
118
- }
119
- }
120
- finally:
121
- try:
122
- os.unlink(temp_path)
123
- except:
124
- pass
125
-
126
- def handler(request, context):
127
- """Vercel serverless handler"""
128
-
129
- # Add CORS headers to all responses
130
- cors_headers = {
131
- "Access-Control-Allow-Origin": "*",
132
- "Access-Control-Allow-Headers": "Content-Type, Authorization",
133
- "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
134
- "Content-Type": "application/json"
135
- }
136
-
137
- # Handle OPTIONS request for CORS
138
- if request.get('httpMethod') == 'OPTIONS':
139
- return {
140
- "statusCode": 204,
141
- "headers": cors_headers,
142
- "body": ""
143
- }
144
-
145
- try:
146
- # Get path and method
147
- path = request.get('path', '/').rstrip('/')
148
- method = request.get('httpMethod', 'GET')
149
-
150
- # Route request
151
- response_data = None
152
- if path == '' or path == '/':
153
- response_data = handle_root()
154
- elif path == '/health':
155
- response_data = handle_health()
156
- elif path == '/parse/file' and method == 'POST':
157
- file_data = request.get('body', '')
158
- if request.get('isBase64Encoded', False):
159
- import base64
160
- file_data = base64.b64decode(file_data)
161
- response_data = handle_parse_file(file_data)
162
- elif path == '/parse/url' and method == 'POST':
163
- try:
164
- body = json.loads(request.get('body', '{}'))
165
- except:
166
- raise ValueError("Invalid JSON")
167
- response_data = handle_parse_url(body.get('url'))
168
- else:
169
- return {
170
- "statusCode": 404,
171
- "headers": cors_headers,
172
- "body": json.dumps({
173
- "error": "Not Found",
174
- "details": f"Path {path} not found"
175
- })
176
- }
177
-
178
- return {
179
- "statusCode": 200,
180
- "headers": cors_headers,
181
- "body": json.dumps(response_data)
182
- }
183
-
184
- except ValueError as e:
185
- return {
186
- "statusCode": 400,
187
- "headers": cors_headers,
188
- "body": json.dumps({
189
- "error": "Bad Request",
190
- "details": str(e)
191
- })
192
- }
193
- except Exception as e:
194
- return {
195
- "statusCode": 500,
196
- "headers": cors_headers,
197
- "body": json.dumps({
198
- "error": "Internal Server Error",
199
- "details": str(e)
200
- })
201
- }
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
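The deleted `api/index.py` dispatches on the `path` and `httpMethod` fields of a Vercel-style request dict. A reduced sketch of that routing style, exercised with a fake event so no server is needed (the handler and routes here are illustrative, not the full original):

```python
import json

def tiny_handler(request):
    """Route a Vercel-style request dict the way the deleted handler did."""
    path = request.get('path', '/').rstrip('/')
    method = request.get('httpMethod', 'GET')
    if path in ('', '/'):
        body = {"status": "ok"}
    elif path == '/health' and method == 'GET':
        body = {"status": "healthy"}
    else:
        return {"statusCode": 404, "body": json.dumps({"error": "Not Found"})}
    return {"statusCode": 200, "body": json.dumps(body)}

resp = tiny_handler({"path": "/health/", "httpMethod": "GET"})
print(resp["statusCode"])                     # 200
print(json.loads(resp["body"])["status"])     # healthy
```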
requirements-prod.txt DELETED
@@ -1,11 +0,0 @@
-docling>=0.2.0
-pydantic>=2.0.0
-python-magic>=0.4.27
-PyPDF2>=3.0.0
-beautifulsoup4>=4.12.0
-lxml>=4.9.0
-requests>=2.31.0
-fastapi>=0.104.0
-python-multipart>=0.0.6
-httpx>=0.25.0
-mangum>=0.17.0
requirements.txt CHANGED
@@ -1,4 +1,13 @@
-flask==2.0.1
-werkzeug==2.0.1
-python-magic==0.4.27
-requests==2.31.0
+docling>=0.2.0
+pydantic>=2.0.0
+python-magic>=0.4.27
+python-docx>=0.8.11
+PyPDF2>=3.0.0
+beautifulsoup4>=4.12.0
+lxml>=4.9.0
+gradio>=4.44.1
+pandas>=1.5.0
+huggingface-hub>=0.19.0
+python-magic-bin>=0.4.14; platform_system == "Windows"
+libmagic; platform_system == "Linux"
+requests>=2.31.0
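The updated requirements.txt uses PEP 508 environment markers (e.g. `; platform_system == "Windows"`) so platform-specific packages are only installed where they apply. A rough sketch of how the simple equality form of such a marker is evaluated; real installers use a full marker grammar, this toy handles only the `platform_system == "..."` shape seen above:

```python
import platform

def marker_applies(marker, system=None):
    """Evaluate only the `platform_system == "..."` marker form used above."""
    system = system or platform.system()
    lhs, _, rhs = marker.partition('==')
    assert lhs.strip() == 'platform_system', "toy evaluator: unsupported marker"
    return rhs.strip().strip('"') == system

print(marker_applies('platform_system == "Windows"', system="Linux"))   # False
print(marker_applies('platform_system == "Linux"', system="Linux"))     # True
```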
vercel.json DELETED
@@ -1,25 +0,0 @@
-{
-  "version": 2,
-  "builds": [
-    {
-      "src": "api/index.py",
-      "use": "@vercel/python",
-      "config": {
-        "maxLambdaSize": "10mb",
-        "runtime": "python3.9"
-      }
-    }
-  ],
-  "routes": [
-    {
-      "src": "/(.*)",
-      "dest": "api/index.py",
-      "headers": {
-        "Access-Control-Allow-Origin": "*",
-        "Access-Control-Allow-Headers": "Content-Type, Authorization",
-        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
-        "Access-Control-Max-Age": "86400"
-      }
-    }
-  ]
-}
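`vercel.json` is deleted outright here, consistent with the lesson about keeping Vercel configuration simple. If explicit routing were ever needed again, a minimal alternative might look like the fragment below; this assumes Vercel's `rewrites` syntax, which this commit does not itself use, so treat it as a sketch rather than a tested config:

```json
{
  "rewrites": [
    { "source": "/(.*)", "destination": "/api/index" }
  ]
}
```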