---
title: PDF to Markdown Converter
emoji: π
colorFrom: blue
colorTo: green
sdk: streamlit
sdk_version: "1.29.0"
app_file: app.py
pinned: false
---

# PDF to Markdown Converter

This application converts PDF documents to Markdown. It uses the `docling` library for document conversion and provides a simple Streamlit interface.
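
The conversion step can be sketched roughly as follows. This is a minimal illustration, not the code in `app.py`: it assumes docling's `DocumentConverter` is used with its default settings, and the helper names `markdown_filename` and `convert_to_markdown` are hypothetical.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def markdown_filename(source: str) -> str:
    """Derive an output name such as 'report.md' from a local path or URL."""
    # urlparse drops any query string; a bare local path passes through unchanged.
    path = urlparse(source).path
    stem = PurePosixPath(path).stem or "document"
    return f"{stem}.md"


def convert_to_markdown(source: str) -> str:
    """Convert a PDF (local path or URL) to Markdown text using docling."""
    # Imported lazily so the filename helper works without docling installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()  # assumption: default options are sufficient
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `convert_to_markdown("paper.pdf")` would return the Markdown as a string, which could then be saved under the name given by `markdown_filename("paper.pdf")`.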

## Features

- Upload PDF files directly
- Convert PDFs from URLs
- Batch-process multiple images using vLLM
- Download the resulting Markdown files
- Clean, user-friendly interface

## How to Use

### PDF to Markdown

1. Select the "PDF to Markdown" tab
2. Upload a PDF file using the file uploader, or enter the URL of a PDF document
3. Click the "Convert to Markdown" button
4. Once conversion is complete, download the Markdown file

### Batch Image Processing

1. Select the "Batch Image Processing" tab
2. Upload multiple image files (PNG, JPG, JPEG)
3. Optionally customize the model path and prompt text
4. Click the "Process Images" button
5. Once processing is complete, download the ZIP file containing all results
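
The final packaging step of the batch workflow could look something like the sketch below. It assumes each image yields one text result that is stored as a `.md` file named after the image; the function name `build_results_zip` and that naming scheme are illustrative assumptions, not taken from `app.py`.

```python
import io
import zipfile


def build_results_zip(results: dict[str, str]) -> bytes:
    """Pack {image filename: generated text} into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for image_name, text in results.items():
            # Store each result as <image stem>.md, e.g. page1.png -> page1.md
            stem = image_name.rsplit(".", 1)[0]
            archive.writestr(f"{stem}.md", text)
    return buffer.getvalue()
```

The returned bytes can be handed to Streamlit's `st.download_button(..., data=zip_bytes, file_name="results.zip", mime="application/zip")`, so nothing has to be written to disk first.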

## Technical Details

Built with:

- Streamlit 1.29.0
- Docling 2.7.0
- docling_core
- vLLM (for batch processing)
- Python 3.12
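
To check that an environment matches the versions listed above, a small script along these lines may help. It is only a sketch: the distribution names (for example `docling-core` for the `docling_core` package) are assumptions, and packages without a listed version are left unpinned.

```python
from importlib import metadata

# Pins taken from the list above; None means no version is stated there.
EXPECTED = {
    "streamlit": "1.29.0",
    "docling": "2.7.0",
    "docling-core": None,  # assumed distribution name for docling_core
    "vllm": None,
}


def find_mismatches(expected, get_version=metadata.version):
    """Return {package: problem} for packages that are missing or off-pin."""
    problems = {}
    for package, pinned in expected.items():
        try:
            installed = get_version(package)
        except metadata.PackageNotFoundError:
            problems[package] = "not installed"
            continue
        if pinned is not None and installed != pinned:
            problems[package] = f"installed {installed}, expected {pinned}"
    return problems
```

Running `find_mismatches(EXPECTED)` returns an empty dictionary when everything lines up, and otherwise maps each problematic package to a short description.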

## Deployment

This application is deployed on Hugging Face Spaces.

To deploy this application:

1. Create a new Space on Hugging Face (https://huggingface.co/spaces)
2. Choose "Streamlit" as the SDK
3. Upload the following files to the Space repository:
   - app.py
   - requirements.txt
   - README.md
   - runtime.txt

The application will automatically create any necessary directories when it starts.
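
Startup directory creation of this kind is typically a few lines of `pathlib`. The sketch below shows one way it might be done; the directory names `uploads` and `outputs` are hypothetical placeholders, since the README does not say which directories the app actually uses.

```python
from pathlib import Path

# Hypothetical directory names; the real app may use different ones.
REQUIRED_DIRS = ("uploads", "outputs")


def ensure_directories(base: Path, names=REQUIRED_DIRS) -> list[Path]:
    """Make sure each named directory exists under `base` and return the paths."""
    paths = []
    for name in names:
        path = base / name
        path.mkdir(parents=True, exist_ok=True)  # no error if already present
        paths.append(path)
    return paths
```

Because `exist_ok=True` makes the call idempotent, it is safe to run on every start, for example as `ensure_directories(Path.cwd())` near the top of the app.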

Note: The vLLM batch-processing feature requires significant computational resources, so you may need to select a more powerful hardware configuration for your Space.