---
title: Smart Document Parser
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.13.0
app_file: app.py
pinned: false
---

# Smart Document Parser

A document parsing application that automatically extracts structured information from various document formats.

## Features

- **Multiple Format Support**: PDF, DOCX, TXT, HTML, and Markdown
- **Rich Information Extraction**:
  - Document content with preserved formatting
  - Comprehensive metadata
  - Section breakdown
  - Named entity recognition
- **Smart Processing**:
  - Automatic format detection
  - Confidence scoring
  - Error handling

## How to Use

1. **Upload Document**: Click the upload button or drag and drop your document
2. **Process**: Click "Process Document"
3. **View Results**: Explore the extracted information in the result tabs:
   - Content: Main document text
   - Metadata: Document properties
   - Sections: Document structure
   - Entities: Named entities
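
This README does not describe how the Sections tab derives the document structure. As a rough illustration for Markdown input only, the sketch below splits text at ATX headings (`#` to `######`). The names `HEADING` and `split_sections` are hypothetical, and the sketch does not skip `#` lines inside fenced code blocks.

```python
import re

# ATX-style Markdown heading: one to six '#' characters, whitespace, then the title.
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def split_sections(markdown_text: str) -> list[dict]:
    """Split Markdown text into sections keyed by their headings (illustrative only)."""
    sections = []
    # Text before the first heading is collected under an untitled level-0 section.
    current = {"level": 0, "title": "", "body": []}

    def flush(section: dict) -> None:
        text = "\n".join(section["body"]).strip()
        # Skip the untitled preamble if it is empty.
        if section["title"] or text:
            sections.append(
                {"level": section["level"], "title": section["title"], "text": text}
            )

    for line in markdown_text.splitlines():
        match = HEADING.match(line)
        if match:
            flush(current)
            current = {"level": len(match.group(1)), "title": match.group(2), "body": []}
        else:
            current["body"].append(line)
    flush(current)
    return sections
```

Each returned dictionary carries the heading level, the heading title, and the text that follows it up to the next heading.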

## Supported Formats

- PDF Documents (*.pdf)
- Word Documents (*.docx)
- Text Files (*.txt)
- HTML Files (*.html)
- Markdown Files (*.md)
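
The app's actual detection logic is not shown in this README. A minimal sketch of extension-based detection, assuming the extensions listed above and hypothetical names (`SUPPORTED_FORMATS`, `detect_format`) and labels, might look like this:

```python
from pathlib import Path

# Hypothetical mapping from file extension to a format label.
# The extensions come from the list above; the labels are illustrative.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return the format label for a filename, or raise ValueError if unsupported."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match on the extension
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

Checking the extension is cheap but trusts the filename; a stricter check could also inspect the file contents.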

## Technical Details

Built with:

- Docling: Advanced document processing
- Gradio: Interactive web interface
- Pydantic: Type-safe data handling
- Hugging Face Spaces: Cloud deployment
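
To illustrate what type-safe handling of a parse result can look like, here is a dependency-free sketch. It uses the standard library's `dataclasses` rather than Pydantic so it runs as-is, and it is not the app's actual model. The class name `ParsedDocument`, its fields, and the 0 to 1 confidence range are all assumptions chosen to mirror the result tabs and the confidence scoring feature.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    """Hypothetical result container; field names are assumptions, not the app's schema."""

    content: str
    metadata: dict[str, str] = field(default_factory=dict)
    sections: list[str] = field(default_factory=list)
    entities: list[str] = field(default_factory=list)
    confidence: float = 1.0  # assumed to be a score between 0 and 1

    def __post_init__(self) -> None:
        # Dataclasses do not validate types on their own, so check the basics here.
        if not isinstance(self.content, str):
            raise TypeError("content must be a string")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
```

A Pydantic model would declare the same fields and perform this kind of validation automatically.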

## License

MIT License