---
title: Smart Document Parser
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.13.0
app_file: app.py
pinned: false
---
# 📄 Smart Document Parser
A document parsing application that automatically extracts structured information (content, metadata, sections, and named entities) from several document formats.
## 🚀 Features
- **Multiple Format Support**: PDF, DOCX, TXT, HTML, and Markdown
- **Rich Information Extraction**:
  - Document content with preserved formatting
  - Comprehensive metadata
  - Section breakdown
  - Named entity recognition
- **Smart Processing**:
  - Automatic format detection
  - Confidence scoring
  - Error handling
## 🎯 How to Use
1. **Upload Document**: Click the upload button or drag & drop your document
2. **Process**: Click "Process Document"
3. **View Results**: Explore the extracted information in different tabs:
   - 📝 Content: Main document text
   - 📊 Metadata: Document properties
   - 📑 Sections: Document structure
   - 🏷️ Entities: Named entities
## 📋 Supported Formats
- PDF Documents (*.pdf)
- Word Documents (*.docx)
- Text Files (*.txt)
- HTML Files (*.html)
- Markdown Files (*.md)
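
As a rough illustration of how a format check over the extensions listed above might look, here is a minimal sketch. It uses only the Python standard library; `SUPPORTED_FORMATS`, `detect_format`, and the format labels are hypothetical names chosen for this example and are not taken from `app.py`, which may detect formats differently.

```python
from pathlib import Path

# Extensions come from the supported-formats list; the labels on the
# right are illustrative assumptions, not values used by the app.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return a format label for a supported file, or raise ValueError."""
    # Compare case-insensitively so "REPORT.PDF" is treated like "report.pdf".
    suffix = Path(filename).suffix.lower()
    if suffix not in SUPPORTED_FORMATS:
        supported = ", ".join(sorted(SUPPORTED_FORMATS))
        raise ValueError(
            f"Unsupported file type {suffix!r}; expected one of: {supported}"
        )
    return SUPPORTED_FORMATS[suffix]
```

Checking the extension is only a first pass: it rejects unsupported uploads early with a readable message, but it does not verify that the file contents match the extension.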
## 🛠️ Technical Details
Built with:
- Docling: Advanced document processing
- Gradio: Interactive web interface
- Pydantic: Type-safe data handling
- Hugging Face Spaces: Cloud deployment
## 📝 License
MIT License