---
title: Smart Document Parser
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.13.0
app_file: app.py
pinned: false
---

# 📄 Smart Document Parser

A document parsing application that extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.

## 🚀 Features

- **Multiple Format Support**: PDF, DOCX, TXT, HTML, and Markdown
- **Rich Information Extraction**:
  - Document content with preserved formatting
  - Comprehensive metadata
  - Section breakdown
  - Named entity recognition
- **Smart Processing**:
  - Automatic format detection
  - Confidence scoring
  - Error handling

## 🎯 How to Use

1. **Upload Document**: Click the upload button or drag & drop your document
2. **Process**: Click "Process Document"
3. **View Results**: Explore the extracted information in different tabs:
   - 📝 Content: Main document text
   - 📊 Metadata: Document properties
   - 📑 Sections: Document structure
   - 🏷️ Entities: Named entities
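
The four result tabs suggest a result object with four parts. The sketch below is only an illustration of that shape, not the app's actual code: `ParsedDocument`, its field names, and `tab_views` are hypothetical, and it uses a standard-library dataclass so it runs without extra dependencies, whereas the app itself lists Pydantic for data handling.

```python
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    """Hypothetical container mirroring the four result tabs."""

    content: str                                             # Content tab
    metadata: dict[str, str] = field(default_factory=dict)   # Metadata tab
    sections: list[str] = field(default_factory=list)        # Sections tab
    entities: list[str] = field(default_factory=list)        # Entities tab


def tab_views(doc: ParsedDocument) -> dict[str, object]:
    """Map each tab name to the part of the result it would display."""
    return {
        "Content": doc.content,
        "Metadata": doc.metadata,
        "Sections": doc.sections,
        "Entities": doc.entities,
    }


# Example with made-up values, for illustration only.
example = ParsedDocument(
    content="Quarterly report text",
    metadata={"format": "pdf"},
    sections=["Summary", "Results"],
    entities=["Acme Corp"],
)
views = tab_views(example)
```

Keeping one field per tab makes it easy to see which part of the output each tab would draw from.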

## 📋 Supported Formats

- PDF Documents (*.pdf)
- Word Documents (*.docx)
- Text Files (*.txt)
- HTML Files (*.html)
- Markdown Files (*.md)
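
As a rough illustration of how the extensions above could drive format detection, here is a minimal sketch that looks only at the file extension. The names `SUPPORTED_FORMATS` and `detect_format` are hypothetical, and the app's real detection logic may work differently (for example, by inspecting file contents).

```python
from pathlib import Path

# Hypothetical mapping from the extensions listed above to format labels.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "txt",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return a format label for a supported file, else raise ValueError."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported format: {suffix or 'no extension'}")
    return SUPPORTED_FORMATS[suffix]
```

For instance, `detect_format("Report.PDF")` returns `"pdf"`, while a file such as `data.csv` raises `ValueError`.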

## 🛠️ Technical Details

Built with:
- Docling: Advanced document processing
- Gradio: Interactive web interface
- Pydantic: Type-safe data handling
- Hugging Face Spaces: Cloud deployment

## 📝 License

MIT License