	---
	title: Smart Document Parser
	emoji: 💻
	colorFrom: blue
	colorTo: indigo
	sdk: gradio
	sdk_version: 5.13.0
	app_file: app.py
	pinned: false
	---

	# 📄 Smart Document Parser

	A document parsing application that extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.

	## 🚀 Features

	- Multiple Format Support: PDF, DOCX, TXT, HTML, and Markdown
	- Rich Information Extraction:
	  - Document content with preserved formatting
	  - Comprehensive metadata
	  - Section breakdown
	  - Named entity recognition
	- Smart Processing:
	  - Automatic format detection
	  - Confidence scoring
	  - Error handling

	## 🎯 How to Use

	1. Upload Document: Click the upload button or drag & drop your document
	2. Process: Click "Process Document"
	3. View Results: Explore the extracted information in the result tabs:
	   - 📝 Content: Main document text
	   - 📊 Metadata: Document properties
	   - 📑 Sections: Document structure
	   - 🏷️ Entities: Named entities
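
To make the four result tabs concrete, here is a minimal sketch of how a parsed result could be organized. `ParsedDocument` and `split_sections` are hypothetical names for illustration, not taken from this Space's `app.py`, and the heading-based split is only one plausible way to produce a section breakdown.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ParsedDocument:
    """Hypothetical container with one field per result tab."""
    content: str = ""
    metadata: dict = field(default_factory=dict)
    sections: list = field(default_factory=list)
    entities: list = field(default_factory=list)


def split_sections(markdown_text: str) -> list:
    """Split Markdown text into {"title", "body"} dicts at ATX headings."""
    sections = []
    current = {"title": "", "body": []}
    for line in markdown_text.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # A new heading closes the section collected so far.
            if current["title"] or current["body"]:
                sections.append(current)
            current = {"title": match.group(1).strip(), "body": []}
        else:
            current["body"].append(line)
    if current["title"] or current["body"]:
        sections.append(current)
    return [
        {"title": s["title"], "body": "\n".join(s["body"]).strip()}
        for s in sections
    ]


text = "# Intro\nHello.\n\n## Details\nMore text."
doc = ParsedDocument(content=text, sections=split_sections(text))
```

Here `doc.sections` holds one entry per heading, which is roughly what a Sections tab would display.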

	## 📋 Supported Formats

	- PDF Documents (*.pdf)
	- Word Documents (*.docx)
	- Text Files (*.txt)
	- HTML Files (*.html)
	- Markdown Files (*.md)
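
As an illustration of format detection over the list above, the following sketch maps file extensions to format names. It assumes detection is based on the extension alone; the app may inspect file contents instead. `SUPPORTED_FORMATS` and `detect_format` are hypothetical names.

```python
from pathlib import Path

# Extension-to-format table built from the supported formats listed above.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "txt",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return the format name for a filename, or raise ValueError."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix not in SUPPORTED_FORMATS:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}")
    return SUPPORTED_FORMATS[suffix]
```

For example, `detect_format("Report.PDF")` returns `"pdf"`, while a name such as `image.png` raises `ValueError`, which the caller can turn into a user-facing error message.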

	## 🛠️ Technical Details

	Built with:
	- Docling: Advanced document processing
	- Gradio: Interactive web interface
	- Pydantic: Type-safe data handling
	- Hugging Face Spaces: Cloud deployment
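
For orientation, a minimal sketch of the Docling conversion step might look like the following. It uses Docling's `DocumentConverter` and `export_to_markdown()`; the wrapper `convert_to_markdown` is a hypothetical helper, and how this Space actually calls Docling is not shown in this README.

```python
def convert_to_markdown(source: str) -> str:
    """Convert a document (local path or URL) to Markdown with Docling."""
    # Imported lazily so this module loads even where Docling is not installed.
    from docling.document_converter import DocumentConverter

    converter = DocumentConverter()
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

Calling `convert_to_markdown("report.pdf")` requires the `docling` package and may download models on first use.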

	## 📝 License

	MIT License