metadata
title: Smart Document Parser
emoji: 💻
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.13.0
app_file: app.py
pinned: false

📄 Smart Document Parser

A document parsing application that extracts structured information (content, metadata, sections, and named entities) from PDF, DOCX, TXT, HTML, and Markdown files.

🚀 Features

  • Multiple Format Support: PDF, DOCX, TXT, HTML, and Markdown
  • Rich Information Extraction:
    • Document content with preserved formatting
    • Comprehensive metadata
    • Section breakdown
    • Named entity recognition
  • Smart Processing:
    • Automatic format detection
    • Confidence scoring
    • Error handling

🎯 How to Use

  1. Upload Document: Click the upload button or drag & drop your document
  2. Process: Click "Process Document"
  3. View Results: Explore the extracted information in different tabs:
    • 📝 Content: Main document text
    • 📊 Metadata: Document properties
    • 📑 Sections: Document structure
    • 🏷️ Entities: Named entities
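
The four tabs suggest a result object with one field per tab. The following is a minimal sketch of such a structure, not the app's actual data model: every class and field name here (`ParsedDocument`, `Section`, `Entity`, `tab_payloads`) is hypothetical, and standard-library dataclasses stand in for the Pydantic models the app is built with.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class Section:
    """One structural unit of the document (hypothetical shape)."""
    title: str
    text: str


@dataclass
class Entity:
    """One named entity with its label, e.g. ORG or PERSON (hypothetical shape)."""
    text: str
    label: str


@dataclass
class ParsedDocument:
    """Illustrative container holding one field per result tab."""
    content: str
    metadata: dict[str, str] = field(default_factory=dict)
    sections: list[Section] = field(default_factory=list)
    entities: list[Entity] = field(default_factory=list)


def tab_payloads(doc: ParsedDocument) -> dict[str, object]:
    """Map each tab name to plain, JSON-friendly data for display."""
    return {
        "Content": doc.content,
        "Metadata": dict(doc.metadata),
        "Sections": [asdict(s) for s in doc.sections],
        "Entities": [asdict(e) for e in doc.entities],
    }
```

Keeping the result in one typed object means each tab only reads the field it needs, and a missing field shows up as an empty list or dict rather than an error.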

📋 Supported Formats

  • PDF Documents (*.pdf)
  • Word Documents (*.docx)
  • Text Files (*.txt)
  • HTML Files (*.html)
  • Markdown Files (*.md)
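
As a rough illustration of how an upload could be checked against the list above, here is a minimal sketch of extension-based format detection. It assumes detection relies on the file extension alone, which may not match what the app does internally; the names `SUPPORTED_FORMATS` and `detect_format` and the format labels are hypothetical.

```python
from pathlib import Path

# Hypothetical mapping from file extension to a format label.
# The extensions come from the list above; the labels are illustrative.
SUPPORTED_FORMATS = {
    ".pdf": "pdf",
    ".docx": "docx",
    ".txt": "text",
    ".html": "html",
    ".md": "markdown",
}


def detect_format(filename: str) -> str:
    """Return the format label for a supported file, or raise ValueError."""
    # Compare case-insensitively so "REPORT.PDF" is treated like "report.pdf".
    suffix = Path(filename).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix or '(none)'}") from None
```

For example, `detect_format("report.PDF")` returns `"pdf"`, while `detect_format("image.png")` raises `ValueError`, which an app could surface as an error message instead of attempting to parse the file.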

🛠️ Technical Details

Built with:

  • Docling: Advanced document processing
  • Gradio: Interactive web interface
  • Pydantic: Type-safe data handling
  • Hugging Face Spaces: Cloud deployment

📝 License

MIT License