YOLO Document Layout Model

This model is a YOLOv8 detector fine-tuned for document layout analysis. It identifies document elements such as text columns, figures, and tables, as well as typographical features such as bold and italic text.

Interactive Demo

Try the model directly in your browser:

Hugging Face Spaces

Model Description

The model is trained to detect and classify 20 document components, including text structures (TextColumn, List), semantic elements (Title, Header), typographical features (Bold, Italic), and visual components (Figure, Table). The evaluation results below report metrics for 19 classes.
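
A minimal sketch of running the detector with the Ultralytics Python package, assuming the fine-tuned weights are already available as a local file. The weight and image paths and the helper names `detect_layout` and `count_by_label` are hypothetical; this card does not state the weight file's name.

```python
from collections import Counter


def detect_layout(image_path, weights_path, conf=0.25):
    """Run the detector on one image and return detections as plain dicts."""
    # Imported lazily so count_by_label works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)  # hypothetical path to the fine-tuned .pt file
    result = model.predict(image_path, conf=conf)[0]  # one Results per image

    detections = []
    for box in result.boxes:
        class_id = int(box.cls[0])
        detections.append({
            "label": result.names[class_id],      # e.g. "Table", "TextColumn"
            "confidence": float(box.conf[0]),
            "xyxy": box.xyxy[0].tolist(),         # [x1, y1, x2, y2] in pixels
        })
    return detections


def count_by_label(detections):
    """Count how many detections were returned for each class label."""
    return Counter(d["label"] for d in detections)


# Example (requires `pip install ultralytics` and real files):
# detections = detect_layout("page.png", "weights.pt")
# print(count_by_label(detections))
```

Returning plain dictionaries keeps downstream code independent of the Ultralytics result objects.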

Model Detections

[Two example images showing model detections]

Training

The model was fine-tuned using a proprietary dataset of document images.

Evaluation Results

The model was evaluated on a test set of 150 images containing 1,255 annotated instances, with the following metrics:

Class        Images  Instances  Precision  Recall  mAP50  mAP50-95
all             150       1255      0.701   0.723  0.735     0.509
Author            7         65      0.693   0.174  0.307     0.134
Bigletter        11         11      1.000   0.900  0.976     0.563
Bleeding          9         10      0.618   0.700  0.667     0.547
Bold             23         77      0.679   0.753  0.798     0.395
Caption          50         71      0.892   0.816  0.881     0.642
Date             17         57      0.927   0.666  0.728     0.386
Figure           90        149      0.772   0.725  0.823     0.677
Footnote         14         15      0.500   0.667  0.612     0.478
Header           16         16      0.560   0.717  0.664     0.476
Italic           17         86      0.448   0.791  0.557     0.327
List             34         55      0.615   0.709  0.742     0.591
Map               4          4      0.606   0.750  0.656     0.599
SubSubTitle      37         97      0.627   0.520  0.599     0.300
SubTitle         54         96      0.605   0.562  0.605     0.327
Table            30         43      0.865   0.953  0.966     0.855
TextColumn      115        323      0.831   0.913  0.933     0.811
Title            47         66      0.712   0.711  0.649     0.441
Underline         2          4      0.681   1.000  0.995     0.665
equations         4         10      0.688   0.700  0.809     0.450
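
The "all" row appears to be an unweighted (macro) average over the listed classes, so a 4-instance class counts as much as a 323-instance one. The sketch below recomputes it from the per-class rows as a sanity check; the names `PER_CLASS` and `macro_average` are illustrative, and agreement is only expected up to the rounding of the published figures.

```python
# Per-class rows copied from the evaluation table:
# class -> (instances, precision, recall, mAP50)
PER_CLASS = {
    "Author": (65, 0.693, 0.174, 0.307),
    "Bigletter": (11, 1.000, 0.900, 0.976),
    "Bleeding": (10, 0.618, 0.700, 0.667),
    "Bold": (77, 0.679, 0.753, 0.798),
    "Caption": (71, 0.892, 0.816, 0.881),
    "Date": (57, 0.927, 0.666, 0.728),
    "Figure": (149, 0.772, 0.725, 0.823),
    "Footnote": (15, 0.500, 0.667, 0.612),
    "Header": (16, 0.560, 0.717, 0.664),
    "Italic": (86, 0.448, 0.791, 0.557),
    "List": (55, 0.615, 0.709, 0.742),
    "Map": (4, 0.606, 0.750, 0.656),
    "SubSubTitle": (97, 0.627, 0.520, 0.599),
    "SubTitle": (96, 0.605, 0.562, 0.605),
    "Table": (43, 0.865, 0.953, 0.966),
    "TextColumn": (323, 0.831, 0.913, 0.933),
    "Title": (66, 0.712, 0.711, 0.649),
    "Underline": (4, 0.681, 1.000, 0.995),
    "equations": (10, 0.688, 0.700, 0.809),
}


def macro_average(metrics, column):
    """Unweighted mean of one column across all classes."""
    return sum(row[column] for row in metrics.values()) / len(metrics)


total_instances = sum(row[0] for row in PER_CLASS.values())
mean_precision = macro_average(PER_CLASS, 1)
mean_recall = macro_average(PER_CLASS, 2)
mean_map50 = macro_average(PER_CLASS, 3)

print(f"instances={total_instances}")
print(f"precision={mean_precision:.3f} recall={mean_recall:.3f} mAP50={mean_map50:.3f}")
```

Because the average is unweighted, the overall figures are sensitive to the classes with very few test instances.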

Key Performance Highlights:

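  • Best performing well-sampled classes: Table (mAP50: 0.966), TextColumn (mAP50: 0.933), and Caption (mAP50: 0.881). Underline (0.995) and Bigletter (0.976) score higher, but on only 4 and 11 test instances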
  • High precision classes: Bigletter (1.000), Date (0.927), and Caption (0.892)
  • High recall classes: Underline (1.000), Table (0.953), and TextColumn (0.913)
  • Overall performance: mAP50 of 0.735 and mAP50-95 of 0.509 across all classes
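
Rankings like these depend on whether sparsely sampled classes are included. The sketch below ranks classes by mAP50 with an optional minimum instance count; the helper `top_classes` and the cutoff of 40 instances are illustrative choices, not part of the original evaluation.

```python
# class -> (test instances, mAP50), copied from the evaluation table
MAP50_BY_CLASS = {
    "Author": (65, 0.307), "Bigletter": (11, 0.976), "Bleeding": (10, 0.667),
    "Bold": (77, 0.798), "Caption": (71, 0.881), "Date": (57, 0.728),
    "Figure": (149, 0.823), "Footnote": (15, 0.612), "Header": (16, 0.664),
    "Italic": (86, 0.557), "List": (55, 0.742), "Map": (4, 0.656),
    "SubSubTitle": (97, 0.599), "SubTitle": (96, 0.605), "Table": (43, 0.966),
    "TextColumn": (323, 0.933), "Title": (66, 0.649), "Underline": (4, 0.995),
    "equations": (10, 0.809),
}


def top_classes(metrics, k=3, min_instances=0):
    """Return the k class names with the highest mAP50,
    ignoring classes with fewer than min_instances test instances."""
    eligible = [
        (name, map50)
        for name, (instances, map50) in metrics.items()
        if instances >= min_instances
    ]
    eligible.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in eligible[:k]]


print(top_classes(MAP50_BY_CLASS))                    # every class
print(top_classes(MAP50_BY_CLASS, min_instances=40))  # well-sampled classes only
```

With no cutoff the ranking is led by classes that have very few test instances, which is why a support threshold is worth stating alongside any "best class" claim.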

Limitations

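  • Lower performance on Author detection (mAP50: 0.307), driven mainly by low recall (0.174)
  • Moderate performance on typographical features like Italic (mAP50: 0.557, precision: 0.448)
  • Limited test sample size for some classes (Map: 4 instances, Underline: 4, equations: 10), so their metrics should be read with caution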
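
One common way to work around a low-precision class such as Italic is to require a higher confidence score for that class than for the rest, trading some recall for precision. The sketch below assumes detections are plain dictionaries with `label` and `confidence` keys; the threshold values are illustrative and have not been tuned on this model.

```python
DEFAULT_THRESHOLD = 0.25

# Hypothetical per-class overrides; stricter for classes with low precision.
CLASS_THRESHOLDS = {
    "Italic": 0.50,
    "Footnote": 0.40,
}


def filter_detections(detections, class_thresholds=None, default=DEFAULT_THRESHOLD):
    """Keep detections whose confidence meets the threshold for their class."""
    if class_thresholds is None:
        class_thresholds = CLASS_THRESHOLDS
    return [
        d for d in detections
        if d["confidence"] >= class_thresholds.get(d["label"], default)
    ]


example = [
    {"label": "Italic", "confidence": 0.35},  # dropped: below the Italic override
    {"label": "Italic", "confidence": 0.80},  # kept
    {"label": "Table", "confidence": 0.35},   # kept: above the default threshold
]
kept = filter_detections(example)
print([(d["label"], d["confidence"]) for d in kept])
```

Suitable thresholds would need to be chosen on a validation set rather than taken from this example.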

Base Model

This model (ashen007/document-structure-detection) is fine-tuned from Ultralytics/YOLOv8.