YOLO Document Layout Model
This model is a fine-tuned YOLOv8 detector for document layout analysis. It identifies document elements such as text columns, figures, and tables, as well as typographical features such as bold and italic text.
Model Description
The model detects and classifies 20 document components, including text structures (TextColumn, List), semantic elements (Title, Header), typographical features (Bold, Italic), and visual components (Figure, Table). The evaluation table in this card reports results for 19 classes.
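A minimal inference sketch, assuming the `ultralytics` package is installed and the fine-tuned weights have been downloaded locally. The helper names `detect_layout` and `group_by_class`, and the default confidence threshold of 0.25, are illustrative assumptions and do not come from the model repository.

```python
from collections import defaultdict


def detect_layout(weights_path, image_path, conf=0.25):
    """Run the detector on one image.

    Returns a list of (class_name, confidence, [x1, y1, x2, y2]) tuples.
    """
    # Imported lazily so group_by_class below works without the package.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    # predict() returns one Results object per input image.
    result = model.predict(image_path, conf=conf)[0]
    detections = []
    for box in result.boxes:
        class_name = result.names[int(box.cls[0])]
        detections.append((class_name, float(box.conf[0]), box.xyxy[0].tolist()))
    return detections


def group_by_class(detections):
    """Group detections by class name, highest confidence first."""
    grouped = defaultdict(list)
    for class_name, score, xyxy in detections:
        grouped[class_name].append((score, xyxy))
    return {
        name: sorted(items, key=lambda item: item[0], reverse=True)
        for name, items in grouped.items()
    }
```

With hypothetical file names, `group_by_class(detect_layout("best.pt", "page.png"))` would return a dictionary keyed by class name (for example `"Table"` or `"TextColumn"`), each entry holding that class's boxes sorted by confidence.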
Model Detections
Training
The model was fine-tuned using a proprietary dataset of document images.
Evaluation Results
The model was evaluated on a test set of 150 images containing 1,255 labeled instances. Per-class results are as follows:
Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
---|---|---|---|---|---|---|
all | 150 | 1255 | 0.701 | 0.723 | 0.735 | 0.509 |
Author | 7 | 65 | 0.693 | 0.174 | 0.307 | 0.134 |
Bigletter | 11 | 11 | 1.000 | 0.900 | 0.976 | 0.563 |
Bleeding | 9 | 10 | 0.618 | 0.700 | 0.667 | 0.547 |
Bold | 23 | 77 | 0.679 | 0.753 | 0.798 | 0.395 |
Caption | 50 | 71 | 0.892 | 0.816 | 0.881 | 0.642 |
Date | 17 | 57 | 0.927 | 0.666 | 0.728 | 0.386 |
Figure | 90 | 149 | 0.772 | 0.725 | 0.823 | 0.677 |
Footnote | 14 | 15 | 0.500 | 0.667 | 0.612 | 0.478 |
Header | 16 | 16 | 0.560 | 0.717 | 0.664 | 0.476 |
Italic | 17 | 86 | 0.448 | 0.791 | 0.557 | 0.327 |
List | 34 | 55 | 0.615 | 0.709 | 0.742 | 0.591 |
Map | 4 | 4 | 0.606 | 0.750 | 0.656 | 0.599 |
SubSubTitle | 37 | 97 | 0.627 | 0.520 | 0.599 | 0.300 |
SubTitle | 54 | 96 | 0.605 | 0.562 | 0.605 | 0.327 |
Table | 30 | 43 | 0.865 | 0.953 | 0.966 | 0.855 |
TextColumn | 115 | 323 | 0.831 | 0.913 | 0.933 | 0.811 |
Title | 47 | 66 | 0.712 | 0.711 | 0.649 | 0.441 |
Underline | 2 | 4 | 0.681 | 1.000 | 0.995 | 0.665 |
equations | 4 | 10 | 0.688 | 0.700 | 0.809 | 0.450 |
Key Performance Highlights:
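- Best performing classes among those with at least 30 test instances: Table (mAP50: 0.966), TextColumn (mAP50: 0.933), and Caption (mAP50: 0.881). Underline (0.995) and Bigletter (0.976) score higher, but on only 4 and 11 instances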
- High precision classes: Bigletter (1.000), Date (0.927), and Caption (0.892)
- High recall classes: Underline (1.000), Table (0.953), and TextColumn (0.913)
- Overall performance: mAP50 of 0.735 and mAP50-95 of 0.509 across all classes
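Which classes count as "best" depends on whether very small classes are included. The sketch below ranks classes by mAP50 using the values from the evaluation table. The `top_classes` helper and the 30-instance cutoff are illustrative assumptions, not part of the original evaluation.

```python
# (class, test instances, mAP50) copied from the evaluation table.
RESULTS = [
    ("Author", 65, 0.307), ("Bigletter", 11, 0.976), ("Bleeding", 10, 0.667),
    ("Bold", 77, 0.798), ("Caption", 71, 0.881), ("Date", 57, 0.728),
    ("Figure", 149, 0.823), ("Footnote", 15, 0.612), ("Header", 16, 0.664),
    ("Italic", 86, 0.557), ("List", 55, 0.742), ("Map", 4, 0.656),
    ("SubSubTitle", 97, 0.599), ("SubTitle", 96, 0.605), ("Table", 43, 0.966),
    ("TextColumn", 323, 0.933), ("Title", 66, 0.649), ("Underline", 4, 0.995),
    ("equations", 10, 0.809),
]


def top_classes(results, k=3, min_instances=0):
    """Return the k class names with the highest mAP50 among classes
    that have at least min_instances test instances."""
    eligible = [row for row in results if row[1] >= min_instances]
    ranked = sorted(eligible, key=lambda row: row[2], reverse=True)
    return [name for name, _, _ in ranked[:k]]


print(top_classes(RESULTS))                    # ['Underline', 'Bigletter', 'Table']
print(top_classes(RESULTS, min_instances=30))  # ['Table', 'TextColumn', 'Caption']
```

Without a cutoff, the top of the ranking is dominated by classes with a handful of test instances, whose scores can shift substantially with a single detection.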
Limitations
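- Low performance on Author detection (mAP50: 0.307), driven mainly by low recall (0.174)
- Moderate performance on Italic (mAP50: 0.557), where precision is low (0.448); other typographical classes such as Bold (mAP50: 0.798) score higher
- Limited test samples for some classes (Map: 4 instances, Underline: 4, equations: 10), so their metrics should be read with caution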
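To see whether a weak class is held back by precision or by recall, the two can be combined into an F1 score (their harmonic mean). F1 is not reported in the evaluation table; the sketch below derives it from the table's precision and recall values, and `f1_score` is a hypothetical helper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) copied from the evaluation table.
weak_classes = {
    "Author": (0.693, 0.174),
    "Italic": (0.448, 0.791),
}

for name, (precision, recall) in weak_classes.items():
    print(f"{name}: F1 = {f1_score(precision, recall):.3f}")
# Author: F1 = 0.278
# Italic: F1 = 0.572
```

Author is limited by recall (most instances are missed), while Italic is limited by precision (many false positives), so the two classes would likely need different remedies.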
Model tree for ashen007/document-structure-detection
Base model: Ultralytics/YOLOv8