metadata
license: apache-2.0
language:
- en
metrics:
- precision
- recall
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
YOLO Document Layout Model
This model is a YOLOv8 detector fine-tuned for document layout analysis. It identifies document elements such as text columns, figures, tables, and typographical features.
Model Description
The model detects and classifies 20 document component classes, including text structures (TextColumn, List), semantic elements (Title, Header), typographical features (Bold, Italic), and visual components (Figure, Table). The evaluation table reports results for 19 of these classes.
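A minimal inference sketch, assuming the weights are available locally as a standard Ultralytics checkpoint. The file names `layout.pt` and `page.png` and the helpers `group_by_class` and `detect_layout` are hypothetical; `YOLO`, `predict`, `boxes`, and `names` come from the Ultralytics package, which is imported lazily so the grouping helper can be used without it.

```python
from collections import defaultdict


def group_by_class(detections):
    """Group (class_name, confidence, box) tuples by class name."""
    grouped = defaultdict(list)
    for name, conf, box in detections:
        grouped[name].append((conf, box))
    # Within each class, list the most confident detections first.
    return {
        name: sorted(items, key=lambda item: item[0], reverse=True)
        for name, items in grouped.items()
    }


def detect_layout(weights_path, image_path, conf=0.25):
    """Run the detector on one page image and return detections by class."""
    from ultralytics import YOLO  # requires `pip install ultralytics`

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf)[0]  # one image, one result
    detections = [
        (result.names[int(cls)], float(score), xyxy)
        for cls, score, xyxy in zip(
            result.boxes.cls.tolist(),
            result.boxes.conf.tolist(),
            result.boxes.xyxy.tolist(),
        )
    ]
    return group_by_class(detections)


# Example call (hypothetical paths, replace with the actual files):
# for name, items in detect_layout("layout.pt", "page.png").items():
#     print(name, len(items))
```

Boxes are returned as `[x1, y1, x2, y2]` pixel coordinates. The `conf=0.25` default here is only a starting point and may need tuning per class, given how much precision varies across classes.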
Model Detections
Training
The model was fine-tuned using a proprietary dataset of document images.
Evaluation Results
The model was evaluated on a test set of 150 images containing 1,255 labeled instances:
| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|---|---|---|---|---|---|---|
| all | 150 | 1255 | 0.701 | 0.723 | 0.735 | 0.509 |
| Author | 7 | 65 | 0.693 | 0.174 | 0.307 | 0.134 |
| Bigletter | 11 | 11 | 1.000 | 0.900 | 0.976 | 0.563 |
| Bleeding | 9 | 10 | 0.618 | 0.700 | 0.667 | 0.547 |
| Bold | 23 | 77 | 0.679 | 0.753 | 0.798 | 0.395 |
| Caption | 50 | 71 | 0.892 | 0.816 | 0.881 | 0.642 |
| Date | 17 | 57 | 0.927 | 0.666 | 0.728 | 0.386 |
| Figure | 90 | 149 | 0.772 | 0.725 | 0.823 | 0.677 |
| Footnote | 14 | 15 | 0.500 | 0.667 | 0.612 | 0.478 |
| Header | 16 | 16 | 0.560 | 0.717 | 0.664 | 0.476 |
| Italic | 17 | 86 | 0.448 | 0.791 | 0.557 | 0.327 |
| List | 34 | 55 | 0.615 | 0.709 | 0.742 | 0.591 |
| Map | 4 | 4 | 0.606 | 0.750 | 0.656 | 0.599 |
| SubSubTitle | 37 | 97 | 0.627 | 0.520 | 0.599 | 0.300 |
| SubTitle | 54 | 96 | 0.605 | 0.562 | 0.605 | 0.327 |
| Table | 30 | 43 | 0.865 | 0.953 | 0.966 | 0.855 |
| TextColumn | 115 | 323 | 0.831 | 0.913 | 0.933 | 0.811 |
| Title | 47 | 66 | 0.712 | 0.711 | 0.649 | 0.441 |
| Underline | 2 | 4 | 0.681 | 1.000 | 0.995 | 0.665 |
| equations | 4 | 10 | 0.688 | 0.700 | 0.809 | 0.450 |
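The two mAP columns depend on box overlap. mAP50 counts a prediction as correct when its intersection over union (IoU) with a ground-truth box of the same class is at least 0.5, and mAP50-95 averages the result over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU step, assuming boxes in `(x1, y1, x2, y2)` format; the function name and the example boxes are illustrative and not taken from the evaluation code.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the overlapping rectangle, if any.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# Illustrative boxes: a prediction shifted 10 px right of the ground truth.
pred = (10, 10, 110, 60)
truth = (20, 10, 120, 60)
iou = box_iou(pred, truth)
print(round(iou, 3))  # 0.818

# IoU thresholds used by mAP50-95: 0.50, 0.55, ..., 0.95.
thresholds = [t / 100 for t in range(50, 100, 5)]
print(sum(iou >= t for t in thresholds))  # 7 (matches up to the 0.80 threshold)
```

This is why mAP50-95 is always the lower of the two numbers: a box that is roughly right passes at 0.5 but fails the stricter thresholds.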
Key Performance Highlights:
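- Best-performing classes among those with at least 40 test instances: Table (mAP50: 0.966), TextColumn (mAP50: 0.933), and Caption (mAP50: 0.881). Underline (0.995) and Bigletter (0.976) score higher, but on only 4 and 11 instances respectively.
- Highest-precision classes: Bigletter (1.000), Date (0.927), and Caption (0.892)
- Highest-recall classes: Underline (1.000, on 4 instances), Table (0.953), and TextColumn (0.913)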
- Overall performance: mAP50 of 0.735 and mAP50-95 of 0.509 across all classes
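The rankings above can be reproduced from the evaluation table. The sketch below copies the class, instance count, and mAP50 columns from that table and ranks classes by mAP50, optionally skipping classes with few test instances. The 40-instance cutoff and the function name `top_classes` are illustrative assumptions, not part of the original evaluation.

```python
# (class, test instances, mAP50), copied from the evaluation table.
ROWS = [
    ("Author", 65, 0.307),
    ("Bigletter", 11, 0.976),
    ("Bleeding", 10, 0.667),
    ("Bold", 77, 0.798),
    ("Caption", 71, 0.881),
    ("Date", 57, 0.728),
    ("Figure", 149, 0.823),
    ("Footnote", 15, 0.612),
    ("Header", 16, 0.664),
    ("Italic", 86, 0.557),
    ("List", 55, 0.742),
    ("Map", 4, 0.656),
    ("SubSubTitle", 97, 0.599),
    ("SubTitle", 96, 0.605),
    ("Table", 43, 0.966),
    ("TextColumn", 323, 0.933),
    ("Title", 66, 0.649),
    ("Underline", 4, 0.995),
    ("equations", 10, 0.809),
]


def top_classes(rows, k=3, min_instances=0):
    """Return the k class names with the highest mAP50."""
    eligible = [row for row in rows if row[1] >= min_instances]
    ranked = sorted(eligible, key=lambda row: row[2], reverse=True)
    return [name for name, _, _ in ranked[:k]]


print(top_classes(ROWS))                    # ['Underline', 'Bigletter', 'Table']
print(top_classes(ROWS, min_instances=40))  # ['Table', 'TextColumn', 'Caption']
```

Without a cutoff, the two classes at the top are ones with very few test instances, so the ranking with a cutoff is the more dependable one.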
Limitations
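- Low performance on Author detection (mAP50: 0.307), driven mainly by low recall (0.174)
- Moderate performance on some typographical features, notably Italic (mAP50: 0.557, precision: 0.448)
- Small test samples for some classes (Map: 4 instances, Underline: 4, equations: 10), so their metrics should be read with caution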
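To illustrate how little a handful of test instances can establish, the sketch below computes a 95% Wilson score interval for a proportion. It treats the Underline recall of 1.000 on 4 instances as 4 detected out of 4, which is an assumption made for illustration, since the reported recall is measured at a single confidence threshold. The interval is not part of the original evaluation.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, center - half), min(1.0, center + half)


# Assumed reading of the Underline row: 4 of 4 instances detected.
low, high = wilson_interval(4, 4)
print(f"{low:.2f} to {high:.2f}")  # 0.51 to 1.00
```

Under that assumption, a perfect score on 4 instances is still consistent with a true recall of little more than one half.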