metadata

license: apache-2.0
language:
  - en
metrics:
  - precision
  - recall
base_model:
  - Ultralytics/YOLOv8
pipeline_tag: object-detection

YOLO Document Layout Model

This model is a YOLOv8 detector fine-tuned for document layout analysis. It identifies document elements such as text columns, figures, and tables, as well as typographical features such as bold and italic text.

Interactive Demo

Try the model directly in your browser:

Model Description

The model is trained to detect and classify 20 different document components, including text structures (TextColumn, List), semantic elements (Title, Header), typographical features (Bold, Italic), and visual components (Figure, Table).
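As an illustration of how the detector might be used, the following is a minimal sketch that assumes the `ultralytics` package is installed and that the fine-tuned weights are available as a local file. The function names `count_by_class` and `detect_layout`, the weights path, and the default confidence threshold are placeholders introduced here, not part of the model card.

```python
from collections import Counter


def count_by_class(names, class_ids):
    """Count detections per class name, given an id-to-name mapping."""
    return Counter(names[int(i)] for i in class_ids)


def detect_layout(weights_path, image_path, conf=0.25):
    """Run the detector on one page image and count detections per class.

    weights_path and image_path are placeholders supplied by the caller;
    conf is an assumed confidence threshold, not a value from the model card.
    """
    # Imported lazily so count_by_class works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf)[0]
    # result.names maps class ids to labels such as "Table" or "TextColumn".
    return count_by_class(result.names, result.boxes.cls.tolist())
```

Calling `detect_layout("weights.pt", "page.png")` with real paths would return a per-class count of detected components for that page.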

Model Detections

Training

The model was fine-tuned using a proprietary dataset of document images.

Evaluation Results

The model was evaluated on a test set of 150 images containing 1,255 labeled instances, with the following per-class metrics:

Class	Images	Instances	Precision	Recall	mAP50	mAP50-95
all	150	1255	0.701	0.723	0.735	0.509
Author	7	65	0.693	0.174	0.307	0.134
Bigletter	11	11	1.000	0.900	0.976	0.563
Bleeding	9	10	0.618	0.700	0.667	0.547
Bold	23	77	0.679	0.753	0.798	0.395
Caption	50	71	0.892	0.816	0.881	0.642
Date	17	57	0.927	0.666	0.728	0.386
Figure	90	149	0.772	0.725	0.823	0.677
Footnote	14	15	0.500	0.667	0.612	0.478
Header	16	16	0.560	0.717	0.664	0.476
Italic	17	86	0.448	0.791	0.557	0.327
List	34	55	0.615	0.709	0.742	0.591
Map	4	4	0.606	0.750	0.656	0.599
SubSubTitle	37	97	0.627	0.520	0.599	0.300
SubTitle	54	96	0.605	0.562	0.605	0.327
Table	30	43	0.865	0.953	0.966	0.855
TextColumn	115	323	0.831	0.913	0.933	0.811
Title	47	66	0.712	0.711	0.649	0.441
Underline	2	4	0.681	1.000	0.995	0.665
equations	4	10	0.688	0.700	0.809	0.450
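The precision and recall columns above can be combined into a single F1 score per class, which makes imbalanced classes such as Author (high precision, low recall) easier to spot. This is a small sketch using a few rows copied from the table; F1 is not reported in the model card, and the names `ROWS` and `f1_score` are introduced here for illustration.

```python
# (class, precision, recall) copied from the evaluation table above.
ROWS = [
    ("all", 0.701, 0.723),
    ("Author", 0.693, 0.174),
    ("Italic", 0.448, 0.791),
    ("Table", 0.865, 0.953),
]


def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# F1 per class, rounded to three decimals like the table values.
f1_by_class = {name: round(f1_score(p, r), 3) for name, p, r in ROWS}

for name, f1 in sorted(f1_by_class.items(), key=lambda item: item[1]):
    print(f"{name:<8} F1={f1:.3f}")
```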

Key Performance Highlights:

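Best performing classes among those with at least 40 test instances: Table (mAP50: 0.966), TextColumn (mAP50: 0.933), and Caption (mAP50: 0.881); Underline (0.995) and Bigletter (0.976) score higher, but on only 4 and 11 instances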
High precision classes: Bigletter (1.000), Date (0.927), and Caption (0.892)
High recall classes: Underline (1.000), Table (0.953), and TextColumn (0.913)
Overall performance: mAP50 of 0.735 and mAP50-95 of 0.509 across all classes
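For readers unfamiliar with the two mAP figures: mAP50 counts a predicted box as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5, while mAP50-95 averages over IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below shows only the IoU test for a single pair of boxes, not the full mAP computation; the helper names are introduced here for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# IoU thresholds averaged by mAP50-95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred_box, true_box):
    """Return the IoU thresholds at which pred_box counts as a match."""
    overlap = iou(pred_box, true_box)
    return [t for t in IOU_THRESHOLDS if overlap >= t]
```

A loosely fitting box can therefore count toward mAP50 while failing most of the stricter thresholds, which is one reason mAP50-95 is lower than mAP50 for every class in the table.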

Limitations

Low performance on Author detection (mAP50: 0.307), driven mainly by low recall (0.174)
Moderate performance on Italic (mAP50: 0.557), where precision is low (0.448)
Limited test samples for some classes (Map: 4, Underline: 4, equations: 10 instances), so their metrics are less reliable
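One way to keep the sample-size caveat in view when reading the metrics is to flag classes whose test support falls below a chosen cutoff. The sketch below uses rows copied from the evaluation table; the cutoff of 20 instances is an arbitrary assumption made for illustration, not a value from the model card.

```python
# (class, test instances, mAP50) copied from the evaluation table.
SUPPORT = [
    ("Map", 4, 0.656),
    ("Underline", 4, 0.995),
    ("equations", 10, 0.809),
    ("Bleeding", 10, 0.667),
    ("Bigletter", 11, 0.976),
    ("Table", 43, 0.966),
    ("TextColumn", 323, 0.933),
]

MIN_INSTANCES = 20  # hypothetical cutoff, chosen only for this example


def low_support(rows, min_instances=MIN_INSTANCES):
    """Return (class, instances) pairs with fewer than min_instances samples."""
    return [(name, n) for name, n, _ in rows if n < min_instances]


for name, n in low_support(SUPPORT):
    print(f"{name}: only {n} test instances, treat its metrics with caution")
```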