---
license: apache-2.0
language:
- en
metrics:
- precision
- recall
base_model:
- Ultralytics/YOLOv8
pipeline_tag: object-detection
---

# YOLO Document Layout Model

This model is a YOLO detector, fine-tuned from Ultralytics YOLOv8, for document layout analysis. It identifies document elements such as text columns, figures, tables, and typographical features.

## Interactive Demo

Try the model directly in your browser:

[![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://ashen007-yolo-document-layout-demo.hf.space/?__theme=system)

## Model Description

The model is trained to detect and classify 20 different document components, including text structures (TextColumn, List), semantic elements (Title, Header), typographical features (Bold, Italic), and visual components (Figure, Table).
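A minimal inference sketch, assuming the weights are loaded with the `ultralytics` package (consistent with the YOLOv8 base model). The weights filename `best.pt`, the image path, and the helper names `detect_layout` and `group_by_class` are placeholders for illustration, not names taken from this repository.

```python
from collections import defaultdict


def detect_layout(image_path, weights_path="best.pt"):
    """Run the detector on one image and return (label, confidence, xyxy) tuples.

    Assumption: `weights_path` points to this model's weights; "best.pt" is a
    placeholder filename, not one confirmed by the repository.
    """
    # Imported lazily so the grouping helper below works without the package.
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model(image_path)[0]  # one Results object per input image
    boxes = result.boxes
    return [
        (result.names[int(cls_id)], conf, xyxy)
        for cls_id, conf, xyxy in zip(
            boxes.cls.tolist(), boxes.conf.tolist(), boxes.xyxy.tolist()
        )
    ]


def group_by_class(detections, min_conf=0.25):
    """Group detections by class label, dropping those below `min_conf`."""
    grouped = defaultdict(list)
    for label, conf, xyxy in detections:
        if conf >= min_conf:
            grouped[label].append((conf, xyxy))
    return dict(grouped)
```

With weights and an image available, `group_by_class(detect_layout("page.png"))` would return a dictionary keyed by labels such as `Table` or `TextColumn`. The `min_conf` default of 0.25 is an arbitrary starting point and may need tuning per class.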

## Model Detections

![image/png](https://cdn-uploads.huggingface.co/production/uploads/661a149cd7c07238c2b3ddc2/RUOv2iWaY1sJCQcvQl2Ik.png)

![image/png](https://cdn-uploads.huggingface.co/production/uploads/661a149cd7c07238c2b3ddc2/07I3lmV59UljZfo7ItlWW.png)

## Training

The model was fine-tuned using a proprietary dataset of document images.

## Evaluation Results

The model was evaluated on a test set of 150 images containing 1,255 labeled instances, with the following per-class metrics:

| Class | Images | Instances | Precision | Recall | mAP50 | mAP50-95 |
|-------|--------|-----------|-----------|--------|-------|----------|
| **all** | **150** | **1255** | **0.701** | **0.723** | **0.735** | **0.509** |
| Author | 7 | 65 | 0.693 | 0.174 | 0.307 | 0.134 |
| Bigletter | 11 | 11 | 1.000 | 0.900 | 0.976 | 0.563 |
| Bleeding | 9 | 10 | 0.618 | 0.700 | 0.667 | 0.547 |
| Bold | 23 | 77 | 0.679 | 0.753 | 0.798 | 0.395 |
| Caption | 50 | 71 | 0.892 | 0.816 | 0.881 | 0.642 |
| Date | 17 | 57 | 0.927 | 0.666 | 0.728 | 0.386 |
| Figure | 90 | 149 | 0.772 | 0.725 | 0.823 | 0.677 |
| Footnote | 14 | 15 | 0.500 | 0.667 | 0.612 | 0.478 |
| Header | 16 | 16 | 0.560 | 0.717 | 0.664 | 0.476 |
| Italic | 17 | 86 | 0.448 | 0.791 | 0.557 | 0.327 |
| List | 34 | 55 | 0.615 | 0.709 | 0.742 | 0.591 |
| Map | 4 | 4 | 0.606 | 0.750 | 0.656 | 0.599 |
| SubSubTitle | 37 | 97 | 0.627 | 0.520 | 0.599 | 0.300 |
| SubTitle | 54 | 96 | 0.605 | 0.562 | 0.605 | 0.327 |
| Table | 30 | 43 | 0.865 | 0.953 | 0.966 | 0.855 |
| TextColumn | 115 | 323 | 0.831 | 0.913 | 0.933 | 0.811 |
| Title | 47 | 66 | 0.712 | 0.711 | 0.649 | 0.441 |
| Underline | 2 | 4 | 0.681 | 1.000 | 0.995 | 0.665 |
| equations | 4 | 10 | 0.688 | 0.700 | 0.809 | 0.450 |
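The **all** row appears to be an unweighted (macro) average over the classes listed, which means rare classes count as much as common ones. The sketch below checks this against the table; the variable and function names are illustrative, and the values are copied from the rows above.

```python
# (instances, precision, mAP50) per class, copied from the evaluation table.
PER_CLASS = {
    "Author": (65, 0.693, 0.307),
    "Bigletter": (11, 1.000, 0.976),
    "Bleeding": (10, 0.618, 0.667),
    "Bold": (77, 0.679, 0.798),
    "Caption": (71, 0.892, 0.881),
    "Date": (57, 0.927, 0.728),
    "Figure": (149, 0.772, 0.823),
    "Footnote": (15, 0.500, 0.612),
    "Header": (16, 0.560, 0.664),
    "Italic": (86, 0.448, 0.557),
    "List": (55, 0.615, 0.742),
    "Map": (4, 0.606, 0.656),
    "SubSubTitle": (97, 0.627, 0.599),
    "SubTitle": (96, 0.605, 0.605),
    "Table": (43, 0.865, 0.966),
    "TextColumn": (323, 0.831, 0.933),
    "Title": (66, 0.712, 0.649),
    "Underline": (4, 0.681, 0.995),
    "equations": (10, 0.688, 0.809),
}


def macro_mean(values):
    """Unweighted mean: every class contributes equally, whatever its size."""
    values = list(values)
    return sum(values) / len(values)


total_instances = sum(n for n, _, _ in PER_CLASS.values())
mean_precision = macro_mean(p for _, p, _ in PER_CLASS.values())
mean_map50 = macro_mean(m for _, _, m in PER_CLASS.values())

print(f"classes={len(PER_CLASS)} instances={total_instances}")
print(f"precision={mean_precision:.3f} mAP50={mean_map50:.3f}")
```

The instance counts sum to the 1,255 reported in the **all** row, and the macro means round to the reported precision (0.701) and mAP50 (0.735).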

### Key Performance Highlights

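- **Best-performing classes** (among those with at least 40 test instances): Table (mAP50: 0.966), TextColumn (mAP50: 0.933), and Caption (mAP50: 0.881). Underline (0.995) and Bigletter (0.976) score higher but have only 4 and 11 test instances, respectively.
- **High-precision classes**: Bigletter (1.000), Date (0.927), and Caption (0.892)
- **High-recall classes**: Underline (1.000), Table (0.953), and TextColumn (0.913)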
- **Overall performance**: mAP50 of 0.735 and mAP50-95 of 0.509 across all classes
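Which classes rank highest by mAP50 depends on whether classes with very few test instances are included. The following sketch ranks classes with an optional minimum instance count; the `top_classes` helper and the threshold of 40 are illustrative choices, and the values are copied from the evaluation table.

```python
# (instances, mAP50) per class, copied from the evaluation table.
MAP50 = {
    "Author": (65, 0.307), "Bigletter": (11, 0.976), "Bleeding": (10, 0.667),
    "Bold": (77, 0.798), "Caption": (71, 0.881), "Date": (57, 0.728),
    "Figure": (149, 0.823), "Footnote": (15, 0.612), "Header": (16, 0.664),
    "Italic": (86, 0.557), "List": (55, 0.742), "Map": (4, 0.656),
    "SubSubTitle": (97, 0.599), "SubTitle": (96, 0.605), "Table": (43, 0.966),
    "TextColumn": (323, 0.933), "Title": (66, 0.649), "Underline": (4, 0.995),
    "equations": (10, 0.809),
}


def top_classes(stats, k=3, min_instances=0):
    """Return the k classes with the highest mAP50 among those that have
    at least `min_instances` test instances."""
    eligible = [(name, m) for name, (n, m) in stats.items() if n >= min_instances]
    return sorted(eligible, key=lambda item: item[1], reverse=True)[:k]


print(top_classes(MAP50))                    # no filter on sample size
print(top_classes(MAP50, min_instances=40))  # only well-sampled classes
```

Without a filter the top three are Underline, Bigletter, and Table; with `min_instances=40` they are Table, TextColumn, and Caption.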

## Limitations

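- Low performance on the Author class (mAP50: 0.307), driven mainly by low recall (0.174)
- Moderate performance on some typographical features, such as Italic (mAP50: 0.557, precision: 0.448)
- Small test samples for some classes (Map: 4 instances, Underline: 4, equations: 10), so their metrics are less reliable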
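To see how precision and recall combine for the weaker classes, one option is the F1 score (the harmonic mean of the two). F1 is not reported in the evaluation table; the sketch below derives it from the table's precision and recall values, and the `f1_score` helper is illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) copied from the evaluation table.
WEAK_CLASSES = {
    "Author": (0.693, 0.174),
    "Italic": (0.448, 0.791),
}

for name, (p, r) in WEAK_CLASSES.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

Author comes out at roughly 0.278 because its recall is very low, while Italic reaches roughly 0.572 because its errors are mostly false positives (low precision) rather than missed detections.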