File size: 2,736 Bytes
e6bfdf7 54e7147 e6bfdf7 2fdffa3 e6bfdf7 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 |
---
license: apache-2.0
---
THIS IS WORK IN PROGRESS
# Docling Layout Model
`docling-layout-heron` is the Layout Model of [Docling project](https://github.com/docling-project/docling).
This model uses the [RT-DETRv2](https://github.com/lyuwenyu/RT-DETR/tree/main/rtdetrv2_pytorch) architecture and has been trained from scratch on a variety of document datasets.
# Inference code example
Prerequisites:
```bash
pip install transformers Pillow torch requests
```
Prediction:
```python
import requests
from transformers import RTDetrV2ForObjectDetection, RTDetrImageProcessor
import torch
from PIL import Image
classes_map = {
0: "Caption",
1: "Footnote",
2: "Formula",
3: "List-item",
4: "Page-footer",
5: "Page-header",
6: "Picture",
7: "Section-header",
8: "Table",
9: "Text",
10: "Title",
11: "Document Index",
12: "Code",
13: "Checkbox-Selected",
14: "Checkbox-Unselected",
15: "Form",
16: "Key-Value Region",
}
image_url = "https://huggingface.co/spaces/ds4sd/SmolDocling-256M-Demo/resolve/main/example_images/annual_rep_14.png"
model_name = "ds4sd/docling-layout-heron"
threshold = 0.6
# Download the image
image = Image.open(requests.get(image_url, stream=True).raw)
image = image.convert("RGB")
# Initialize the model
image_processor = RTDetrImageProcessor.from_pretrained(model_name)
model = RTDetrV2ForObjectDetection.from_pretrained(model_name)
# Run the prediction pipeline
inputs = image_processor(images=[image], return_tensors="pt")
with torch.no_grad():
outputs = model(**inputs)
results = image_processor.post_process_object_detection(
outputs,
target_sizes=torch.tensor([image.size[::-1]]),
threshold=threshold,
)
# Get the results
for result in results:
for score, label_id, box in zip(
result["scores"], result["labels"], result["boxes"]
):
score = round(score.item(), 2)
label = classes_map[label_id.item()]
box = [round(i, 2) for i in box.tolist()]
print(f"{label}:{score} {box}")
```
# References
```
@techreport{Docling,
author = {Deep Search Team},
month = {8},
title = {Docling Technical Report},
url = {https://arxiv.org/abs/2408.09869v4},
eprint = {2408.09869},
doi = {10.48550/arXiv.2408.09869},
version = {1.0.0},
year = {2024}
}
@misc{lv2024rtdetrv2improvedbaselinebagoffreebies,
title={RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer},
author={Wenyu Lv and Yian Zhao and Qinyao Chang and Kui Huang and Guanzhong Wang and Yi Liu},
year={2024},
eprint={2407.17140},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2407.17140},
}
``` |