.. _algorithm_layout_detection:
==========================
Layout Detection Algorithm
==========================
Introduction
=================
Layout detection is a fundamental task in document content extraction, aiming to locate different types of regions on a page, such as images, tables, text, and headings, to facilitate high-quality content extraction. For text and heading regions, OCR models can be used for text recognition, while table regions can be converted using table recognition models.
Model Usage
=================
Layout detection supports the following models:
.. raw:: html
<style type="text/css">
.tg {border-collapse:collapse;border-color:#9ABAD9;border-spacing:0;}
.tg td{background-color:#EBF5FF;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#444;
font-family:Arial, sans-serif;font-size:14px;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg th{background-color:#409cff;border-color:#9ABAD9;border-style:solid;border-width:1px;color:#fff;
font-family:Arial, sans-serif;font-size:14px;font-weight:normal;overflow:hidden;padding:10px 5px;word-break:normal;}
.tg .tg-f8tz{background-color:#409cff;border-color:inherit;text-align:left;vertical-align:top}
.tg .tg-0lax{text-align:left;vertical-align:top}
.tg .tg-0pky{border-color:inherit;text-align:left;vertical-align:top}
</style>
<table class="tg"><thead>
<tr>
<th class="tg-0lax">Model</th>
<th class="tg-f8tz">Description</th>
<th class="tg-f8tz">Characteristics</th>
<th class="tg-f8tz">Model weight</th>
<th class="tg-f8tz">Config file</th>
</tr></thead>
<tbody>
<tr>
<td class="tg-0lax">DocLayout-YOLO</td>
<td class="tg-0pky">Improved based on YOLO-v10:<br>1. Generates diverse pre-training data to enhance generalization across multiple document types<br>2. Improves the model architecture for better perception of scale-varying instances<br>Details in <a href="https://github.com/opendatalab/DocLayout-YOLO" target="_blank" rel="noopener noreferrer">DocLayout-YOLO</a></td>
<td class="tg-0pky">Speed: Fast, Accuracy: High</td>
<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/blob/main/models/Layout/YOLO/doclayout_yolo_ft.pt" target="_blank" rel="noopener noreferrer">doclayout_yolo_ft.pt</a></td>
<td class="tg-0pky">layout_detection.yaml</td>
</tr>
<tr>
<td class="tg-0lax">YOLO-v10</td>
<td class="tg-0pky">Base YOLO-v10 model</td>
<td class="tg-0pky">Speed: Fast, Accuracy: Moderate</td>
<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/blob/main/models/Layout/YOLO/yolov10l_ft.pt" target="_blank" rel="noopener noreferrer">yolov10l_ft.pt</a></td>
<td class="tg-0pky">layout_detection_yolo.yaml</td>
</tr>
<tr>
<td class="tg-0lax">LayoutLMv3</td>
<td class="tg-0pky">Base LayoutLMv3 model</td>
<td class="tg-0pky">Speed: Slow, Accuracy: High</td>
<td class="tg-0pky"><a href="https://huggingface.co/opendatalab/PDF-Extract-Kit-1.0/tree/main/models/Layout/LayoutLMv3" target="_blank" rel="noopener noreferrer">layoutlmv3_ft</a></td>
<td class="tg-0pky">layout_detection_layoutlmv3.yaml</td>
</tr>
</tbody></table>
Once the environment is set up, you can perform layout detection by executing ``scripts/layout_detection.py`` directly.
**Run demo**
.. code:: shell
$ python scripts/layout_detection.py --config configs/layout_detection.yaml
Model Configuration
-------------------
**1. DocLayout-YOLO / YOLO-v10**
.. code:: yaml
inputs: assets/demo/layout_detection
outputs: outputs/layout_detection
tasks:
layout_detection:
model: layout_detection_yolo
model_config:
img_size: 1024
conf_thres: 0.25
iou_thres: 0.45
model_path: path/to/doclayout_yolo_model
visualize: True
- inputs/outputs: Define the input file path and the directory for visualization output.
- tasks: Define the task type, currently only a layout detection task is included.
- model: Specify the specific model type, e.g., layout_detection_yolo.
- model_config: Define the model configuration.
- img_size: Define the image long edge size; the short edge will be scaled proportionally based on the long edge, with the default long edge being 1024.
- conf_thres: Define the confidence threshold; only detections scoring above this value are kept.
- iou_thres: Define the IoU threshold used for non-maximum suppression; overlapping detections whose IoU exceeds this value are removed.
- model_path: Path to the model weights.
- visualize: Whether to visualize the model results; visualized results will be saved in the outputs directory.
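The effect of ``conf_thres`` and ``iou_thres`` can be illustrated with a short, self-contained sketch (plain Python; this shows the general confidence/IoU filtering idea, not the repository's actual implementation): detections below the confidence threshold are dropped first, then any detection overlapping an already-kept box by more than the IoU threshold is suppressed.

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def filter_detections(dets, conf_thres=0.25, iou_thres=0.45):
    """dets: list of (box, score). Drop low-confidence boxes, then suppress
    any box whose IoU with an already-kept box exceeds iou_thres."""
    dets = sorted((d for d in dets if d[1] >= conf_thres), key=lambda d: -d[1])
    kept = []
    for box, score in dets:
        if all(iou(box, k) <= iou_thres for k, _ in kept):
            kept.append((box, score))
    return kept

# Two near-duplicate region boxes plus one low-confidence box:
dets = [((0, 0, 100, 50), 0.9), ((2, 1, 101, 52), 0.8), ((200, 0, 260, 40), 0.1)]
print(filter_detections(dets))  # only the 0.9 box survives
```

Raising ``conf_thres`` yields fewer but more reliable regions; lowering ``iou_thres`` suppresses overlapping detections more aggressively.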
**2. LayoutLMv3**
.. note::
LayoutLMv3 cannot run directly by default. Please follow the steps below to modify the configuration:
1. **Detectron2 Environment Setup**
.. code-block:: bash
# For Linux
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-linux_x86_64.whl
# For macOS
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-macosx_10_9_universal2.whl
# For Windows
pip install https://wheels-1251341229.cos.ap-shanghai.myqcloud.com/assets/whl/detectron2/detectron2-0.6-cp310-cp310-win_amd64.whl
2. **Enable LayoutLMv3 Registration Code**
Uncomment the lines at the following links:
- `line 2 <https://github.com/opendatalab/PDF-Extract-Kit/blob/main/pdf_extract_kit/tasks/layout_detection/__init__.py#L2>`_
- `line 8 <https://github.com/opendatalab/PDF-Extract-Kit/blob/main/pdf_extract_kit/tasks/layout_detection/__init__.py#L8>`_
.. code-block:: python
from pdf_extract_kit.tasks.layout_detection.models.yolo import LayoutDetectionYOLO
from pdf_extract_kit.tasks.layout_detection.models.layoutlmv3 import LayoutDetectionLayoutlmv3
from pdf_extract_kit.registry.registry import MODEL_REGISTRY
__all__ = [
"LayoutDetectionYOLO",
"LayoutDetectionLayoutlmv3",
]
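Uncommenting these imports matters because importing each model module registers its class with ``MODEL_REGISTRY`` as a side effect. A rough, hypothetical sketch of this decorator-based registry pattern (simplified for illustration; not the repository's actual code):

```python
class Registry:
    """Minimal name-to-class registry."""
    def __init__(self):
        self._models = {}

    def register(self, name):
        def decorator(cls):
            self._models[name] = cls
            return cls
        return decorator

    def get(self, name):
        return self._models[name]

MODEL_REGISTRY = Registry()

# In a model module: registration happens only when the module is imported,
# which is why the import lines must be uncommented.
@MODEL_REGISTRY.register("layout_detection_layoutlmv3")
class LayoutDetectionLayoutlmv3:
    def __init__(self, config):
        self.config = config

# The task runner can then resolve the `model` name from the YAML config:
print(MODEL_REGISTRY.get("layout_detection_layoutlmv3").__name__)
```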
.. code:: yaml
inputs: assets/demo/layout_detection
outputs: outputs/layout_detection
tasks:
layout_detection:
model: layout_detection_layoutlmv3
model_config:
model_path: path/to/layoutlmv3_model
- inputs/outputs: Define the input file path and the directory for visualization output.
- tasks: Define the task type, currently only a layout detection task is included.
- model: Specify the specific model type, e.g., layout_detection_layoutlmv3.
- model_config: Define the model configuration.
- model_path: Path to the model weights.
Diverse Input Support
---------------------
The layout detection script in PDF-Extract-Kit supports input formats such as a ``single image``, a ``directory containing only image files``, a ``single PDF file``, and a ``directory containing only PDF files``.
.. note::
Modify the ``inputs`` path in ``configs/layout_detection.yaml`` according to your actual data format:
- Single image: path/to/image
- Image directory: path/to/images
- Single PDF file: path/to/pdf
- PDF directory: path/to/pdfs
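An input dispatcher along these lines could distinguish the four cases (a hypothetical sketch using ``pathlib``; the script's real logic may differ):

```python
from pathlib import Path

IMAGE_EXTS = {".png", ".jpg", ".jpeg"}

def classify_input(path_str):
    """Return 'image', 'images_dir', 'pdf', or 'pdfs_dir' for an input path."""
    p = Path(path_str)
    if p.is_dir():
        suffixes = {f.suffix.lower() for f in p.iterdir() if f.is_file()}
        if suffixes and suffixes <= IMAGE_EXTS:
            return "images_dir"
        if suffixes == {".pdf"}:
            return "pdfs_dir"
        raise ValueError("directory must contain only images or only PDFs")
    if p.suffix.lower() == ".pdf":
        return "pdf"
    if p.suffix.lower() in IMAGE_EXTS:
        return "image"
    raise ValueError(f"unsupported input: {path_str}")

print(classify_input("page_001.png"))  # image
print(classify_input("report.pdf"))   # pdf
```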
.. note::
When using PDF as input, you need to change ``predict_images`` to ``predict_pdfs`` in ``layout_detection.py``.
.. code:: python
# for image detection
detection_results = model_layout_detection.predict_images(input_data, result_path)
Change to:
.. code:: python
# for pdf detection
detection_results = model_layout_detection.predict_pdfs(input_data, result_path)
Viewing Visualization Results
-----------------------------
When ``visualize`` is set to ``True`` in the config file, the visualization results will be saved in the ``outputs`` directory.
.. note::
Visualization is helpful for analyzing model results, but for large-scale tasks it is recommended to turn off visualization (set ``visualize`` to ``False``) to reduce memory and disk usage.