DiT for Text Detection

Model outputs with FUNSD

Fine-tuned models on FUNSD

We summarize the validation results as follows. We also provide the fine-tuned weights.

| name | initialized checkpoint | detection algorithm | F1 | weight |
| --- | --- | --- | --- | --- |
| DiT-base-syn | dit_base_patch16_224_syn | Mask R-CNN | 94.25 | link |
| DiT-large-syn | dit_large_patch16_224_syn | Mask R-CNN | 94.29 | link |
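
For readers unfamiliar with the metric, the sketch below shows the usual definition of F1 as the harmonic mean of precision and recall. It is a generic illustration only: this section does not state how detections are matched to ground truth, and the function name `f1_score` and the input values are hypothetical, not taken from the repository or its results.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; returns 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Illustrative inputs only; these are not numbers reported for DiT.
    print(f1_score(0.5, 1.0))
```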

Usage

Data Preparation

Follow these steps to download and process the FUNSD dataset. The resulting directory structure looks like the following:

│── data
│   ├── annotations
│   ├── imgs
│   ├── instances_test.json
│   └── instances_training.json
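
Before training, it can help to confirm that the processed data matches the layout above. The following is a minimal sketch, assuming the dataset lives in a `data` directory relative to the working directory; the helper `missing_entries` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Entry names taken from the directory tree above.
EXPECTED_ENTRIES = (
    "annotations",
    "imgs",
    "instances_test.json",
    "instances_training.json",
)


def missing_entries(data_root):
    """Return the expected entries that are absent under data_root."""
    root = Path(data_root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    # "data" is an assumed location; adjust it to where the dataset was processed.
    missing = missing_entries("data")
    if missing:
        print("Missing under data/:", ", ".join(missing))
    else:
        print("data/ layout looks complete.")
```

The check only tests that each entry exists; it does not validate the contents of the annotation files.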

Training

The following command provides an example of training Mask R-CNN with a DiT backbone on 8 NVIDIA V100 GPUs (32 GB each).

The config files can be found in configs.

python train_net.py --config-file configs/mask_rcnn_dit_base.yaml --num-gpus 8 --resume MODEL.WEIGHTS path/to/model OUTPUT_DIR path/to/output

Evaluation

The following command provides an example of evaluating the fine-tuned DiT-Base checkpoint with Mask R-CNN.

python train_net.py --config-file configs/mask_rcnn_dit_base.yaml --eval-only --num-gpus 8 --resume MODEL.WEIGHTS path/to/model OUTPUT_DIR path/to/output
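
The training and evaluation commands differ only in the `--eval-only` flag. When launching runs from a script, the argument list can be assembled programmatically. The sketch below uses only the flags and config overrides shown in the commands above; the helper `build_command` is hypothetical and not part of the repository, and the paths are the same placeholders as in the examples.

```python
import shlex


def build_command(config_file, weights, output_dir, num_gpus=8, eval_only=False):
    """Assemble the train_net.py argument list in the same order as the commands above."""
    cmd = ["python", "train_net.py", "--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    cmd += ["--num-gpus", str(num_gpus), "--resume"]
    # Trailing KEY VALUE pairs are passed through as config overrides, as in the examples.
    cmd += ["MODEL.WEIGHTS", weights, "OUTPUT_DIR", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_command(
        "configs/mask_rcnn_dit_base.yaml",
        "path/to/model",   # placeholder, as in the commands above
        "path/to/output",  # placeholder, as in the commands above
        eval_only=True,
    )
    # Print a shell-ready string for inspection.
    print(shlex.join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` to launch the run from Python instead of a shell.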

Citation

If you find this repository useful, please consider citing our work:

@misc{li2022dit,
    title={DiT: Self-supervised Pre-training for Document Image Transformer},
    author={Junlong Li and Yiheng Xu and Tengchao Lv and Lei Cui and Cha Zhang and Furu Wei},
    year={2022},
    eprint={2203.02378},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}

Acknowledgment

Thanks to Detectron2 for the Mask R-CNN implementation and to MMOCR for the FUNSD data preprocessing implementation.