---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---
|
|
|
# Document QA Model

This is a fine-tuned **document question-answering model** based on [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base). It consumes OCR output (words and bounding boxes extracted with PaddleOCR) alongside the document image, and answers questions about structured information in the document layout.
|
|
|
---

## Model Details

### Model Description

- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian
- **License:** Apache-2.0 (inherited from base model)
- **Intended Use:** Extract answers to structured queries from scanned documents
- **Funding:** None; this project was completed independently.
|
|
|
---

## Model Sources

- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** Adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:** See the [Training Metrics](#training-metrics) section below
|
|
|
---

## Uses

### Direct Use

This model can be used for:
- Question answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding

### Out-of-Scope Use

- Not suitable for conversational QA
- Not suitable for images without machine-readable text; the model requires OCR output as input
|
|
|
---

## Training Details

### Dataset

The dataset consisted of (an illustrative record is sketched after this list):
- **Images** of utility bills and documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions
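
The exact record schema is not reproduced in this card; purely as an illustration, a single example combining the fields above might look roughly like this (all field names and values here are hypothetical):

```python
# Hypothetical training record; field names and values are illustrative
# only and do not reflect the actual dataset schema.
example = {
    "image": "utility_bill_0042.png",
    "words": ["ACME", "Utilities", "Total", "Due", "$84.20"],
    "boxes": [  # one [x0, y0, x1, y1] box per word, in image pixels
        [110, 40, 210, 70], [220, 40, 380, 70],
        [90, 610, 150, 640], [160, 610, 210, 640], [220, 610, 310, 640],
    ],
    "question": "What is the total amount due?",  # queries also appear in es / zh
    "answer": {"text": "$84.20", "start_word": 4, "match_score": 1.0},
}
```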
|
|
|
### Training Procedure

- Preprocessing: PaddleOCR was used to extract tokens, positions, and document structure (a sketch follows below)
- Model: LayoutLMv3-base
- Epochs: 4
- Learning rate schedule: shown in the loss and learning-rate chart (see Training Metrics below)
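
The preprocessing script itself is not included in this card. A minimal sketch of the PaddleOCR step, assuming the `paddleocr` 2.x API and LayoutLMv3's convention of boxes normalized to a 0-1000 grid, might look like this:

```python
from paddleocr import PaddleOCR
from PIL import Image

# Run PaddleOCR on a document image and convert its output into the
# word/box format LayoutLMv3 expects (boxes normalized to a 0-1000 grid).
ocr = PaddleOCR(use_angle_cls=True, lang="en")

image_path = "utility_bill_0042.png"  # placeholder path
width, height = Image.open(image_path).size

result = ocr.ocr(image_path, cls=True)[0]  # results for the first page

words, boxes = [], []
for quad, (text, confidence) in result:
    # quad is four [x, y] corner points; reduce it to an axis-aligned box
    xs = [p[0] for p in quad]
    ys = [p[1] for p in quad]
    words.append(text)
    boxes.append([
        int(1000 * min(xs) / width),
        int(1000 * min(ys) / height),
        int(1000 * max(xs) / width),
        int(1000 * max(ys) / height),
    ])
```

Note that PaddleOCR returns text at line granularity, so each detected line is treated as one "word" here; the actual pipeline may have split lines into finer tokens.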
|
|
|
### Training Metrics

- **F1 Score** (validation): see the chart in the repository
- **Loss & Learning Rate**: see the chart in the repository
|
|
|
---

## Evaluation

### Metrics Used

- F1 score
- Match score of predicted spans
- Token overlap against ground-truth answers (see the sketch below)
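
The exact evaluation script is not included here; a minimal SQuAD-style token-overlap F1, consistent with the metrics listed above, can be computed like this:

```python
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    """SQuAD-style F1: harmonic mean of token precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    overlap = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(overlap.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("total due: $84.20", "$84.20"))  # 0.5: one of three tokens matches
```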
|
|
|
### Summary

The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
|
|
|
---

## How to Use

Training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model); a minimal inference sketch follows.
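
The card itself does not ship a usage snippet; the sketch below shows one plausible way to query the model with `transformers`, assuming the fine-tuned weights live in a local checkpoint directory (the path, image, and OCR output are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForQuestionAnswering

# Placeholder checkpoint path; replace with the actual fine-tuned weights.
checkpoint = "./document-qa-model"

# apply_ocr=False keeps the processor from running its built-in Tesseract
# OCR, since this model expects PaddleOCR words/boxes instead.
processor = AutoProcessor.from_pretrained(checkpoint, apply_ocr=False)
model = LayoutLMv3ForQuestionAnswering.from_pretrained(checkpoint)

image = Image.open("utility_bill_0042.png").convert("RGB")
question = "What is the total amount due?"

# Example OCR output (normally produced by the PaddleOCR step sketched
# under Training Procedure), with boxes on the 0-1000 grid.
words = ["Total", "Due:", "$84.20"]
boxes = [[90, 610, 160, 640], [165, 610, 230, 640], [240, 610, 330, 640]]

encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")

with torch.no_grad():
    outputs = model(**encoding)

# Take the most likely start/end token positions and decode the span.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(encoding["input_ids"][0][start : end + 1])
print(answer)
```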
|
|