---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---
# Document QA Model
This is a fine-tuned **document question-answering model** based on `layoutlmv3-base`. It is trained on OCR output (from PaddleOCR) to understand document layout and answer questions about structured information in documents.
---
## Model Details
### Model Description
- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian
- **License:** Apache-2.0 (inherited from base model)
- **Intended Use:** Extract answers to structured queries from scanned documents
- **Funding:** None; this project was completed independently
---
## Model Sources
- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** Adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:** See [`training_history.png`](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)
---
## Uses
### Direct Use
This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
### Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images with no OCR-processed text
---
## Training Details
### Dataset
The dataset consisted of:
- **Images** of utility bills and documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions
### Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, bounding boxes, and layout structure (see the sketch after this list)
- Model: `microsoft/layoutlmv3-base`
- Epochs: 4
- Learning rate schedule: shown in the chart under Training Metrics
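
As a rough illustration of the preprocessing step, the sketch below converts PaddleOCR output into the word list and 0-1000-normalized boxes that LayoutLMv3 expects. It assumes the standard `paddleocr` Python API; the helper name and file path are illustrative, not code from this repository.

```python
# Sketch: convert PaddleOCR output into LayoutLMv3 inputs.
# Assumes the standard paddleocr Python API; the helper name and
# file path are illustrative, not code from this repository.
from paddleocr import PaddleOCR
from PIL import Image

ocr = PaddleOCR(use_angle_cls=True, lang="en")

def extract_words_and_boxes(image_path):
    """Return OCR words and their bounding boxes normalized to 0-1000."""
    width, height = Image.open(image_path).size
    result = ocr.ocr(image_path, cls=True)

    words, boxes = [], []
    for corners, (text, _confidence) in result[0]:
        xs = [point[0] for point in corners]
        ys = [point[1] for point in corners]
        # LayoutLMv3 expects [x0, y0, x1, y1] on a 0-1000 grid.
        boxes.append([
            int(1000 * min(xs) / width),
            int(1000 * min(ys) / height),
            int(1000 * max(xs) / width),
            int(1000 * max(ys) / height),
        ])
        words.append(text)
    return words, boxes
```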
### Training Metrics
Validation F1, loss, and learning rate over training:

![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)
---
## Evaluation
### Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs. ground truth (sketched below)
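
For reference, token-overlap F1 is typically computed SQuAD-style; a minimal sketch follows. It illustrates the metric and is not necessarily the exact scoring code used for this model.

```python
# Sketch: SQuAD-style token-overlap F1 between a predicted answer
# and the ground truth; illustrative, not this repo's exact code.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```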
### Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
---
## How to Use
Full training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model). A minimal inference sketch follows.
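
The sketch below assumes the checkpoint is published as `lakshya-rawat/document-qa-model` and uses an extractive QA head (`LayoutLMv3ForQuestionAnswering`); adapt the repo id and file paths to your setup. `extract_words_and_boxes` refers to the OCR sketch under Training Details.

```python
# Minimal sketch: extractive document QA with a fine-tuned LayoutLMv3.
# The repo id and QA head below are assumptions; see the note above.
import torch
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForQuestionAnswering

# apply_ocr=False because we supply PaddleOCR words/boxes ourselves.
processor = AutoProcessor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
model = LayoutLMv3ForQuestionAnswering.from_pretrained(
    "lakshya-rawat/document-qa-model"  # assumed repo id
)

image = Image.open("utility_bill.png").convert("RGB")
question = "What is the total amount due?"
words, boxes = extract_words_and_boxes("utility_bill.png")  # OCR sketch above

encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# Decode the highest-scoring answer span.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(
    encoding["input_ids"][0][start : end + 1], skip_special_tokens=True
)
print(answer)
```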