---
library_name: transformers
tags:
- document-question-answering
- layoutlmv3
- ocr
- document-understanding
- paddleocr
- multilingual
- layout-aware
- lakshya-singh
license: apache-2.0
language:
- en
base_model:
- microsoft/layoutlmv3-base
datasets:
- nielsr/docvqa_1200_examples
---
# Document QA Model
This is a fine-tuned **document question-answering model** based on `layoutlmv3-base`. It is trained on OCR output (from PaddleOCR) to understand document layout and answer questions about structured information in documents.
---
## Model Details
### Model Description
- **Model Name:** `document-qa-model`
- **Base Model:** [`microsoft/layoutlmv3-base`](https://huggingface.co/microsoft/layoutlmv3-base)
- **Fine-tuned by:** Lakshya Singh (solo contributor)
- **Languages:** English, Spanish, French, German, Italian
- **License:** Apache-2.0 (inherited from base model)
- **Intended Use:** Extract answers to structured queries from scanned documents
- **Funding:** None; this project was completed independently
---
## Model Sources
- **Repository:** [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model)
- **Trained on:** Adapted version of [`nielsr/docvqa_1200_examples`](https://huggingface.co/datasets/nielsr/docvqa_1200_examples)
- **Model metrics:** See [`training_history.png`](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)
---
## Uses
### Direct Use
This model can be used for:
- Question Answering on document images (PDFs, invoices, utility bills)
- Information extraction tasks using OCR and layout-aware understanding
### Out-of-Scope Use
- Not suitable for conversational QA
- Not suitable for images with no OCR-processed text
---
## Training Details
### Dataset
The dataset consisted of:
- **Images** of utility bills and documents
- **OCR data** with bounding boxes (from PaddleOCR)
- **Queries** in English, Spanish, and Chinese
- **Answer spans** with match scores and positions
### Training Procedure
- Preprocessing: PaddleOCR was used to extract tokens, bounding boxes, and layout structure (see the sketch after this list)
- Model: `microsoft/layoutlmv3-base`
- Epochs: 4
- Learning rate schedule: shown in the chart under Training Metrics
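
As a rough illustration of the preprocessing step, the sketch below converts PaddleOCR output into the word list and 0-1000-normalized boxes that LayoutLMv3 expects. It assumes the standard `paddleocr` Python API; the helper name and file path are illustrative, not code from this repository.

```python
# Sketch: convert PaddleOCR output into LayoutLMv3 inputs.
# Assumes the standard paddleocr Python API; the helper name and
# file path are illustrative, not code from this repository.
from paddleocr import PaddleOCR
from PIL import Image

ocr = PaddleOCR(use_angle_cls=True, lang="en")

def extract_words_and_boxes(image_path):
    """Return OCR words and their bounding boxes normalized to 0-1000."""
    width, height = Image.open(image_path).size
    result = ocr.ocr(image_path, cls=True)

    words, boxes = [], []
    for corners, (text, _confidence) in result[0]:
        xs = [point[0] for point in corners]
        ys = [point[1] for point in corners]
        # LayoutLMv3 expects [x0, y0, x1, y1] on a 0-1000 grid.
        boxes.append([
            int(1000 * min(xs) / width),
            int(1000 * min(ys) / height),
            int(1000 * max(xs) / width),
            int(1000 * max(ys) / height),
        ])
        words.append(text)
    return words, boxes
```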
### Training Metrics
Validation F1, loss, and learning rate over training:

![training_history.png](https://cdn-uploads.huggingface.co/production/uploads/66a7331438fbd9075584523f/MtMe5CZy3wb2nEG1wTRMc.png)
---
## Evaluation
### Metrics Used
- F1 score
- Match score of predicted spans
- Token overlap vs. ground truth (sketched below)
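
For reference, token-overlap F1 is typically computed SQuAD-style; a minimal sketch follows. It illustrates the metric and is not necessarily the exact scoring code used for this model.

```python
# Sketch: SQuAD-style token-overlap F1 between a predicted answer
# and the ground truth; illustrative, not this repo's exact code.
from collections import Counter

def token_f1(prediction: str, ground_truth: str) -> float:
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    common = Counter(pred_tokens) & Counter(gold_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```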
### Summary
The model performs well on document-style QA tasks, especially with:
- Clearly structured OCR results
- Document types similar to utility bills, invoices, and forms
---
## How to Use
Full training and inference code is available on [GitHub](https://github.com/Lakshyasinghrawat12/DocumentQA-lakshya-rawat-document-qa-model). A minimal inference sketch follows.
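
The sketch below assumes the checkpoint is published as `lakshya-rawat/document-qa-model` and uses an extractive QA head (`LayoutLMv3ForQuestionAnswering`); adapt the repo id and file paths to your setup. `extract_words_and_boxes` refers to the OCR sketch under Training Details.

```python
# Minimal sketch: extractive document QA with a fine-tuned LayoutLMv3.
# The repo id and QA head below are assumptions; see the note above.
import torch
from PIL import Image
from transformers import AutoProcessor, LayoutLMv3ForQuestionAnswering

# apply_ocr=False because we supply PaddleOCR words/boxes ourselves.
processor = AutoProcessor.from_pretrained(
    "microsoft/layoutlmv3-base", apply_ocr=False
)
model = LayoutLMv3ForQuestionAnswering.from_pretrained(
    "lakshya-rawat/document-qa-model"  # assumed repo id
)

image = Image.open("utility_bill.png").convert("RGB")
question = "What is the total amount due?"
words, boxes = extract_words_and_boxes("utility_bill.png")  # OCR sketch above

encoding = processor(image, question, words, boxes=boxes, return_tensors="pt")
with torch.no_grad():
    outputs = model(**encoding)

# Decode the highest-scoring answer span.
start = outputs.start_logits.argmax(-1).item()
end = outputs.end_logits.argmax(-1).item()
answer = processor.tokenizer.decode(
    encoding["input_ids"][0][start : end + 1], skip_special_tokens=True
)
print(answer)
```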