TrOCR-LaTeX (fine-tuned on math handwriting)
Take your handwritten math and turn it into clean LaTeX code.
This is a fine-tuned version of microsoft/trocr-base-handwritten, a transformer-based optical character recognition model, adapted to work with handwritten math images and structured math syntax.
Data
Fine-tuned on Google's MathWriting dataset, which contains over 500,000 digital inks of handwritten mathematical expressions obtained through either manual labelling or programmatic generation.
Intended use & limitations
You can use this model for OCR on a single math expression.
Performance degrades on very long expressions; because of the image preprocessing, inputs with roughly a 3:2 aspect ratio seem to work best.
- To work around this limitation, you can use an expression chunking scheme that splits the image into sub-images and processes each one (a minimal sketch follows this list).
- To process multiple expressions, you need to chunk them into single expressions first and run the model on each.
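As a purely illustrative starting point (not part of this model or repo), the sketch below crops a very wide expression image into overlapping windows close to a 3:2 aspect ratio; each chunk can then be fed through the pipeline shown in the next section. The function name, default overlap, and aspect value are assumptions, and chunk boundaries that cut through symbols will still hurt accuracy.

from PIL import Image

def chunk_wide_image(image: Image.Image, aspect: float = 1.5, overlap: int = 32) -> list:
    # Crop a wide expression image into overlapping windows roughly `aspect` times as wide as they are tall
    width, height = image.size
    chunk_width = int(height * aspect)
    if width <= chunk_width:
        return [image]
    chunks, left = [], 0
    while left < width:
        right = min(left + chunk_width, width)
        chunks.append(image.crop((left, 0, right, height)))
        if right == width:
            break
        left = right - overlap  # overlap so symbols near a boundary appear in both chunks
    return chunks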
How to use (PyTorch)
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image
# Helper function: open a JPEG or PNG and flatten any transparency onto white
def open_PIL_image(image_path: str) -> Image.Image:
    image = Image.open(image_path)
    if image_path.split('.')[-1].lower() == 'png':
        image = image.convert('RGBA')
        white = Image.new('RGBA', image.size, 'white')
        # Composite the image over white, using its own alpha channel as the mask
        image = Image.composite(image, white, image).convert('RGB')
    return image
# Load model and processor from Hugging Face
processor = TrOCRProcessor.from_pretrained('tjoab/latex_finetuned')
model = VisionEncoderDecoderModel.from_pretrained('tjoab/latex_finetuned')
# Load all images as a batch (`paths` is a list of JPEG/PNG file paths)
images = [open_PIL_image(path) for path in paths]
# Preprocess the images
preproc_image = processor.image_processor(images=images, return_tensors="pt").pixel_values
# Generate and decode the tokens
# NOTE: max_length default value is very small, which often results in truncated inference if not set
pred_ids = model.generate(preproc_image, max_length=128)
latex_preds = processor.batch_decode(pred_ids, skip_special_tokens=True)
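For reference, a small usage sketch pairing each decoded LaTeX string with its source path (reusing the `paths` list from above):

for path, latex in zip(paths, latex_preds):
    print(f"{path}: {latex}")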
Training Details
- Mini-batch size: 8
- Optimizer: Adam
- LR Scheduler: cosine
- fp16 mixed precision
  - Trained using automatic mixed precision (AMP) with torch.cuda.amp for reduced memory usage.
- Gradient accumulation
  - Used to simulate a larger effective batch size while keeping per-step memory consumption low.
  - Optimizer steps occurred every 8 mini-batches.
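To show how these pieces fit together, here is a minimal PyTorch sketch of AMP combined with gradient accumulation. The model and processor come from the snippet above, and the accumulation interval of 8 matches the settings listed; the dataloader, learning rate, and scheduler horizon are illustrative assumptions, not the actual training script.

import torch

model = model.cuda()
scaler = torch.cuda.amp.GradScaler()
optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)                        # illustrative LR
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=10_000)  # cosine schedule; T_max illustrative
accum_steps = 8                                                                   # optimizer step every 8 mini-batches

model.train()
for step, batch in enumerate(dataloader):          # hypothetical dataloader yielding pixel_values and label ids
    with torch.cuda.amp.autocast():                # fp16 forward pass
        loss = model(pixel_values=batch["pixel_values"].cuda(),
                     labels=batch["labels"].cuda()).loss / accum_steps
    scaler.scale(loss).backward()                  # accumulate scaled gradients
    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)                     # unscale gradients and apply the update
        scaler.update()
        scheduler.step()
        optimizer.zero_grad()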
Evaluation
Performance was evaluated using Character Error Rate (CER) defined as:
CER = (Substitutions + Insertions + Deletions) / Total Characters in Ground Truth
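To make the metric concrete, here is a small illustrative implementation (not taken from this repo) based on a character-level Levenshtein edit distance:

def character_error_rate(prediction: str, reference: str) -> float:
    # Dynamic-programming Levenshtein distance over characters
    prev = list(range(len(reference) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, r in enumerate(reference, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (p != r)))  # substitution (free if characters match)
        prev = curr
    return prev[-1] / max(len(reference), 1)          # edit operations / ground-truth length

# e.g. character_error_rate('x_2', 'x^2') == 1/3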
Why CER?
- Math expressions are structurally sensitive. Changing even a single character can completely change the meaning:
  - x^2 vs. x_2
  - \frac{a}{b} vs. \frac{b}{a}
- CER penalizes these small errors in syntax.
Evaluation yielded a CER of 14.9%.
BibTeX and Citation
The original TrOCR model was introduced in this paper:
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al.
You can find the source code in their repository.
@misc{li2021trocr,
title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models},
author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
year={2021},
eprint={2109.10282},
archivePrefix={arXiv},
primaryClass={cs.CL}
}