TrOCR-LaTeX (fine-tuned on math handwriting)

Take your handwritten math and turn it into clean LaTeX code. This model is a fine-tuned version of microsoft/trocr-base-handwritten, a transformer-based optical character recognition model, adapted to handwritten mathematical expressions and structured LaTeX output.

Data

Fine-tuned on Google's MathWriting dataset, which contains over 500,000 digital inks of handwritten mathematical expressions, obtained through either manual labelling or programmatic generation.
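
MathWriting ships stroke data (digital inks) rather than raster images, so the inks have to be rendered before an image-based model like TrOCR can consume them. The helper below is a minimal sketch of such a rendering step using Pillow; the stroke format (a list of strokes, each a list of (x, y) points), canvas size, and line width are illustrative assumptions, not the actual training pipeline.

from PIL import Image, ImageDraw

def render_ink(strokes, canvas_size=(384, 256), margin=10):
    """Render a digital ink (list of strokes, each a list of (x, y) points)
    onto a white canvas. Assumed stroke format; not the MathWriting pipeline."""
    # Bounding box of all points, so the ink can be scaled to fit the canvas
    xs = [x for stroke in strokes for x, _ in stroke]
    ys = [y for stroke in strokes for _, y in stroke]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    scale = min(
        (canvas_size[0] - 2 * margin) / max(max_x - min_x, 1),
        (canvas_size[1] - 2 * margin) / max(max_y - min_y, 1),
    )

    image = Image.new('RGB', canvas_size, 'white')
    draw = ImageDraw.Draw(image)
    for stroke in strokes:
        points = [((x - min_x) * scale + margin, (y - min_y) * scale + margin)
                  for x, y in stroke]
        draw.line(points, fill='black', width=2)
    return image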

Intended use & limitations

You can use this model for OCR on a single math expression.

There is degraded performance on very long expressions; because of the image preprocessing, inputs with roughly a 3:2 aspect ratio seem to work best.

  • To work around this, split the image into subimages with an expression chunking scheme and process each one (a minimal sketch follows after this list).
  • To process multiple expressions, chunk them into groups of single expressions and run each through the model.
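
One simple chunking heuristic is to cut a wide image into horizontal tiles close to the 3:2 ratio mentioned above, preferring cut points that fall on near-white pixel columns so symbols are not sliced in half. Below is a minimal sketch of that idea using Pillow; the search window and whitespace threshold are illustrative assumptions, not part of the released model.

from PIL import Image

def chunk_expression(image: Image.Image, target_ratio: float = 1.5, ws_threshold: int = 250):
    """Split a wide expression image into sub-images of roughly 3:2 aspect ratio,
    preferring cuts at near-white columns (assumed heuristic)."""
    gray = image.convert('L')
    width, height = gray.size
    target_width = int(height * target_ratio)
    pixels = gray.load()

    # Mean intensity per pixel column; values near 255 indicate background
    column_mean = [sum(pixels[x, y] for y in range(height)) / height for x in range(width)]

    chunks, start = [], 0
    while width - start > target_width:
        candidate = start + target_width
        # Search a small window around the target cut for the whitest column
        window = range(max(start + 1, candidate - 20), min(width - 1, candidate + 20))
        cut = max(window, key=lambda x: column_mean[x])
        if column_mean[cut] < ws_threshold:
            cut = candidate  # no clean gap found; cut at the target width anyway
        chunks.append(image.crop((start, 0, cut, height)))
        start = cut
    chunks.append(image.crop((start, 0, width, height)))
    return chunks

Each chunk can then be run through the model individually and the decoded LaTeX fragments concatenated.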

How to use (PyTorch)

from transformers import TrOCRProcessor, VisionEncoderDecoderModel
from PIL import Image

# Helper function (path to either a JPEG or PNG image)
def open_PIL_image(image_path: str) -> Image.Image:
    image = Image.open(image_path)
    # PNGs may carry an alpha channel; composite onto a white background
    if image_path.split('.')[-1].lower() == 'png':
        image = image.convert('RGBA')
        background = Image.new('RGBA', image.size, 'white')
        image = Image.composite(image, background, image).convert('RGB')
    return image


# Load model and processor from Hugging Face
processor = TrOCRProcessor.from_pretrained('tjoab/latex_finetuned')
model = VisionEncoderDecoderModel.from_pretrained('tjoab/latex_finetuned')


# Load all images as a batch (replace these paths with your own JPEG/PNG files)
paths = ['expression_1.png', 'expression_2.jpg']
images = [open_PIL_image(path) for path in paths]

# Preprocess the images 
preproc_image = processor.image_processor(images=images, return_tensors="pt").pixel_values

# Generate and decode the tokens
# NOTE: max_length default value is very small, which often results in truncated inference if not set 
pred_ids = model.generate(preproc_image, max_length=128)
latex_preds = processor.batch_decode(pred_ids, skip_special_tokens=True)
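
The decoded outputs are plain LaTeX strings, one per input image, and can be inspected directly:

# Print each prediction alongside its source path
for path, latex in zip(paths, latex_preds):
    print(f'{path}: {latex}')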

Training Details

  • Mini-batch size: 8
  • Optimizer: Adam
  • LR Scheduler: cosine
  • fp16 mixed precision
    • Trained using automatic mixed precision (AMP) with torch.cuda.amp for reduced memory usage.
  • Gradient accumulation
    • Used to simulate a larger effective batch size while keeping per-step memory consumption low.
    • Optimizer steps occurred every 8 mini-batches (a minimal sketch of this loop follows below).
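
For reference, here is a minimal sketch of a training step that combines torch.cuda.amp with accumulation over 8 mini-batches. The dataloader and batch field names are placeholders, not the actual training script; the model is the VisionEncoderDecoderModel loaded above.

import torch

# Adam optimizer over the model loaded above; dataloader is a placeholder
# yielding batches with 'pixel_values' and 'labels' tensors
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()
accum_steps = 8  # optimizer step every 8 mini-batches

optimizer.zero_grad()
for step, batch in enumerate(dataloader):
    # Forward pass under fp16 autocast (AMP)
    with torch.cuda.amp.autocast():
        loss = model(pixel_values=batch['pixel_values'], labels=batch['labels']).loss
        loss = loss / accum_steps  # average gradients over the accumulation window

    scaler.scale(loss).backward()

    if (step + 1) % accum_steps == 0:
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()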

Evaluation

Performance was evaluated using Character Error Rate (CER) defined as:

CER = (Substitutions + Insertions + Deletions) / Total Characters in Ground Truth

  • ✅ Why CER?

    • Math expressions are structurally sensitive. Shuffling even a single character can completely change the meaning.
      • x^2 vs. x_2
      • \frac{a}{b} vs. \frac{b}{a}
    • CER penalizes even small errors in syntax.
  • Evaluation yielded a CER of 14.9%.
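
For completeness, CER can be computed from the character-level Levenshtein distance between a prediction and its ground truth. Below is a minimal standard-library sketch, not the exact evaluation script used here.

def cer(prediction: str, ground_truth: str) -> float:
    """Character Error Rate: edit distance (substitutions + insertions + deletions)
    divided by the number of characters in the ground truth."""
    # Dynamic-programming Levenshtein distance over characters
    prev = list(range(len(ground_truth) + 1))
    for i, p in enumerate(prediction, start=1):
        curr = [i]
        for j, g in enumerate(ground_truth, start=1):
            cost = 0 if p == g else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1] / max(len(ground_truth), 1)

print(cer('x_2', 'x^2'))  # one substitution over three characters -> ~0.33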

BibTeX and Citation

The original TrOCR model was introduced in the following paper:

TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models by Li et al.

You can find the source code in their repository.

@misc{li2021trocr,
      title={TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models}, 
      author={Minghao Li and Tengchao Lv and Lei Cui and Yijuan Lu and Dinei Florencio and Cha Zhang and Zhoujun Li and Furu Wei},
      year={2021},
      eprint={2109.10282},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}