metadata

title: menu text detection
emoji: 🦄
colorFrom: indigo
colorTo: pink
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
  - donut
  - fine-tuning
  - image-to-text
  - transformer

Menu Text Detection System

Extract structured menu information from images into JSON using a fine-tuned Donut E2E model.

Based on Donut by Clova AI (ECCV ’22)

🚀 Features

Overview

Currently supports extracting the following information from menu images:

Restaurant Name
Business Hours
Address
Phone Number
Dish Information
- Name
- Price

For the JSON schema, see the tools directory.
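
As a rough illustration of the fields listed above, the sketch below builds a menu record and serializes it to JSON. The class names, field names, and value types (for example, a numeric price) are assumptions made for this example only; the actual schema is the one in the tools directory.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Dish:
    name: str
    price: float  # assumption: numeric price; the real schema may differ


@dataclass
class Menu:
    # Hypothetical field names mirroring the list of supported information.
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)


# Example values are invented for illustration.
menu = Menu(
    restaurant_name="Example Cafe",
    phone_number="555-0100",
    dishes=[Dish(name="Latte", price=4.5)],
)

# asdict() recursively converts nested dataclasses, so dishes become dicts.
menu_json = json.dumps(asdict(menu), ensure_ascii=False, indent=2)
print(menu_json)
```

Fields that cannot be read from an image simply stay empty in this sketch; how the real schema represents missing values is not covered here.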

Supported Methods to Extract Menu Information

Fine-tuned Donut model
OpenAI GPT API
Google Gemini API
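
For the fine-tuned Donut method, inference with the Hugging Face transformers library usually follows the pattern sketched below. This is not the code from app.py: the function names, the `model_id` argument, and the `task_prompt` value are placeholders, and the task start token in particular depends on how the model was fine-tuned.

```python
import re


def clean_donut_sequence(sequence: str, eos_token: str, pad_token: str) -> str:
    """Strip EOS/PAD tokens and the leading task start token from decoded text."""
    sequence = sequence.replace(eos_token, "").replace(pad_token, "")
    # Remove only the first <...> tag, which is the task start token.
    return re.sub(r"<.*?>", "", sequence, count=1).strip()


def extract_menu(image_path: str, model_id: str, task_prompt: str) -> dict:
    """Run a Donut checkpoint on one image and return the parsed JSON as a dict.

    model_id and task_prompt are assumptions supplied by the caller; this
    sketch does not know the actual checkpoint name or prompt token.
    """
    # Imported lazily so the helper above works without these packages.
    import torch
    from PIL import Image
    from transformers import DonutProcessor, VisionEncoderDecoderModel

    processor = DonutProcessor.from_pretrained(model_id)
    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    pixel_values = processor(image, return_tensors="pt").pixel_values
    decoder_input_ids = processor.tokenizer(
        task_prompt, add_special_tokens=False, return_tensors="pt"
    ).input_ids

    with torch.no_grad():
        outputs = model.generate(
            pixel_values,
            decoder_input_ids=decoder_input_ids,
            max_length=model.decoder.config.max_position_embeddings,
            pad_token_id=processor.tokenizer.pad_token_id,
            eos_token_id=processor.tokenizer.eos_token_id,
            use_cache=True,
        )

    sequence = processor.batch_decode(outputs)[0]
    sequence = clean_donut_sequence(
        sequence, processor.tokenizer.eos_token, processor.tokenizer.pad_token
    )
    # token2json converts Donut's tag-style output into nested Python data.
    return processor.token2json(sequence)
```

A call might look like `extract_menu("menu.jpg", model_id="<your-checkpoint>", task_prompt="<s_menu>")`, where both the checkpoint name and the `<s_menu>` token are hypothetical.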

💻 Training / Fine-Tuning

Setup

Use uv to set up the development environment:

uv sync

Training Script (Dataset Collection, Fine-Tuning)

Please refer to train.ipynb. Use Jupyter Notebook for training:

uv run jupyter-notebook

VS Code users should install the Jupyter extension, then select .venv/bin/python as the kernel.

Run Demo Locally

uv run python app.py