title: menu text detection
emoji: 🦄
colorFrom: indigo
colorTo: pink
sdk: gradio
python_version: 3.11
short_description: Extract structured menu information from images into JSON...
tags:
  - donut
  - fine-tuning
  - image-to-text
  - transformer

Menu Text Detection System

Extract structured menu information from images into JSON using a fine-tuned Donut E2E model.

Based on Donut by Clova AI (ECCV ’22)


Gradio Space Demo
Hugging Face Models & Datasets

🚀 Features

Overview

The system currently extracts the following information from menu images:

  • Restaurant Name
  • Business Hours
  • Address
  • Phone Number
  • Dish Information
    • Name
    • Price

For the JSON schema, see the tools directory.
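To make the shape of the extracted data more concrete, here is a minimal Python sketch of how the fields listed above could be modeled and serialized to JSON. The class names (`Dish`, `Menu`), the field names, and the sample values are all assumptions made for illustration; the actual schema is the one defined in the tools directory and may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Dish:
    # Hypothetical field names; the real schema lives in the tools directory.
    name: str
    price: float


@dataclass
class Menu:
    # One attribute per item in the feature list above (names are assumed).
    restaurant_name: str = ""
    business_hours: str = ""
    address: str = ""
    phone_number: str = ""
    dishes: list[Dish] = field(default_factory=list)

    def to_json(self) -> str:
        # asdict() recursively converts nested dataclasses into plain dicts.
        return json.dumps(asdict(self), ensure_ascii=False, indent=2)


# Illustrative sample values, not real model output.
menu = Menu(
    restaurant_name="Example Diner",
    phone_number="555-0100",
    dishes=[Dish(name="Beef Noodles", price=120.0)],
)
print(menu.to_json())
```

Fields that cannot be read from an image simply keep their empty defaults in this sketch, so every result has the same set of keys.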

Supported Methods to Extract Menu Information

  • Fine-tuned Donut model
  • OpenAI GPT API
  • Google Gemini API

💻 Training / Fine-Tuning

Setup

Use uv to set up the development environment:

uv sync

Training Script (Dataset Collection, Fine-Tuning)

Please refer to train.ipynb. Use Jupyter Notebook for training:

uv run jupyter-notebook

For VS Code users, install the Jupyter extension, then select .venv/bin/python as your kernel.

Run Demo Locally

uv run python app.py