---
title: MTEB Human Evaluation Demo
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.23.3
app_file: app.py
pinned: false
---
# MTEB Human Evaluation Demo
This is a demo of the human evaluation interface for the MTEB (Massive Text Embedding Benchmark) project. It allows annotators to evaluate the relevance of documents for reranking tasks.
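For a sense of what annotators see, here is a minimal sketch of inspecting one reranking sample. It assumes the example data matches the `mteb/askubuntudupquestions-reranking` dataset on the Hugging Face Hub with `query`/`positive`/`negative` columns; the exact dataset id, split, and column names are assumptions, not taken from this Space.

```python
# Sketch: peek at one reranking sample (dataset id and columns are assumptions).
from datasets import load_dataset

ds = load_dataset("mteb/askubuntudupquestions-reranking", split="test")
sample = ds[0]
print(sample["query"])     # the query shown at the top of the interface
print(sample["positive"])  # candidate documents labeled relevant
print(sample["negative"])  # candidate documents labeled not relevant
```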
## How to use
- Navigate to the "Demo" tab to try the interface with an example dataset (AskUbuntuDupQuestions)
- Read the query at the top
- For each document, assign a rank using the dropdown (1 = most relevant)
- Submit your rankings
- Navigate between samples using the Previous/Next buttons
- Your annotations are saved automatically (a minimal sketch of such an interface follows this list)
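To make the workflow above concrete, below is a hypothetical, minimal Gradio sketch of a ranking interface in this spirit. It is not the actual `app.py`: the sample data, component layout, and save format are illustrative assumptions.

```python
# Hypothetical sketch of a ranking interface; not the real app.py.
import json

import gradio as gr

# Toy stand-in for one annotation sample: a query plus candidate documents.
SAMPLE = {
    "query": "How do I install a .deb package?",
    "docs": [
        "Run 'sudo dpkg -i package.deb' in a terminal.",
        "Double-click the file to open it in Software Center.",
        "Use 'sudo apt install ./package.deb' to resolve dependencies.",
    ],
}

def submit(*ranks):
    """Persist the chosen ranks; the real app saves annotations automatically."""
    record = {"query": SAMPLE["query"], "ranks": list(ranks)}
    with open("annotations.json", "w") as f:
        json.dump(record, f)
    return f"Saved ranking: {list(ranks)}"

with gr.Blocks() as demo:
    gr.Markdown(f"**Query:** {SAMPLE['query']}")
    choices = list(range(1, len(SAMPLE["docs"]) + 1))
    dropdowns = [
        gr.Dropdown(choices=choices, label=doc)  # 1 = most relevant
        for doc in SAMPLE["docs"]
    ]
    status = gr.Markdown()
    gr.Button("Submit rankings").click(submit, inputs=dropdowns, outputs=status)

if __name__ == "__main__":
    demo.launch()
```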
## About MTEB Human Evaluation
This project aims to establish human performance benchmarks for MTEB tasks, helping to understand the realistic "ceiling" for embedding model performance.
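For comparison against human annotations, a model-side score on the same task can be produced with the `mteb` package. This is a minimal sketch; the model checkpoint is an arbitrary example, not one used by this project.

```python
# Sketch: score an embedding model on the same reranking task with mteb.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["AskUbuntuDupQuestions"])
results = evaluation.run(model, output_folder="results")
```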