---
title: MTEB Human Evaluation Demo
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.23.3
app_file: app.py
pinned: false
---
# MTEB Human Evaluation Demo
This is a demo of the human evaluation interface for the MTEB (Massive Text Embedding Benchmark) project. It allows annotators to evaluate the relevance of documents for reranking tasks.
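For a sense of what annotators see, here is a minimal sketch of inspecting one reranking sample. It assumes the example data matches the `mteb/askubuntudupquestions-reranking` dataset on the Hugging Face Hub with `query`/`positive`/`negative` columns; the exact dataset id, split, and column names are assumptions, not taken from this Space.

```python
# Sketch: peek at one reranking sample (dataset id and columns are assumptions).
from datasets import load_dataset

ds = load_dataset("mteb/askubuntudupquestions-reranking", split="test")
sample = ds[0]
print(sample["query"])     # the query shown at the top of the interface
print(sample["positive"])  # candidate documents labeled relevant
print(sample["negative"])  # candidate documents labeled not relevant
```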
## How to use
- Navigate to the "Demo" tab to try the interface with an example dataset (AskUbuntuDupQuestions)
- Read the query at the top
- For each document, assign a rank using the dropdown (1 = most relevant)
- Submit your rankings
- Navigate between samples using the Previous/Next buttons
- Your annotations are saved automatically (a minimal sketch of such an interface follows this list)
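To make the workflow above concrete, below is a hypothetical, minimal Gradio sketch of a ranking interface in this spirit. It is not the actual `app.py`: the sample data, component layout, and save format are illustrative assumptions.

```python
# Hypothetical sketch of a ranking interface; not the real app.py.
import json

import gradio as gr

# Toy stand-in for one annotation sample: a query plus candidate documents.
SAMPLE = {
    "query": "How do I install a .deb package?",
    "docs": [
        "Run 'sudo dpkg -i package.deb' in a terminal.",
        "Double-click the file to open it in Software Center.",
        "Use 'sudo apt install ./package.deb' to resolve dependencies.",
    ],
}

def submit(*ranks):
    """Persist the chosen ranks; the real app saves annotations automatically."""
    record = {"query": SAMPLE["query"], "ranks": list(ranks)}
    with open("annotations.json", "w") as f:
        json.dump(record, f)
    return f"Saved ranking: {list(ranks)}"

with gr.Blocks() as demo:
    gr.Markdown(f"**Query:** {SAMPLE['query']}")
    choices = list(range(1, len(SAMPLE["docs"]) + 1))
    dropdowns = [
        gr.Dropdown(choices=choices, label=doc)  # 1 = most relevant
        for doc in SAMPLE["docs"]
    ]
    status = gr.Markdown()
    gr.Button("Submit rankings").click(submit, inputs=dropdowns, outputs=status)

if __name__ == "__main__":
    demo.launch()
```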
## About MTEB Human Evaluation
This project aims to establish human performance benchmarks for MTEB tasks, helping to understand the realistic "ceiling" for embedding model performance.
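For comparison against human annotations, a model-side score on the same task can be produced with the `mteb` package. This is a minimal sketch; the model checkpoint is an arbitrary example, not one used by this project.

```python
# Sketch: score an embedding model on the same reranking task with mteb.
from mteb import MTEB
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
evaluation = MTEB(tasks=["AskUbuntuDupQuestions"])
results = evaluation.run(model, output_folder="results")
```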