---
title: README
emoji: ⚖️
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
---

Hi! Welcome to the org page of the Evaluation team at Hugging Face.
We want to support the community in building and sharing quality evaluations, for reproducible and fair model comparisons, to cut through the hype of releases and better understand actual model capabilities.

We're behind:
- the [lighteval](https://github.com/huggingface/lighteval) LLM evaluation suite, fast and filled with the SOTA benchmarks you might want (see the quick sketch after this list)
- the [evaluation guidebook](https://github.com/huggingface/evaluation-guidebook), your reference for LLM evals
- the [leaderboards on the Hub](https://huggingface.co/blog?tag=leaderboard) initiative, encouraging people to build more leaderboards in the open for more reproducible evaluation. You'll find documentation [here](https://huggingface.co/docs/leaderboards/index) to build your own, and you can look for the best leaderboard for your use case [here](https://huggingface.co/spaces/OpenEvals/find-a-leaderboard)!
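
To give a feel for lighteval, here is a minimal command-line sketch. The exact flags and task-string format vary between lighteval versions, so the model name and task spec below are illustrative assumptions; check the lighteval README for the current syntax.

```bash
# Hedged quickstart for lighteval; the CLI has changed across versions,
# so treat the arguments below as illustrative, not canonical.
pip install lighteval[accelerate]

# Task specs follow lighteval's "suite|task|num_few_shots|truncate_few_shots"
# format; the model and benchmark here are arbitrary examples.
lighteval accelerate \
    "model_name=openai-community/gpt2" \
    "leaderboard|truthfulqa:mc|0|0"
```

Run `lighteval --help` with your installed version to see the available backends and options.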

Our archived projects:
- [Open LLM Leaderboard](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard#/) (over 11K models evaluated since 2023)

We're not behind the [evaluate metrics guide](https://huggingface.co/evaluate-metric), but if you want to understand metrics better, we really recommend checking it out!