Leaderboards 🔥 - a sugatoray Collection

Leaderboards 🔥

updated Mar 12

A collection of Leaderboards for LLMs ⚡️⚖️ 🤗

Running

4.36k

Chatbot Arena Leaderboard

🏆

Display chatbot performance leaderboard
Running on CPU Upgrade

13k

Open LLM Leaderboard

🏆

Track, rank and evaluate open LLMs and chatbots
Running

188

Yet Another LLM Leaderboard

🌖

Run a Streamlit web app
Running on CPU Upgrade

138

Hallucinations Leaderboard

🔥

View and submit LLM evaluations
Running

483

LLM-Perf Leaderboard

🏆

Explore LLM performance across hardware
Running on CPU Upgrade

91

LLM Safety Leaderboard

🥇

View and submit machine learning model evaluations
Running

223

AI2 WildBench Leaderboard (V2)

🦁

Display and explore model leaderboards and chat history
Runtime error

30

Contextual Leaderboard

🐨
Running on CPU Upgrade

5.6k

MTEB Leaderboard

🥇

Embedding Leaderboard
Running on CPU Upgrade

54

Open CoT Leaderboard

🥇

Track, rank and evaluate open LLMs' CoT quality
Running

320

LLM Performance Leaderboard

🐨

View LLM Performance Leaderboard
Running

203

BigCodeBench Leaderboard

🥇

Explore and analyze code evaluation data
Running

66

The timm Leaderboard

🏆

Display and analyze PyTorch Image Models leaderboard
Running

80

Open FinLLM Leaderboard

🥇

Browse and submit large language model evaluations
Running

106

Open VLM Video Leaderboard

🌎

VLMEvalKit Eval Results in video understanding benchmark
Running

44

MEGA-Bench Leaderboard

🥇

A leaderboard for multimodal models
Running on CPU Upgrade

92

Open LLM Leaderboard Model Comparator

🏆

Compare Open LLM Leaderboard results
Running

134

Vidore Leaderboard

🥇

Browse and submit visual document retrieval benchmark results
Running

101

Judge Arena

💻

Vote on AI responses to rank models
Running on CPU Upgrade

738

Open VLM Leaderboard

🌎

VLMEvalKit Evaluation Results Collection
Paused

9

Keras Chatbot Battle

💬

Interact with multiple chatbots simultaneously
Sleeping

4

OmniEval

🥇
Running

5

OmniEval

🥇

Official Leaderboard for OmniEval
open-llm-leaderboard/contents

Viewer • Updated Mar 20 • 4.58k • 9.99k • 15
Running on CPU Upgrade

417

GAIA Leaderboard

🦾

Submit models for evaluation and view leaderboard results
m-ric/agents_small_benchmark

Viewer • Updated Jan 19, 2024 • 100 • 63 • 10
Running on Zero

358

TTS Spaces Arena

🤗

Blind vote on HF TTS models!
Running

106

MTEB Arena

⚔

Display an embedding model evaluation interface
Running on Zero

281

GenAI Arena

📈

Realtime Image/Video Gen AI Arena
Running on CPU Upgrade

298

Agent Leaderboard

💬

Ranking of LLMs for agentic tasks
Running on CPU Upgrade

796

Open ASR Leaderboard

🏆

Request evaluation for new speech models
Running

36

Open LMM Reasoning Leaderboard

🥇

A Leaderboard that demonstrates LMM reasoning capabilities
Running

127

smolagents LLM leaderboard

🏆

A leaderboard for LLMs powering smolagents
smolagents/benchmark-v1

Viewer • Updated Mar 4 • 132 • 733 • 12