Hugging Face
sh110495's Collections
RL
Long Context
Interested
Evaluation
Data Selection

Evaluation

Updated Jul 1, 2024

  • MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark

    Paper • arXiv 2406.01574 • Published Jun 3, 2024 • 47 upvotes

  • LiveBench: A Challenging, Contamination-Free LLM Benchmark

    Paper • arXiv 2406.19314 • Published Jun 27, 2024 • 23 upvotes