McGill-NLP's Collections
Unequal unlearning
AgentRewardBench
Malicious-IR
SafeArena
CHASE
LLM2Vec
WebLINX
AURORA
WebLINX Models
Statcan Dialogue Dataset & Models
FaithDial
MLQuestions

AgentRewardBench

updated 24 days ago

  • AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories

    Paper • 2504.08942 • Published 27 days ago • 27

  • McGill-NLP/agent-reward-bench

    Viewer • Updated 18 days ago • 1.41k • 3.55k • 2

  • Running
    4

    Agent Reward Bench Demo

    💻

    Visualize agent interactions with WebArena tasks


  • Running

    Agent Reward Bench Leaderboard

    🥇

    Leaderboard for AgentRewardBench
