AgentRewardBench: Evaluating Automatic Evaluations of Web Agent Trajectories Paper • 2504.08942 • Published 27 days ago • 27