SuperGPQA: Scaling LLM Evaluation across 285 Graduate Disciplines Paper • 2502.14739 • Published Feb 20, 2025 • 103
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21, 2025 • 66
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13, 2025 • 98
Enabling Scalable Oversight via Self-Evolving Critic Paper • 2501.05727 • Published Jan 10, 2025 • 75
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings Paper • 2501.01257 • Published Jan 2, 2025 • 53
Evaluating and Aligning CodeLLMs on Human Preference Paper • 2412.05210 • Published Dec 6, 2024 • 51
ProcessBench: Identifying Process Errors in Mathematical Reasoning Paper • 2412.06559 • Published Dec 9, 2024 • 83
A Unified View of Delta Parameter Editing in Post-Trained Large-Scale Models Paper • 2410.13841 • Published Oct 17, 2024 • 17
I-SHEEP: Self-Alignment of LLM from Scratch through an Iterative Self-Enhancement Paradigm Paper • 2408.08072 • Published Aug 15, 2024 • 36
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 3 days ago • 162
Weak-to-Strong Extrapolation Expedites Alignment Paper • 2404.16792 • Published Apr 25, 2024 • 11