Skill Discovery for Software Scripting Automation via Offline Simulations with LLMs • Paper • 2504.20406 • Published 9 days ago • 6
AdaR1: From Long-CoT to Hybrid-CoT via Bi-Level Adaptive Reasoning Optimization • Paper • 2504.21659 • Published 8 days ago • 9
LLMs for Engineering: Teaching Models to Design High Powered Rockets • Paper • 2504.19394 • Published 11 days ago • 12
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks • Paper • 2505.00234 • Published 7 days ago • 21
DeepCritic: Deliberate Critique with Large Language Models • Paper • 2505.00662 • Published 7 days ago • 48
WebThinker: Empowering Large Reasoning Models with Deep Research Capability • Paper • 2504.21776 • Published 8 days ago • 41
OpenMathReasoning • Collection • Models and datasets from "AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset" • 7 items • Updated 3 days ago • 35
VisuLogic: A Benchmark for Evaluating Visual Reasoning in Multi-modal Large Language Models • Paper • 2504.15279 • Published 17 days ago • 73
Progent: Programmable Privilege Control for LLM Agents • Paper • 2504.11703 • Published 22 days ago • 7
LLMs are Greedy Agents: Effects of RL Fine-tuning on Decision-Making Abilities • Paper • 2504.16078 • Published 16 days ago • 20
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation • Paper • 2504.14538 • Published 18 days ago • 27
Learning Adaptive Parallel Reasoning with Language Models • Paper • 2504.15466 • Published 17 days ago • 42
CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation • Paper • 2504.15254 • Published 17 days ago • 6
Rethinking the Generation of High-Quality CoT Data from the Perspective of LLM-Adaptive Question Difficulty Grading • Paper • 2504.11919 • Published 22 days ago • 12
A Comprehensive Survey in LLM(-Agent) Full Stack Safety: Data, Training and Deployment • Paper • 2504.15585 • Published 16 days ago • 13
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset • Paper • 2504.16891 • Published 15 days ago • 18
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model • Paper • 2504.15843 • Published 16 days ago • 18