-
Med-RLVR: Emerging Medical Reasoning from a 3B base model via reinforcement Learning
Paper • 2502.19655 • Published -
MedVLM-R1: Incentivizing Medical Reasoning Capability of Vision-Language Models (VLMs) via Reinforcement Learning
Paper • 2502.19634 • Published • 63 -
R1-T1: Fully Incentivizing Translation Capability in LLMs via Reasoning Learning
Paper • 2502.19735 • Published • 9 -
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14
Collections
Discover the best community collections!
Collections including paper arxiv:2502.14669
-
AlphaMaze: Enhancing Large Language Models' Spatial Intelligence via GRPO
Paper • 2502.14669 • Published • 14 -
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning
Paper • 2503.05592 • Published • 27 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 39 -
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement
Paper • 2503.17352 • Published • 23
-
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters
Paper • 2408.03314 • Published • 63 -
TAG: A Decentralized Framework for Multi-Agent Hierarchical Reinforcement Learning
Paper • 2502.15425 • Published • 9 -
EgoLife: Towards Egocentric Life Assistant
Paper • 2503.03803 • Published • 43 -
Visual-RFT: Visual Reinforcement Fine-Tuning
Paper • 2503.01785 • Published • 78
-
Towards Large Reasoning Models: A Survey of Reinforced Reasoning with Large Language Models
Paper • 2501.09686 • Published • 41 -
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Paper • 2501.12948 • Published • 391 -
Chain-of-Retrieval Augmented Generation
Paper • 2501.14342 • Published • 57 -
RL + Transformer = A General-Purpose Problem Solver
Paper • 2501.14176 • Published • 28
-
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search
Paper • 2412.18319 • Published • 40 -
Token-Budget-Aware LLM Reasoning
Paper • 2412.18547 • Published • 47 -
Efficiently Serving LLM Reasoning Programs with Certaindex
Paper • 2412.20993 • Published • 38 -
B-STaR: Monitoring and Balancing Exploration and Exploitation in Self-Taught Reasoners
Paper • 2412.17256 • Published • 48