FlowReasoner: Reinforcing Query-Level Meta-Agents Paper • 2504.15257 • Published 17 days ago • 46
NoisyRollout: Reinforcing Visual Reasoning with Data Augmentation Paper • 2504.13055 • Published 21 days ago • 19
SCITAT: A Question Answering Benchmark for Scientific Tables and Text Covering Diverse Reasoning Types Paper • 2412.11757 • Published Dec 16, 2024
Efficient Process Reward Model Training via Active Learning Paper • 2504.10559 • Published 24 days ago • 13
Active PRM Collection • Efficient Process Reward Model Training via Active Learning. • 4 items • Updated 23 days ago • 3