General-Reasoner • Collection • Advancing LLMs' general reasoning capabilities • 5 items • Updated about 6 hours ago
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models • Paper • 2505.02735 • Published 3 days ago • 24
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs • Paper • 2504.15415 • Published 17 days ago • 22
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs • Paper • 2504.11536 • Published 23 days ago • 60