Beyond One-Size-Fits-All: Inversion Learning for Highly Effective NLG Evaluation Prompts Paper • 2504.21117 • Published 9 days ago • 23
FormalMATH: Benchmarking Formal Mathematical Reasoning of Large Language Models Paper • 2505.02735 • Published 3 days ago • 24
OTC: Optimal Tool Calls via Reinforcement Learning Paper • 2504.14870 • Published 18 days ago • 33
IV-Bench: A Benchmark for Image-Grounded Video Perception and Reasoning in Multimodal LLMs Paper • 2504.15415 • Published 17 days ago • 22
AutoKaggle: A Multi-Agent Framework for Autonomous Data Science Competitions Paper • 2410.20424 • Published Oct 27, 2024 • 41
Analyzing LLMs' Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations Paper • 2504.13816 • Published 20 days ago • 17
Kimina-Prover Preview: Towards Large Formal Reasoning Models with Reinforcement Learning Paper • 2504.11354 • Published 23 days ago • 2
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published 23 days ago • 60