- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model
  Paper • 2503.24290 • Published • 62
- I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
  Paper • 2503.18878 • Published • 118
- START: Self-taught Reasoner with Tools
  Paper • 2503.04625 • Published • 111
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 124

Collections including paper arxiv:2504.10479

- Gemma 3 Technical Report
  Paper • 2503.19786 • Published • 50
- Kimi-VL Technical Report
  Paper • 2504.07491 • Published • 125
- InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
  Paper • 2504.10479 • Published • 255
- FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
  Paper • 2504.09925 • Published • 38

- R1-Onevision: Advancing Generalized Multimodal Reasoning through Cross-Modal Formalization
  Paper • 2503.10615 • Published • 17
- UniGoal: Towards Universal Zero-shot Goal-oriented Navigation
  Paper • 2503.10630 • Published • 6
- Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
  Paper • 2503.09516 • Published • 30
- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 86

- LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
  Paper • 2503.07536 • Published • 86
- Seedream 2.0: A Native Chinese-English Bilingual Image Generation Foundation Model
  Paper • 2503.07703 • Published • 36
- Gemini Embedding: Generalizable Embeddings from Gemini
  Paper • 2503.07891 • Published • 38
- Optimizing Test-Time Compute via Meta Reinforcement Fine-Tuning
  Paper • 2503.07572 • Published • 44

- Towards General-Purpose Model-Free Reinforcement Learning
  Paper • 2501.16142 • Published • 30
- DAPO: An Open-Source LLM Reinforcement Learning System at Scale
  Paper • 2503.14476 • Published • 124
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model?
  Paper • 2504.13837 • Published • 119
- Learning to Reason under Off-Policy Guidance
  Paper • 2504.14945 • Published • 80

- OmniEval: An Omnidirectional and Automatic RAG Evaluation Benchmark in Financial Domain
  Paper • 2412.13018 • Published • 42
- Retrieval-augmented Large Language Models for Financial Time Series Forecasting
  Paper • 2502.05878 • Published • 41
- ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
  Paper • 2502.06772 • Published • 21
- ELTEX: A Framework for Domain-Driven Synthetic Data Generation
  Paper • 2503.15055 • Published • 6

- Apollo: An Exploration of Video Understanding in Large Multimodal Models
  Paper • 2412.10360 • Published • 146
- SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization
  Paper • 2501.01245 • Published • 5
- VideoRefer Suite: Advancing Spatial-Temporal Object Understanding with Video LLM
  Paper • 2501.00599 • Published • 48
- Omni-RGPT: Unifying Image and Video Region-level Understanding via Token Marks
  Paper • 2501.08326 • Published • 35

- MIA-DPO: Multi-Image Augmented Direct Preference Optimization For Large Vision-Language Models
  Paper • 2410.17637 • Published • 37
- Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
  Paper • 2411.10442 • Published • 81
- Critic-V: VLM Critics Help Catch VLM Errors in Multimodal Reasoning
  Paper • 2411.18203 • Published • 37
- Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
  Paper • 2411.14432 • Published • 26

- Differential Transformer
  Paper • 2410.05258 • Published • 178
- PaliGemma 2: A Family of Versatile VLMs for Transfer
  Paper • 2412.03555 • Published • 134
- VisionZip: Longer is Better but Not Necessary in Vision Language Models
  Paper • 2412.04467 • Published • 111
- o1-Coder: an o1 Replication for Coding
  Paper • 2412.00154 • Published • 45

- LLM Pruning and Distillation in Practice: The Minitron Approach
  Paper • 2408.11796 • Published • 59
- TableBench: A Comprehensive and Complex Benchmark for Table Question Answering
  Paper • 2408.09174 • Published • 53
- To Code, or Not To Code? Exploring Impact of Code in Pre-training
  Paper • 2408.10914 • Published • 43
- Open-FinLLMs: Open Multimodal Large Language Models for Financial Applications
  Paper • 2408.11878 • Published • 60