InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 24 days ago • 255
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 10 days ago • 463
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 124
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 30
The Lessons of Developing Process Reward Models in Mathematical Reasoning Paper • 2501.07301 • Published Jan 13 • 98
Migician: Revealing the Magic of Free-Form Multi-Image Grounding in Multimodal Large Language Models Paper • 2501.05767 • Published Jan 10 • 30