InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 24 days ago • 255
Qwen2.5-VL Collection Vision-language model series based on Qwen2.5 • 11 items • Updated 10 days ago • 463
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 30
DAPO: An Open-Source LLM Reinforcement Learning System at Scale Paper • 2503.14476 • Published Mar 18 • 124
DeepPerception: Advancing R1-like Cognitive Visual Perception in MLLMs for Knowledge-Intensive Visual Grounding Paper • 2503.12797 • Published Mar 17 • 30