Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 14 days ago • 23
Skywork R1V2: Multimodal Hybrid Reinforcement Learning for Reasoning Paper • 2504.16656 • Published 15 days ago • 53
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark Paper • 2504.16427 • Published 15 days ago • 17
Unicorn: Text-Only Data Synthesis for Vision Language Model Training Paper • 2503.22655 • Published Mar 28 • 38
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published about 1 month ago • 180
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 24 days ago • 255