Benchmarking Multimodal Mathematical Reasoning with Explicit Visual Dependency Paper • 2504.18589 • Published 14 days ago • 10
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published 17 days ago • 65
meta-llama/Llama-4-Scout-17B-16E-Instruct Image-Text-to-Text • Updated 29 days ago • 829k • 875
Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy Paper • 2503.19757 • Published Mar 25 • 50
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks Paper • 2503.21696 • Published Mar 27 • 22