Submitted by runninglsy 57 Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities · 10 authors 4
Submitted by SpaceProduct 34 ZeroSearch: Incentivize the Search Capability of LLMs without Searching · 9 authors 2
Submitted by Gracjan 19 Beyond Recognition: Evaluating Visual Perspective Taking in Vision Language Models · 6 authors 1
Submitted by albertge 17 R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training · 10 authors 1
Submitted by BestWishYsh 15 HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation · 7 authors 2
Submitted by hyz317 13 PrimitiveAnything: Human-Crafted 3D Primitive Assembly Generation with Auto-Regressive Transformer · 8 authors 1
Submitted by renqiux0302 8 Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving · 6 authors 1
Submitted by itaowe 6 OmniGIRL: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution · 10 authors 1
Submitted by huangsiteng 6 OpenHelix: A Short Survey, Empirical Analysis, and Open-Source Dual-System VLA Model for Robotic Manipulation · 13 authors 1
Submitted by PahaII 5 OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning · 5 authors 1
Submitted by mariya-davydova 4 OSUniverse: Benchmark for Multimodal GUI-navigation AI Agents · 5 authors 1
Submitted by Ningyu 4 Knowledge Augmented Complex Problem Solving with Large Language Models: A Survey · 9 authors 1
Submitted by VityaVitalich 3 LLM-Independent Adaptive RAG: Let the Question Speak for Itself · 9 authors 1
Submitted by Tournesol-Saturday 2 RAIL: Region-Aware Instructive Learning for Semi-Supervised Tooth Segmentation in CBCT · 7 authors 1
Submitted by linxule 1 Cognitio Emergens: Agency, Dimensions, and Dynamics in Human-AI Knowledge Co-Creation · 1 author 1
Submitted by Eavn 1 Uncertainty-Weighted Image-Event Multimodal Fusion for Video Anomaly Detection · 3 authors 1