Submitted by QiushiSun 82 ScienceBoard: Evaluating Multimodal Autonomous Agents in Realistic Scientific Workflows · 21 authors 1
Submitted by JiakangYuan 73 MME-Reasoning: A Comprehensive Benchmark for Logical Reasoning in MLLMs · 11 authors 2
Submitted by KevinQHLin 69 Paper2Poster: Towards Multimodal Poster Automation from Scientific Papers · 5 authors 1
Submitted by yiren98 57 OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data · 3 authors 1
Submitted by BestWishYsh 49 OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation · 9 authors 2
Submitted by MiniMax-AI 42 SynLogic: Synthesizing Verifiable Reasoning Data at Scale for Learning Logical Reasoning and Beyond · 15 authors 1
Submitted by glebzok 39 Exploring the Latent Capacity of LLMs for One-Step Text Generation · 2 authors 1
Submitted by AmirhoseinGH 39 Guided by Gut: Efficient Test-Time Scaling with Reinforced Intrinsic Confidence · 4 authors 1
Submitted by hassid 39 Don't Overthink it. Preferring Shorter Thinking Chains for Improved LLM Reasoning · 4 authors 2
Submitted by YunxinLi 35 VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Guided Iterative Policy Optimization · 8 authors 3
Submitted by AJZhou 34 UI-Genie: A Self-Improving Approach for Iteratively Boosting MLLM-based Mobile GUI Agents · 15 authors 1
Submitted by xihc-ucb 34 Sparse VideoGen2: Accelerate Video Generation with Sparse Attention via Semantic-Aware Permutation · 14 authors 1
Submitted by DogNeverSleep 30 MME-VideoOCR: Evaluating OCR-Based Capabilities of Multimodal LLMs in Video Scenarios · 18 authors 1
Submitted by HyungjunKim 29 GraLoRA: Granular Low-Rank Adaptation for Parameter-Efficient Fine-Tuning · 5 authors 1
Submitted by Howe666 26 Video-Holmes: Can MLLM Think Like Holmes for Complex Video Reasoning? · 6 authors 1
Submitted by XUANMINGZHANG 23 MetaMind: Modeling Human Social Thoughts with Metacognitive Multi-Agent Systems · 4 authors 3
Submitted by lynazhang 22 rStar-Coder: Scaling Competitive Code Reasoning with a Large-Scale Verified Dataset · 8 authors 2
Submitted by FSCCS 17 HoliTom: Holistic Token Merging for Fast Video Large Language Models · 6 authors 1
Submitted by che111 17 Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL · 9 authors 1
Submitted by che111 16 NOVA: A Benchmark for Anomaly Localization and Clinical Reasoning in Brain MRI · 15 authors 1
Submitted by Shimao-Zhang 15 How does Alignment Enhance LLMs' Multilingual Capabilities? A Language Neurons Perspective · 8 authors 1
Submitted by Ningyu 13 Beyond Prompt Engineering: Robust Behavior Control in LLMs via Steering Target Atoms · 7 authors 1
Submitted by ariondas 13 SweEval: Do LLMs Really Swear? A Safety Benchmark for Testing Limits for Enterprise Use · 9 authors 2
Submitted by HikariDawn 12 Frame In-N-Out: Unbounded Controllable Image-to-Video Generation · 4 authors 1
Submitted by Z-MU-Z 11 Active-O3: Empowering Multimodal Large Language Models with Active Perception via GRPO · 11 authors 1
Submitted by Geralt-Targaryen 11 Code Graph Model (CGM): A Graph-Integrated Large Language Model for Repository-Level Software Engineering Tasks · 15 authors 1
Submitted by tricktreat 10 ViewSpatial-Bench: Evaluating Multi-perspective Spatial Localization in Vision-Language Models · 12 authors 1
Submitted by leo1117 10 DetailFlow: 1D Coarse-to-Fine Autoregressive Image Generation via Next-Detail Prediction · 13 authors 1
Submitted by judge 8 SeePhys: Does Seeing Help Thinking? -- Benchmarking Vision-Based Physics Reasoning · 14 authors 2
Submitted by joanrodai 7 Rendering-Aware Reinforcement Learning for Vector Graphics Generation · 15 authors 1
Submitted by zzwustc 7 MotionPro: A Precise Motion Controller for Image-to-Video Generation · 7 authors 2
Submitted by yushihu 7 MMMG: a Comprehensive and Reliable Evaluation Suite for Multitask Multimodal Generation · 12 authors 1
Submitted by jiaxiaojunQAQ 6 Adversarial Attacks against Closed-Source MLLMs via Feature Optimal Alignment · 10 authors 1
Submitted by Nickyang 5 Walk Before You Run! Concise LLM Reasoning via Reinforcement Learning · 2 authors 1
Submitted by YanAdjeNole 5 FinTagging: An LLM-ready Benchmark for Extracting and Structuring Financial Information · 10 authors 1
Submitted by yunlong10 5 MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness · 14 authors
Submitted by pprp 5 Can Compressed LLMs Truly Act? An Empirical Evaluation of Agentic Capabilities in LLM Compression · 6 authors 1
Submitted by ofirpress 5 VideoGameBench: Can Vision-Language Models complete popular video games? · 4 authors 2
Submitted by yrshi 5 Search and Refine During Think: Autonomous Retrieval-Augmented Reasoning of LLMs · 8 authors 1
Submitted by ZhangShenao 4 Beyond Markovian: Reflective Exploration via Bayes-Adaptive RL for LLM Reasoning · 8 authors 1
Submitted by yjlee0222 4 VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection · 7 authors 1
Submitted by westbrook 4 SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline · 10 authors 1
Submitted by mkoretsky1 4 BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases · 11 authors 1
Submitted by EliverQ 4 R1-Searcher++: Incentivizing the Dynamic Knowledge Acquisition of LLMs via Reinforcement Learning · 10 authors 1
Submitted by zhennan1 3 Scaling External Knowledge Input Beyond Context Windows of LLMs via Multi-Agent Collaboration · 7 authors 1
Submitted by Neo111x 3 DFIR-Metric: A Benchmark Dataset for Evaluating Large Language Models in Digital Forensics and Incident Response · 6 authors 1
Submitted by NicerWang 2 AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery · 8 authors 1
Submitted by ChrisJuan 2 Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution · 18 authors 1
Submitted by friedrichor 2 Modality Curation: Building Universal Embeddings for Advanced Multimodal Information Retrieval · 11 authors 1
Submitted by Xxlbigbrother 2 ComfyMind: Toward General-Purpose Generation via Tree-Based Planning and Reactive Feedback · 8 authors 2
Submitted by HuanjinYao 2 R1-ShareVL: Incentivizing Reasoning Capability of Multimodal Large Language Models via Share-GRPO · 11 authors 1
Submitted by xianghuang 1 Reverse Preference Optimization for Complex Instruction Following · 8 authors 1
Submitted by davidelobba 1 Inverse Virtual Try-On: Generating Multi-Category Product-Style Images from Clothed Individuals · 6 authors 1
Submitted by nielsr 1 VLM-3R: Vision-Language Models Augmented with Instruction-Aligned 3D Reconstruction · 17 authors 1
Submitted by Qinsi1 1 CoreMatching: A Co-adaptive Sparse Inference Framework with Token and Neuron Pruning for Comprehensive Acceleration of Vision-Language Models · 9 authors 1
Submitted by justairr 1 SATORI-R1: Incentivizing Multimodal Reasoning with Spatial Grounding and Verifiable Rewards · 4 authors 1
Submitted by Baran47 1 Tropical Attention: Neural Algorithmic Reasoning for Combinatorial Algorithms · 4 authors 1
Submitted by hazemessam - Ankh3: Multi-Task Pretraining with Sequence Denoising and Completion Enhances Protein Representations · 4 authors 1
Submitted by hazemessam - Beyond Simple Concatenation: Fairly Assessing PLM Architectures for Multi-Chain Protein-Protein Interactions Prediction · 8 authors 1
Submitted by andrewzamai - An Explainable Diagnostic Framework for Neurodegenerative Dementias via Reinforcement-Optimized LLM Reasoning · 6 authors 1
Submitted by Eleven-P - PreMoe: Lightening MoEs on Constrained Memory by Expert Pruning and Retrieval · 8 authors 1