Submitted by AngLv 58 The Climb Carves Wisdom Deeper Than the Summit: On the Noisy Rewards in Learning to Reason · 5 authors 2
Submitted by Liuff23 57 Spatial-MLLM: Boosting MLLM Capabilities in Visual-based Spatial Intelligence · 4 authors 2
Submitted by songtingyu 51 VF-Eval: Evaluating Multimodal LLMs for Generating Feedback on AIGC Videos · 4 authors 2
Submitted by lyx97 31 VideoReasonBench: Can MLLMs Perform Vision-Centric Complex Video Reasoning? · 10 authors 6
Submitted by maksimko123 23 cadrille: Multi-modal CAD Reconstruction with Online Reinforcement Learning · 9 authors 2
Submitted by lhjiang 21 AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views · 12 authors 2
Submitted by chaoscodes 21 Satori-SWE: Evolutionary Test-Time Scaling for Sample-Efficient Software Engineering · 11 authors 2
Submitted by shizhediao 20 Fast-dLLM: Training-free Acceleration of Diffusion LLM by Enabling KV Cache and Parallel Decoding · 9 authors 2
Submitted by benzweijia 19 UniRL: Self-Improving Unified Multimodal Models via Supervised and Reinforcement Learning · 3 authors 2
Submitted by sy1998 19 VidText: Towards Comprehensive Evaluation for Video Text Understanding · 10 authors 2
Submitted by spapi 18 FAMA: The First Large-Scale Open-Science Speech Foundation Model for English and Italian · 9 authors 2
Submitted by dlaptev 17 Train Sparse Autoencoders Efficiently by Utilizing Features Correlation · 5 authors 1
Submitted by ydalva 15 LoRAShop: Training-Free Multi-Concept Image Generation and Editing with Rectified Flow Transformers · 3 authors 2
Submitted by AliBehrouz 15 ATLAS: Learning to Optimally Memorize the Context at Test Time · 8 authors 2
Submitted by TharinduSK 14 Towards Safety Reasoning in LLMs: AI-agentic Deliberation for Policy-embedded CoT Data Creation · 9 authors 2
Submitted by BryanW 11 Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model · 11 authors 3
Submitted by antonio-c 10 GeoDrive: 3D Geometry-Informed Driving World Model with Precise Action Control · 8 authors 3
Submitted by KunlunZhu 9 SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents · 9 authors 2
Submitted by Jiahao004 8 DeepTheorem: Advancing LLM Reasoning for Theorem Proving Through Natural Language and Reinforcement Learning · 13 authors 2
Submitted by Jang-Hyun 8 KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction · 6 authors 2
Submitted by m-serious 8 ToMAP: Training Opponent-Aware LLM Persuaders with Theory of Mind · 3 authors 2
Submitted by dek924 8 PatientSim: A Persona-Driven Simulator for Realistic Doctor-Patient Interactions · 8 authors 2
Submitted by jefflai 7 Breaking Down Video LLM Benchmarks: Knowledge, Spatial Perception, or True Temporal Understanding? · 7 authors 1
Submitted by Elfsong 6 Afterburner: Reinforcement Learning Facilitates Self-Improving Code Efficiency Optimization · 9 authors 2
Submitted by smallAI 6 Uni-Instruct: One-step Diffusion Model through Unified Diffusion Divergence Instruction · 6 authors 2
Submitted by Bang-UdeM-Mila 5 System-1.5 Reasoning: Traversal in Language and Latent Spaces with Dynamic Shortcuts · 4 authors 2
Submitted by ttumyche 5 CXReasonBench: A Benchmark for Evaluating Structured Diagnostic Reasoning in Chest X-rays · 6 authors 2
Submitted by lyxun 4 UniTEX: Universal High Fidelity Generative Texturing for 3D Shapes · 8 authors 2
Submitted by crc5577 4 Re-ttention: Ultra Sparse Visual Generation via Attention Statistical Reshape · 5 authors 2
Submitted by davidchan 3 Puzzled by Puzzles: When Vision-Language Models Can't Take a Hint · 6 authors 2
Submitted by hdong51 3 To Trust Or Not To Trust Your Vision-Language Model's Prediction · 5 authors 2
Submitted by angtian 3 ATI: Any Trajectory Instruction for Controllable Video Generation · 5 authors 1
Submitted by ahnpersie 3 Can LLMs Deceive CLIP? Benchmarking Adversarial Compositionality of Pre-trained Multimodal Representation via Text Updates · 4 authors 4
Submitted by JRQi 3 When Models Reason in Your Language: Controlling Thinking Trace Language Comes at the Cost of Accuracy · 6 authors 2
Submitted by kornelhowil 3 CLIPGaussian: Universal and Multimodal Style Transfer Based on Gaussian Splatting · 6 authors 2
Submitted by JingzeShi 3 Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting · 7 authors 1
Submitted by Franck-Dernoncourt 3 A Graph Perspective to Probe Structural Patterns of Knowledge in Large Language Models · 9 authors 2
Submitted by yunjae-won 2 Differential Information: An Information-Theoretic Perspective on Preference Optimization · 4 authors 2
Submitted by lhmd 2 ZPressor: Bottleneck-Aware Compression for Scalable Feed-Forward 3DGS · 6 authors 3
Submitted by kpzhang996 2 SridBench: Benchmark of Scientific Research Illustration Drawing of Image Generation Model · 7 authors 2
Submitted by Aman 2 Evaluating Text Creativity across Diverse Domains: A Dataset and Large Language Model Evaluator · 6 authors 2
Submitted by Junfeng5 2 TokBench: Evaluating Your Visual Tokenizer before Visual Generation · 9 authors 2
Submitted by StringChaos 1 GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents · 6 authors 2
Submitted by gsarti 1 Unsupervised Word-level Quality Estimation for Machine Translation Through the Lens of Annotators (Dis)agreement · 4 authors 2
Submitted by SuperSupermoon 1 Lunguage: A Benchmark for Structured and Sequential Chest X-ray Interpretation · 13 authors 2
Submitted by pengxiang 1 Adaptive Classifier-Free Guidance via Dynamic Low-Confidence Masking · 7 authors 2
Submitted by ctma 1 Large Language Models Meet Knowledge Graphs for Question Answering: Synthesis and Opportunities · 5 authors 1
Submitted by TeddyXGZ - Toward Reliable Biomedical Hypothesis Generation: Evaluating Truthfulness and Hallucination in Large Language Models · 8 authors 2