Submitted by roadjiang · 123 · Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model · 54 authors · 12
Submitted by YuuTennYi · 47 · GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation · 5 authors · 2
Submitted by BestWishYsh · 39 · MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft · 7 authors · 3
Submitted by tianchez · 31 · VLM-R1: A Stable and Generalizable R1-style Large Vision-Language Model · 12 authors · 2
Submitted by ZhuangXialie · 27 · SQL-R1: Training Natural Language to SQL Reasoning Model By Reinforcement Learning · 6 authors · 2
Submitted by yeates · 18 · ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration · 10 authors · 2
Submitted by BestWishYsh · 12 · FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation · 4 authors · 2
Submitted by akhaliq · 11 · Visual Chronicles: Using Multimodal LLMs to Analyze Massive Collections of Images · 7 authors · 2
Submitted by DannyLan · 11 · Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models · 4 authors · 6
Submitted by stefan-it · 10 · ModernBERT or DeBERTaV3? Examining Architecture and Data Influence on Transformer Encoder Models Performance · 3 authors · 3
Submitted by sauradip · 10 · In-2-4D: Inbetweening from Two Single-View Images to 4D Generation · 4 authors · 2
Submitted by AdinaY · 10 · Pangu Ultra: Pushing the Limits of Dense Large Language Models on Ascend NPUs · 52 authors · 3
Submitted by jialuliluka · 7 · Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization · 6 authors · 2
Submitted by nielsr · 7 · UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation · 3 authors · 2
Submitted by richard-guyunqi · 6 · BlenderGym: Benchmarking Foundational Model Systems for Graphics Editing · 5 authors · 2
Submitted by gabrielelozupone98 · 5 · Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging · 6 authors · 2
Submitted by ruipeterpan · 5 · SpecReason: Fast and Accurate Inference-Time Compute via Speculative Reasoning · 6 authors · 2
Submitted by saidwivedi · 5 · InteractVLM: 3D Interaction Reasoning from 2D Foundational Models · 7 authors · 2
Submitted by aashiqmuhamed · 4 · SAEs Can Improve Unlearning: Dynamic Sparse Autoencoder Guardrails for Precision Unlearning in LLMs · 4 authors · 2