MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 27 days ago • 39
Scaling Laws for Native Multimodal Models Paper • 2504.07951 • Published 28 days ago • 27
AccVideo: Accelerating Video Diffusion Model with Synthetic Dataset Paper • 2503.19462 • Published Mar 25 • 10
VBench: Comprehensive Benchmark Suite for Video Generative Models Paper • 2311.17982 • Published Nov 29, 2023 • 9
SigLIP 2: Multilingual Vision-Language Encoders with Improved Semantic Understanding, Localization, and Dense Features Paper • 2502.14786 • Published Feb 20 • 144
Magic 1-For-1: Generating One Minute Video Clips within One Minute Paper • 2502.07701 • Published Feb 11 • 36
Scaling Pre-training to One Hundred Billion Data for Vision Language Models Paper • 2502.07617 • Published Feb 11 • 29
FlashVideo: Flowing Fidelity to Detail for Efficient High-Resolution Video Generation Paper • 2502.05179 • Published Feb 7 • 24
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model Paper • 2502.02737 • Published Feb 4 • 229
VideoJAM: Joint Appearance-Motion Representations for Enhanced Motion Generation in Video Models Paper • 2502.02492 • Published Feb 4 • 65
Textoon: Generating Vivid 2D Cartoon Characters from Text Descriptions Paper • 2501.10020 • Published Jan 17 • 23