WebThinker: Empowering Large Reasoning Models with Deep Research Capability Paper • 2504.21776 • Published 8 days ago • 41
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 14 days ago • 86
One-Minute Video Generation with Test-Time Training Paper • 2504.05298 • Published about 1 month ago • 102
SmolVLM: Redefining small and efficient multimodal models Paper • 2504.05299 • Published about 1 month ago • 180
Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems Paper • 2504.01990 • Published Mar 31 • 275
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published 17 days ago • 65
Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? Paper • 2504.13837 • Published 20 days ago • 119
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning Paper • 2504.13820 • Published 20 days ago • 17
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published 18 days ago • 27
Describe Anything: Detailed Localized Image and Video Captioning Paper • 2504.16072 • Published 16 days ago • 60
The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks Paper • 2504.15521 • Published 16 days ago • 63
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 27 days ago • 39