RIG: Synergizing Reasoning and Imagination in End-to-End Generalist Policy Paper • 2503.24388 • Published Mar 31 • 30
TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization Paper • 2503.19901 • Published Mar 25 • 41
Efficient Inference for Large Reasoning Models: A Survey Paper • 2503.23077 • Published Mar 29 • 46
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31 • 62
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
MoCha: Towards Movie-Grade Talking Character Synthesis Paper • 2503.23307 • Published Mar 30 • 133
Landscape of Thoughts: Visualizing the Reasoning Process of Large Language Models Paper • 2503.22165 • Published Mar 28 • 28
Open-Qwen2VL: Compute-Efficient Pre-Training of Fully-Open Multimodal LLMs on Academic Resources Paper • 2504.00595 • Published Apr 1 • 36
Any2Caption: Interpreting Any Condition to Caption for Controllable Video Generation Paper • 2503.24379 • Published Mar 31 • 76
Exploring the Effect of Reinforcement Learning on Video Understanding: Insights from SEED-Bench-R1 Paper • 2503.24376 • Published Mar 31 • 38
MegaTTS 3: Sparse Alignment Enhanced Latent Diffusion Transformer for Zero-Shot Speech Synthesis Paper • 2502.18924 • Published Feb 26 • 12
PaperBench: Evaluating AI's Ability to Replicate AI Research Paper • 2504.01848 • Published Apr 2 • 36
VideoScene: Distilling Video Diffusion Model to Generate 3D Scenes in One Step Paper • 2504.01956 • Published Apr 2 • 40
Towards Physically Plausible Video Generation via VLM Planning Paper • 2503.23368 • Published Mar 30 • 40
ScholarCopilot: Training Large Language Models for Academic Writing with Accurate Citations Paper • 2504.00824 • Published Apr 1 • 41