Uni3C: Unifying Precisely 3D-Enhanced Camera and Human Motion Controls for Video Generation Paper • 2504.14899 • Published 16 days ago • 20
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives Paper • 2504.10823 • Published 23 days ago • 14
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes Paper • 2503.23461 • Published Mar 30 • 95
Qwen2.5 Omni 7B Demo • Running • 293 • Generate text and speech responses from text, images, or audio input
Deep Reinforcement Learning Leaderboard • Running • 7 • Display and search trained RL models on a leaderboard
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models Paper • 2503.09573 • Published Mar 12 • 71
Multimodal Representation Alignment for Image Generation: Text-Image Interleaved Control Is Easier Than You Think Paper • 2502.20172 • Published Feb 27 • 28
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute Paper • 2502.20126 • Published Feb 27 • 20
UniTok: A Unified Tokenizer for Visual Generation and Understanding Paper • 2502.20321 • Published Feb 27 • 30