T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT Paper • 2505.00703 • Published 7 days ago • 39
TeLoGraF: Temporal Logic Planning via Graph-encoded Flow Matching Paper • 2505.00562 • Published 7 days ago • 3
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published 6 days ago • 25
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published 9 days ago • 38
COMPACT: COMPositional Atomic-to-Complex Visual Capability Tuning Paper • 2504.21850 • Published 8 days ago • 24
UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities Paper • 2504.20734 • Published 9 days ago • 60
YoChameleon: Personalized Vision and Language Generation Paper • 2504.20998 • Published 9 days ago • 11
Distilling semantically aware orders for autoregressive image generation Paper • 2504.17069 • Published 15 days ago • 5
ViSMaP: Unsupervised Hour-long Video Summarisation by Meta-Prompting Paper • 2504.15921 • Published 16 days ago • 7
3DV-TON: Textured 3D-Guided Consistent Video Try-on via Diffusion Models Paper • 2504.17414 • Published 14 days ago • 15
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos Paper • 2504.17343 • Published 14 days ago • 11
Boosting Generative Image Modeling via Joint Image-Feature Synthesis Paper • 2504.16064 • Published 16 days ago • 14
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs Paper • 2504.17040 • Published 15 days ago • 13
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 14 days ago • 23
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining Paper • 2504.16511 • Published 15 days ago • 20
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 14 days ago • 86