Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published 14 days ago • 29
Token-Shuffle: Towards High-Resolution Image Generation with Autoregressive Models Paper • 2504.17789 • Published 14 days ago • 23
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published 14 days ago • 38
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction Paper • 2504.01014 • Published Apr 1 • 68
Audio-visual Controlled Video Diffusion with Masked Selective State Spaces Modeling for Natural Talking Head Generation Paper • 2504.02542 • Published Apr 3 • 44
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89 • 7
Boosting Discriminative Visual Representation Learning with Scenario-Agnostic Mixup Paper • 2111.15454 • Published Nov 30, 2021
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89 • 7
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89 • 7
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89 • 7
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization Paper • 2504.00999 • Published Apr 1 • 89