BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published 13 days ago • 41
Latent Diffusion Autoencoders: Toward Efficient and Meaningful Unsupervised Representation Learning in Medical Imaging Paper • 2504.08635 • Published 27 days ago • 5
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Paper • 2504.08591 • Published 27 days ago • 18
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 27 days ago • 47
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability Paper • 2504.08003 • Published 29 days ago • 49
Efficient Generative Model Training via Embedded Representation Warmup Paper • 2504.10188 • Published 24 days ago • 12
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL Paper • 2504.11455 • Published 23 days ago • 13
DataDecide: How to Predict Best Pretraining Data with Small Experiments Paper • 2504.11393 • Published 23 days ago • 17
REPA-E: Unlocking VAE for End-to-End Tuning with Latent Diffusion Transformers Paper • 2504.10483 • Published 24 days ago • 21
PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding Paper • 2504.13180 • Published 21 days ago • 17
Perception Encoder: The best visual embeddings are not at the output of the network Paper • 2504.13181 • Published 21 days ago • 34
Tokenize Image Patches: Global Context Fusion for Effective Haze Removal in Large Images Paper • 2504.09621 • Published 25 days ago • 12
Bridging Continuous and Discrete Tokens for Autoregressive Visual Generation Paper • 2503.16430 • Published Mar 20 • 35
When Less is Enough: Adaptive Token Reduction for Efficient Image Representation Paper • 2503.16660 • Published Mar 20 • 73