ryanafufu's Collections: my_read_book
MambaVision: A Hybrid Mamba-Transformer Vision Backbone (arXiv:2407.08083, 33 upvotes)
Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model (arXiv:2408.11039, 62 upvotes)
The Mamba in the Llama: Distilling and Accelerating Hybrid Models (arXiv:2408.15237, 42 upvotes)
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think (arXiv:2409.11355, 31 upvotes)
OmniGen: Unified Image Generation (arXiv:2409.11340, 115 upvotes)
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning (arXiv:2409.12183, 39 upvotes)
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning (arXiv:2409.12568, 51 upvotes)
Imagine yourself: Tuning-Free Personalized Image Generation (arXiv:2409.13346, 71 upvotes)
Training Language Models to Self-Correct via Reinforcement Learning (arXiv:2409.12917, 139 upvotes)
MaskBit: Embedding-free Image Generation via Bit Tokens (arXiv:2409.16211, 17 upvotes)
Emu3: Next-Token Prediction is All You Need (arXiv:2409.18869, 95 upvotes)
FreeScale: Unleashing the Resolution of Diffusion Models via Tuning-Free Scale Fusion (arXiv:2412.09626, 20 upvotes)
Byte Latent Transformer: Patches Scale Better Than Tokens (arXiv:2412.09871, 102 upvotes)
ColorFlow: Retrieval-Augmented Image Sequence Colorization (arXiv:2412.11815, 26 upvotes)
Mulberry: Empowering MLLM with o1-like Reasoning and Reflection via Collective Monte Carlo Tree Search (arXiv:2412.18319, 40 upvotes)
LlamaV-o1: Rethinking Step-by-step Visual Reasoning in LLMs (arXiv:2501.06186, 66 upvotes)
Transformer^2: Self-adaptive LLMs (arXiv:2501.06252, 55 upvotes)
MiniMax-01: Scaling Foundation Models with Lightning Attention (arXiv:2501.08313, 289 upvotes)
Padding Tone: A Mechanistic Analysis of Padding Tokens in T2I Models (arXiv:2501.06751, 33 upvotes)
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv:2501.12948, 391 upvotes)
Hunyuan3D 2.0: Scaling Diffusion Models for High Resolution Textured 3D Assets Generation (arXiv:2501.12202, 46 upvotes)
ChunkKV: Semantic-Preserving KV Cache Compression for Efficient Long-Context LLM Inference (arXiv:2502.00299, 2 upvotes)
Region-Adaptive Sampling for Diffusion Transformers (arXiv:2502.10389, 54 upvotes)
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation (arXiv:2502.18364, 36 upvotes)
Transformers without Normalization (arXiv:2503.10622, 162 upvotes)
CFG-Zero*: Improved Classifier-Free Guidance for Flow Matching Models (arXiv:2503.18886, 21 upvotes)
D^2iT: Dynamic Diffusion Transformer for Accurate Image Generation (arXiv:2504.09454, 12 upvotes)
FlowTok: Flowing Seamlessly Across Text and Image Tokens (arXiv:2503.10772, 19 upvotes)
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection (arXiv:2503.12271, 9 upvotes)
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning (arXiv:2504.16080, 15 upvotes)
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation (arXiv:2503.10618, 17 upvotes)
Softpick: No Attention Sink, No Massive Activations with Rectified Softmax (arXiv:2504.20966, 25 upvotes)