SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing Paper • 2505.02370 • Published 3 days ago • 11
D-FINE: Redefine Regression Task in DETRs as Fine-grained Distribution Refinement Paper • 2410.13842 • Published Oct 17, 2024 • 3
Improving Editability in Image Generation with Layer-wise Memory Paper • 2505.01079 • Published 6 days ago • 25
PixelHacker: Image Inpainting with Structural and Semantic Consistency Paper • 2504.20438 • Published 9 days ago • 38
BitNet v2: Native 4-bit Activations with Hadamard Transformation for 1-bit LLMs Paper • 2504.18415 • Published 13 days ago • 41
Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs Paper • 2504.17432 • Published 14 days ago • 38
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation Paper • 2504.17207 • Published 14 days ago • 29
Step1X-Edit: A Practical Framework for General Image Editing Paper • 2504.17761 • Published 14 days ago • 86
Packing Input Frame Context in Next-Frame Prediction Models for Video Generation Paper • 2504.12626 • Published 21 days ago • 48
VistaDPO: Video Hierarchical Spatial-Temporal Direct Preference Optimization for Large Video Models Paper • 2504.13122 • Published 21 days ago • 21
FlexIP: Dynamic Control of Preservation and Personality for Customized Image Generation Paper • 2504.07405 • Published 28 days ago • 12
ZipIR: Latent Pyramid Diffusion Transformer for High-Resolution Image Restoration Paper • 2504.08591 • Published 27 days ago • 18
MineWorld: a Real-Time and Open-Source Interactive World Model on Minecraft Paper • 2504.08388 • Published 27 days ago • 39
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation Paper • 2504.08736 • Published 27 days ago • 47
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model Paper • 2504.08685 • Published 27 days ago • 123
TinyLLaVA-Video-R1: Towards Smaller LMMs for Video Reasoning Paper • 2504.09641 • Published 25 days ago • 16
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models Paper • 2504.10479 • Published 24 days ago • 255