RefVNLI: Towards Scalable Evaluation of Subject-driven Text-to-image Generation Paper • 2504.17502 • Published 14 days ago
Scaling Analysis of Interleaved Speech-Text Language Models Paper • 2504.02398 • Published Apr 3
Follow the Flow: On Information Flow Across Textual Tokens in Text-to-Image Models Paper • 2504.01137 • Published Apr 1
OmnimatteZero: Training-free Real-time Omnimatte with Pre-trained Video Diffusion Models Paper • 2503.18033 • Published Mar 23
RewardSDS: Aligning Score Distillation via Reward-Weighted Sampling Paper • 2503.09601 • Published Mar 12
Slam Collection All resources for SpeechLMs from "Slamming: Training a Speech Language Model on One GPU in a Day". We provide the tokeniser, LM, and datasets • 6 items • Updated Feb 25
Slamming: Training a Speech Language Model on One GPU in a Day Paper • 2502.15814 • Published Feb 19
Can this Model Also Recognize Dogs? Zero-Shot Model Search from Weights Paper • 2502.09619 • Published Feb 13
Through-The-Mask: Mask-based Motion Trajectories for Image-to-Video Generation Paper • 2501.03059 • Published Jan 6
ObjectMate: A Recurrence Prior for Object Insertion and Subject-Driven Generation Paper • 2412.08645 • Published Dec 11, 2024
Hidden in the Noise: Two-Stage Robust Watermarking for Images Paper • 2412.04653 • Published Dec 5, 2024
Is It Really Long Context if All You Need Is Retrieval? Towards Genuinely Difficult Long Context NLP Paper • 2407.00402 • Published Jun 29, 2024
Recovering the Pre-Fine-Tuning Weights of Generative Models Paper • 2402.10208 • Published Feb 15, 2024