Taylor658's Collections
Computer Vision
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Paper • 2412.07760 • Published • 56
MoViE: Mobile Diffusion for Video Editing
Paper • 2412.06578 • Published • 18
Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published • 17
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 49
VisionZip: Longer is Better but Not Necessary in Vision Language Models
Paper • 2412.04467 • Published • 111
VideoGen-of-Thought: A Collaborative Framework for Multi-Shot Video Generation
Paper • 2412.02259 • Published • 60
STIV: Scalable Text and Image Conditioned Video Generation
Paper • 2412.07730 • Published • 75
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
Paper • 2306.16410 • Published • 28
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Paper • 2412.09604 • Published • 38
GenEx: Generating an Explorable World
Paper • 2412.09624 • Published • 97
Apollo: An Exploration of Video Understanding in Large Multimodal Models
Paper • 2412.10360 • Published • 146
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 615
Video-R1: Reinforcing Video Reasoning in MLLMs
Paper • 2503.21776 • Published • 78
Scaling Vision Pre-Training to 4K Resolution
Paper • 2503.19903 • Published • 41