Video Understanding - a Zmu Collection

Video Understanding

updated Jan 13, 2024

MM-VID: Advancing Video Understanding with GPT-4V(ision)

Paper • 2310.19773 • Published Oct 30, 2023
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

Paper • 2310.05863 • Published Oct 9, 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Paper • 2311.06242 • Published Nov 10, 2023
I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

Paper • 2311.10126 • Published Nov 16, 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

Paper • 2311.10122 • Published Nov 16, 2023
Retrieval-Enhanced Contrastive Vision-Text Models

Paper • 2306.07196 • Published Jun 12, 2023
Text-Conditioned Resampler For Long Form Video Understanding

Paper • 2312.11897 • Published Dec 19, 2023
Vamos: Versatile Action Models for Video Understanding

Paper • 2311.13627 • Published Nov 22, 2023