Zmu's Collections

Video Understanding

Updated Jan 13, 2024

  • MM-VID: Advancing Video Understanding with GPT-4V(ision)

    Paper • 2310.19773 • Published Oct 30, 2023 • 20

  • Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models

    Paper • 2310.05863 • Published Oct 9, 2023 • 1

  • Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

    Paper • 2311.06242 • Published Nov 10, 2023 • 93

  • I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization

    Paper • 2311.10126 • Published Nov 16, 2023 • 10

  • Video-LLaVA: Learning United Visual Representation by Alignment Before Projection

    Paper • 2311.10122 • Published Nov 16, 2023 • 27

  • Retrieval-Enhanced Contrastive Vision-Text Models

    Paper • 2306.07196 • Published Jun 12, 2023 • 7

  • Text-Conditioned Resampler For Long Form Video Understanding

    Paper • 2312.11897 • Published Dec 19, 2023 • 6

  • Vamos: Versatile Action Models for Video Understanding

    Paper • 2311.13627 • Published Nov 22, 2023 • 2