VITA-MLLM/VITA-1.5

Video-Text-to-Text


This repository contains the model weights for the paper VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction.

Code: https://github.com/VITA-MLLM/VITA

Downloads last month: 376

Safetensors

Model size: 8.32B params

Tensor type: BF16
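As a rough guide to what the figures above imply for storage, the sketch below estimates the size of the raw weights from the listed parameter count (8.32B) and tensor type (BF16, which uses 2 bytes per value). The function name `weight_size_bytes` and the variable names are hypothetical, chosen for illustration only. The estimate covers the weights alone and says nothing about activations, caches, or framework overhead at inference time.

```python
# Back-of-the-envelope estimate of raw weight size.
# Assumption: every parameter is stored as BF16 (2 bytes each), as the
# tensor type on this page suggests. Names here are illustrative only.

BYTES_PER_BF16 = 2


def weight_size_bytes(num_params: float, bytes_per_param: int = BYTES_PER_BF16) -> float:
    """Return the approximate size of the weights in bytes."""
    return num_params * bytes_per_param


num_params = 8.32e9  # "8.32B params" from the model page
size_bytes = weight_size_bytes(num_params)

size_gb = size_bytes / 1e9      # decimal gigabytes
size_gib = size_bytes / 2**30   # binary gibibytes

print(f"Approximate weight size: {size_gb:.2f} GB ({size_gib:.2f} GiB)")
```

Under these assumptions the weights come to about 16.64 GB, so the download and the memory needed just to hold the model are of that order.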

