Model Card

This model is obtained by fine-tuning Qwen2-VL-7B-Base on LLaVA-Video-178K. It is used as a comparison baseline in LiveCC project.

Performance

Acknowledgement

Joya Chen built the training code, and Yiqi Lin trained the model. The QA evaluation is done by Joya Chen, and CC evaluation is done by Ziyun Zeng. Infra is supported by the company.

Downloads last month: 36

Safetensors

Model size

8.29B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for chenjoya/Qwen2-VL-7B-LLaVAInstruct

Base model

Qwen/Qwen2-VL-7B

Finetuned

(15)

this model

Dataset used to train chenjoya/Qwen2-VL-7B-LLaVAInstruct

Collection including chenjoya/Qwen2-VL-7B-LLaVAInstruct

LiveCC

Collection

Learning Video LLM with Streaming Speech Transcription at Scale (CVPR 2025) • 8 items • Updated 14 days ago • 4