Video-LLaVA-Seg
This is the official baseline implementation for the ViCaS dataset, presented in the paper ViCaS: A Dataset for Combining Holistic and Pixel-level Video Understanding using Captions with Grounded Segmentation.
For details about setting up the model, refer to the Video-LLaVA-Seg GitHub repo.
For details about downloading and evaluating the dataset benchmark, refer to the ViCaS GitHub repo.
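As a quick-start illustration, the sketch below shows one way to fetch the model checkpoint from the Hugging Face Hub before following the setup steps in the Video-LLaVA-Seg repo. The `repo_id` shown is a placeholder assumption, not a confirmed identifier; substitute the actual model id from this page.

```python
# Minimal sketch: download the checkpoint files from the Hugging Face Hub.
# NOTE: "your-org/Video-LLaVA-Seg" is a placeholder repo id (an assumption) --
# replace it with the real model id. Environment setup (dependencies, CUDA,
# etc.) is covered in the Video-LLaVA-Seg GitHub repo.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="your-org/Video-LLaVA-Seg",  # assumption: use the real model id here
)
print(f"Checkpoint downloaded to: {local_dir}")
```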