---
license: cc-by-4.0
library_name: transformers
pipeline_tag: video-text-to-text
---
A competitive, human-aligned model for detailed video captioning, based on [VILA-v1.5-13B](https://huggingface.co/Efficient-Large-Model/VILA1.5-13b). It generates detailed captions for input videos and is described in [Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption](https://huggingface.co/papers/2503.09279) ([arXiv](https://arxiv.org/abs/2503.09279)).

For more details, please refer to our project page: https://sais-fuxi.github.io/projects/cockatiel

Code: https://github.com/Fr0zenCrane/Cockatiel