---
license: cc-by-nc-4.0
library_name: diffusers
tags:
- text-to-video
- diffusion distillation
---

# CausVid Model Card

> [**From Slow Bidirectional to Fast Autoregressive Video Diffusion Models**](https://arxiv.org/abs/2412.07772),
> Tianwei Yin*, Qiang Zhang*, Richard Zhang, William T. Freeman, Frédo Durand, Eli Shechtman, Xun Huang (* equal contribution)

## Environment Setup

```bash
# Clone the repository and enter it
git clone https://github.com/tianweiy/CausVid && cd CausVid
# Create and activate a dedicated conda environment
conda create -n causvid python=3.10 -y
conda activate causvid
# Install PyTorch, then the remaining dependencies
pip install torch torchvision
pip install -r requirements.txt
# Install the package in development mode
python setup.py develop
```

Also download the Wan2.1 base model from the [Wan2.1 repository](https://github.com/Wan-Video/Wan2.1) and save it to `wan_models/Wan2.1-T2V-1.3B/`.

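One way to fetch the base weights is through the Hugging Face Hub CLI. The sketch below assumes the weights are published under the Hub repository ID `Wan-AI/Wan2.1-T2V-1.3B` and that `huggingface-cli` (from the `huggingface_hub` package) is installed. Neither is stated in this card, so check the Wan2.1 repository for the current download instructions; recent `huggingface_hub` releases may expose the same subcommand as `hf download`. The `run` helper and its `DRY_RUN` switch are hypothetical conveniences: by default the script only prints the command.

```bash
set -eu

# Assumed Hub repository ID for the 1.3B text-to-video weights (not stated in this card).
WAN_REPO="Wan-AI/Wan2.1-T2V-1.3B"
# Target directory named in the setup instructions.
WAN_DIR="wan_models/Wan2.1-T2V-1.3B"

# Hypothetical helper: print the command when DRY_RUN=1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

mkdir -p "$WAN_DIR"
run huggingface-cli download "$WAN_REPO" --local-dir "$WAN_DIR"
```

Set `DRY_RUN=0` in the environment to perform the download once the printed command looks right.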
## Inference Example

First, download a checkpoint: the [Autoregressive Model](https://huggingface.co/tianweiy/CausVid/tree/main/autoregressive_checkpoint), [Bidirectional Model 1](https://huggingface.co/tianweiy/CausVid/tree/main/bidirectional_checkpoint1), or [Bidirectional Model 2](https://huggingface.co/tianweiy/CausVid/tree/main/bidirectional_checkpoint2) (which performs slightly better).

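A single checkpoint folder can be pulled from the `tianweiy/CausVid` Hub repository without downloading the others. This is a minimal sketch, assuming `huggingface-cli` is installed; the local `checkpoints` directory, the `CKPT` variable, and the `run` helper with its `DRY_RUN` switch are hypothetical names chosen for illustration.

```bash
set -eu

# Choose one of the three folders linked above (default: the autoregressive model).
CKPT="${CKPT:-autoregressive_checkpoint}"
# Hypothetical local directory for downloaded checkpoints.
CKPT_ROOT="checkpoints"

# Reject folder names that do not appear in the repository links.
case "$CKPT" in
  autoregressive_checkpoint|bidirectional_checkpoint1|bidirectional_checkpoint2) ;;
  *) echo "unknown checkpoint folder: $CKPT" >&2; exit 1 ;;
esac

# Hypothetical helper: print the command when DRY_RUN=1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

mkdir -p "$CKPT_ROOT"
# --include restricts the download to the chosen sub-folder of the repository.
run huggingface-cli download tianweiy/CausVid --include "$CKPT/*" --local-dir "$CKPT_ROOT"

# The files land under this path, a plausible value for --checkpoint_folder (assumption).
echo "checkpoint folder: $CKPT_ROOT/$CKPT"
```

For example, `CKPT=bidirectional_checkpoint2 DRY_RUN=0 bash download_ckpt.sh` would fetch the second bidirectional model, where `download_ckpt.sh` is whatever file name you save the sketch under.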
### Autoregressive 3-step 5-second Video Generation

```bash
python minimal_inference/autoregressive_inference.py --config_path configs/wan_causal_dmd.yaml --checkpoint_folder XXX --output_folder XXX --prompt_file_path XXX
```

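To make the `XXX` placeholders concrete, the sketch below prepares a prompt file and an output directory and then assembles the autoregressive command. It assumes the prompt file is plain text with one prompt per line, which this card does not state. The file name, both directory paths, and the `run` helper with its `DRY_RUN` switch are hypothetical.

```bash
set -eu

PROMPTS="prompts.txt"                              # hypothetical prompt file name
CKPT_DIR="checkpoints/autoregressive_checkpoint"   # hypothetical checkpoint location
OUT_DIR="outputs/autoregressive"                   # hypothetical output location

# Assumption: one text prompt per line.
cat > "$PROMPTS" <<'EOF'
A red fox running through fresh snow at sunrise
A paper boat drifting down a rain-soaked street
EOF

mkdir -p "$OUT_DIR"

# Hypothetical helper: print the command when DRY_RUN=1 (the default), else execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

run python minimal_inference/autoregressive_inference.py \
  --config_path configs/wan_causal_dmd.yaml \
  --checkpoint_folder "$CKPT_DIR" \
  --output_folder "$OUT_DIR" \
  --prompt_file_path "$PROMPTS"
```

Run it from the repository root with `DRY_RUN=0` to start generation.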
### Autoregressive 3-step Long Video Generation

```bash
python minimal_inference/longvideo_autoregressive_inference.py --config_path configs/wan_causal_dmd.yaml --checkpoint_folder XXX --output_folder XXX --prompt_file_path XXX --num_rollout XXX
```

### Bidirectional 3-step 5-second Video Generation

```bash
python minimal_inference/bidirectional_inference.py --config_path configs/wan_bidirectional_dmd_from_scratch.yaml --checkpoint_folder XXX --output_folder XXX --prompt_file_path XXX
```

For more information, please refer to the [code repository](https://github.com/tianweiy/CausVid).

## License

CausVid is released under the [Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License](https://creativecommons.org/licenses/by-nc-sa/4.0/deed.en).

## Citation

If you find CausVid useful or relevant to your research, please cite our papers:

```bib
@inproceedings{yin2025causvid,
    title={From Slow Bidirectional to Fast Autoregressive Video Diffusion Models},
    author={Yin, Tianwei and Zhang, Qiang and Zhang, Richard and Freeman, William T and Durand, Fr{\'e}do and Shechtman, Eli and Huang, Xun},
    booktitle={CVPR},
    year={2025}
}

@inproceedings{yin2024improved,
    title={Improved Distribution Matching Distillation for Fast Image Synthesis},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Park, Taesung and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T},
    booktitle={NeurIPS},
    year={2024}
}

@inproceedings{yin2024onestep,
    title={One-step Diffusion with Distribution Matching Distillation},
    author={Yin, Tianwei and Gharbi, Micha{\"e}l and Zhang, Richard and Shechtman, Eli and Durand, Fr{\'e}do and Freeman, William T and Park, Taesung},
    booktitle={CVPR},
    year={2024}
}
```