---
title: Video Model Studio
emoji: 🎥
colorFrom: gray
colorTo: gray
sdk: gradio
sdk_version: 5.15.0
app_file: app.py
pinned: true
license: apache-2.0
short_description: All-in-one tool for AI video training
---
# 🎥 Video Model Studio (VMS)
## Presentation
### What is this project?
VMS is a Gradio app that wraps Finetrainers to provide a simple UI for training AI video models on Hugging Face.
You can deploy it to a private space, and start long-running training jobs in the background.
## Funding
VideoModelStudio is a 100% open-source project. I develop and maintain it during both my professional and personal time. If you like it, you can tip! If not, have a good day 🫶
<a href="https://www.buymeacoffee.com/flngr" target="_blank"><img src="https://www.buymeacoffee.com/assets/img/custom_images/orange_img.png" alt="Buy Me A Coffee" style="height: 41px !important;width: 174px !important;box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;-webkit-box-shadow: 0px 3px 2px 0px rgba(190, 190, 190, 0.5) !important;" ></a>
## News
- 🔥 **2025-03-02**: Made some fixes to improve Finetrainers reliability when working with big datasets
- 🔥 **2025-02-18**: I am working on better recovery in case of a failed run (this is still in beta)
- 🔥 **2025-02-18**: I have added persistence of UI settings, so if you reload Gradio, you won't lose your settings!
## TODO
- Add `Aya-Vision-8B` for frame analysis (currently we use `Qwen2-VL-7B`)
### See also
#### Internally used project: Finetrainers
VMS uses Finetrainers under the hood: https://github.com/a-r-r-o-w/finetrainers
#### Similar project: diffusion-pipe-ui
I wasn't aware of its existence when I started my project, but there is also this open-source initiative, which is similar in terms of dataset management and other features: https://github.com/alisson-anjos/diffusion-pipe-ui
## Features
### Run Finetrainers in the background
The main feature of VMS is the ability to run a Finetrainers training session in the background.
You can start your job, close the web browser tab, and come back the next morning to see the result.
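How VMS launches Finetrainers internally is not described here. As a generic illustration of the idea, the following minimal sketch starts a long-running command in its own session with its output redirected to a log file, so it does not depend on the browser tab or the launching shell. The function name `start_background_job` and the log path are hypothetical, and `start_new_session` only has an effect on POSIX systems.

```python
import subprocess
import sys
from pathlib import Path


def start_background_job(command: list[str], log_path: Path) -> subprocess.Popen:
    """Start `command` in its own session and send its output to `log_path`.

    Hypothetical helper for illustration only; this is not VMS's actual code.
    """
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with open(log_path, "ab") as log_file:
        process = subprocess.Popen(
            command,
            stdout=log_file,            # the child keeps its own copy of the handle
            stderr=subprocess.STDOUT,   # merge errors into the same log
            stdin=subprocess.DEVNULL,   # the job must not wait for keyboard input
            start_new_session=True,     # POSIX only: detach from the caller's session
        )
    return process


if __name__ == "__main__":
    # Placeholder command standing in for a real training script.
    job = start_background_job(
        [sys.executable, "-c", "print('training finished')"],
        Path("logs/job.log"),
    )
    print(f"Started job with PID {job.pid}")
```

Storing the PID and the log path somewhere persistent is what would let a UI reconnect to the job after a page reload.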
### Automatic scene splitting
VMS uses PySceneDetect to split scenes.
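For readers unfamiliar with PySceneDetect, here is a minimal sketch of scene detection with its Python API (assuming PySceneDetect 0.6 or later, installed with `pip install "scenedetect[opencv]"`). The detector, threshold, and the `drop_short_scenes` helper are assumptions chosen for illustration; the settings VMS actually uses may differ.

```python
def detect_scenes(video_path: str, threshold: float = 27.0) -> list[tuple[float, float]]:
    """Return (start, end) times in seconds for each scene found in the video."""
    # Third-party dependency, imported lazily so the helper below works without it.
    from scenedetect import ContentDetector, detect

    scene_list = detect(video_path, ContentDetector(threshold=threshold))
    return [(start.get_seconds(), end.get_seconds()) for start, end in scene_list]


def drop_short_scenes(
    scenes: list[tuple[float, float]], min_seconds: float = 1.0
) -> list[tuple[float, float]]:
    """Hypothetical helper: discard scenes shorter than `min_seconds`."""
    return [(start, end) for start, end in scenes if end - start >= min_seconds]
```

A typical use would be `drop_short_scenes(detect_scenes("my_video.mp4"))`, where `my_video.mp4` is a placeholder path.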
### Automatic clip captioning
VMS uses `LLaVA-Video-7B-Qwen2` for captioning. You can customize the system prompt if you want to.
### Download your dataset
Not interested in using VMS for training? That's perfectly fine!
You can use VMS for video splitting and captioning, and export the data for training on another platform, e.g. Replicate or Fal.
## Supported models
VMS uses `Finetrainers` under the hood. In theory any model supported by Finetrainers should work in VMS.
In practice, a PR (pull request) will be necessary to adapt the UI a bit to accommodate each model's specifics.
### LTX-Video
I have tested training an LTX-Video LoRA model using videos (not images) on a single A100 instance.
It requires about 18 to 19 GB of VRAM, depending on your settings.
### HunyuanVideo
I have tested training a HunyuanVideo LoRA model using videos (not images) on a single A100 instance.
It requires about 47 to 49 GB of VRAM, depending on your settings.
### CogVideoX
Do you want support for this one? Let me know in the comments!
## Limitations
### One-user-per-space design
Currently VMS supports only one training job at a time, and anybody with access to your Gradio app can upload or delete anything.
This means you have to run VMS in a *PRIVATE* HF Space, or locally if you require full privacy.
## Deployment
VMS is built on top of Finetrainers and Gradio, and is designed to run as a Hugging Face Space (but you can deploy it anywhere that has an NVIDIA GPU and supports Docker).
### Full installation on Hugging Face
Easy peasy: create a Space (make sure to use the `Gradio` type/template), and push the repo. No Docker needed!
That said, please see the "Run" section for info about environment variables.
### Dev mode on Hugging Face
Enable dev mode in the Space, then open VSCode (locally or remotely) and run:
```
pip install -r requirements.txt
```
The restart is not automatic, so afterwards click on "Restart" in the Space dev mode UI widget.
### Full installation somewhere else
I haven't tested it, but you can try the provided Dockerfile.
### Full local installation
The full installation requires:
- Linux
- CUDA 12
- Python 3.10
This is because of flash attention, which is defined in `requirements.txt` using a URL to download a prebuilt wheel (Python bindings for a native library).
```bash
./setup.sh
```
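Before running the setup script, it can help to confirm that the machine matches the requirements listed above. The following is a minimal sketch, not part of VMS: `check_requirements` is a hypothetical helper that only checks the operating system and the Python version. It does not verify CUDA 12, which you would need to check separately.

```python
import platform
import sys


def check_requirements(system: str, python_version: tuple[int, int]) -> list[str]:
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if system != "Linux":
        problems.append(f"Linux is required, found {system}")
    if python_version != (3, 10):
        found = ".".join(str(part) for part in python_version)
        problems.append(f"Python 3.10 is required, found {found}")
    return problems


if __name__ == "__main__":
    # Check the machine this script is running on.
    issues = check_requirements(platform.system(), tuple(sys.version_info[:2]))
    for issue in issues:
        print(issue)
    if not issues:
        print("OS and Python version look compatible (CUDA was not checked).")
```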
### Degraded local installation
If you cannot meet the requirements, you can:
- solution 1: fix requirements.txt to use another prebuilt wheel
- solution 2: manually build/install flash attention
- solution 3: don't use clip captioning
Here is how to do solution 3:
```bash
./setup_no_captions.sh
```
## Run
### Running the Gradio app
Note: please make sure you properly define the environment variables `STORAGE_PATH` (e.g. `/data/`) and `HF_HOME` (e.g. `/data/huggingface/`).
```bash
python app.py
```
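If you want to confirm that both variables are defined before launching the app, a small pre-flight check along these lines can help. This is a sketch under assumptions: `missing_variables` is a hypothetical helper, not something VMS provides, and it only checks that the variables are set and non-empty, not that the paths exist or are writable.

```python
import os
from collections.abc import Mapping

# The two variables named in the note above.
REQUIRED_VARIABLES = ("STORAGE_PATH", "HF_HOME")


def missing_variables(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are unset or empty in `env`."""
    return [name for name in REQUIRED_VARIABLES if not env.get(name)]


if __name__ == "__main__":
    missing = missing_variables(os.environ)
    if missing:
        print("Please define: " + ", ".join(missing))
    else:
        print("STORAGE_PATH and HF_HOME are both set.")
```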
### Running locally
See the remarks above about the environment variables.
By default `run.sh` will store its data in `.data/` (located inside the current working directory):
```bash
./run.sh
```