Add pipeline tag, library name and license (#1)
- Add pipeline tag, library name and license (870fee5056121be118c8ee68684242343ceb3249)
Co-authored-by: Niels Rogge <[email protected]>
README.md
CHANGED
@@ -1,3 +1,9 @@
|
|
1 |
# VIRES model card
|
2 |
|
3 |
**Model Page**: [VIRES](https://hjzheng.net/projects/VIRES/)
|
@@ -8,21 +14,42 @@ Summary description and brief definition of inputs and outputs.
|
|
8 |
|
9 |
### Description
|
10 |
|
11 |
-
VIRES is a video instance repainting method with sketch and text guidance, enabling video instance repainting, replacement, generation, and removal.
|
|
|
12 |
|
13 |
### Inputs and outputs
|
14 |
|
15 |
- **Input:**
|
16 |
-
- Text string
|
17 |
-
- Mask Sequence
|
18 |
-
- Sketch Sequence
|
19 |
|
20 |
- **Output:**
|
21 |
-
- A
|
22 |
|
23 |
### Usage
|
24 |
|
25 |
-
|
26 |
|
27 |
## Citation
|
28 |
|
@@ -33,5 +60,4 @@ Ref to our GitHub page: https://github.com/suimuc/VIRES/
|
|
33 |
journal={arXiv preprint arXiv:2411.16199},
|
34 |
year={2024}
|
35 |
}
|
36 |
-
```
|
37 |
-
|
|
|
1 |
+
---
|
2 |
+
pipeline_tag: image-to-video
|
3 |
+
library_name: diffusers
|
4 |
+
license: mit
|
5 |
+
---
|
6 |
+
|
7 |
# VIRES model card
|
8 |
|
9 |
**Model Page**: [VIRES](https://hjzheng.net/projects/VIRES/)
|
|
|
14 |
|
15 |
### Description
|
16 |
|
17 |
+
VIRES is a video instance repainting method with sketch and text guidance, enabling video instance repainting, replacement, generation, and removal. It leverages the generative priors of text-to-video models to maintain temporal consistency and produce visually pleasing results. Key features include a Sequential ControlNet for structure layout extraction and detail capture, sketch attention for injecting fine-grained semantics, and a sketch-aware encoder for alignment.
|
18 |
+
|
19 |
|
20 |
### Inputs and outputs
|
21 |
|
22 |
- **Input:**
|
23 |
+
- Text string describing the desired changes.
|
24 |
+
- Mask Sequence (51 x 512 x 512 resolution).
|
25 |
+
- Sketch Sequence (51 x 512 x 512 resolution).
|
26 |
|
27 |
- **Output:**
|
28 |
+
- A repainted video.
|
29 |
|
30 |
### Usage
|
31 |
|
32 |
+
A basic example using the `diffusers` library (requires appropriate model weights and dependencies):
|
33 |
+
|
34 |
+
```python
|
35 |
+
import torch  # needed for torch.float16 below
from diffusers import DiffusionPipeline  # Import necessary libraries
|
36 |
+
# Load the model (replace with your actual paths)
|
37 |
+
pipe = DiffusionPipeline.from_pretrained("suimu/VIRES", torch_dtype=torch.float16).to("cuda")
|
38 |
+
|
39 |
+
# Prepare inputs: text prompt, mask, and sketch
|
40 |
+
prompt = "A cat replaces the dog in this video"
|
41 |
+
mask = ...  # Load your mask sequence
|
42 |
+
sketch = ...  # Load your sketch sequence
|
43 |
+
|
44 |
+
# Generate the video
|
45 |
+
video = pipe(prompt, mask, sketch).videos[0]
|
46 |
+
|
47 |
+
# Save or display the video
|
48 |
+
...
|
49 |
+
```
|
50 |
+
|
51 |
+
For complete usage instructions and advanced options, refer to our GitHub page: https://github.com/suimuc/VIRES/
|
52 |
+
|
53 |
|
54 |
## Citation
|
55 |
|
|
|
60 |
journal={arXiv preprint arXiv:2411.16199},
|
61 |
year={2024}
|
62 |
}
|
63 |
+
```
|
|
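The metadata this commit adds sits in a YAML front-matter block at the top of README.md, delimited by `---` lines. As a rough illustration of how such a block can be read, here is a minimal sketch using only the Python standard library. The helper name `parse_front_matter` is hypothetical, and it assumes flat `key: value` pairs like the three added here; a real YAML parser would be needed for nested or quoted values.

```python
def parse_front_matter(text: str) -> dict[str, str]:
    """Parse a flat `key: value` front-matter block delimited by `---` lines.

    Hypothetical helper for illustration only; it does not handle nested
    YAML, lists, or quoted values.
    """
    lines = text.splitlines()
    # Front matter must start on the very first line of the file.
    if not lines or lines[0].strip() != "---":
        return {}
    metadata: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return metadata  # closing delimiter reached
        key, sep, value = line.partition(":")
        if sep:
            metadata[key.strip()] = value.strip()
    return {}  # no closing delimiter, so treat as no front matter


# The front matter added by this commit, followed by the first README heading.
readme = """---
pipeline_tag: image-to-video
library_name: diffusers
license: mit
---
# VIRES model card
"""

metadata = parse_front_matter(readme)
print(metadata)
```

A text without a leading `---` line, or without a closing one, yields an empty dictionary rather than an error, which keeps the sketch safe to call on READMEs that have no metadata.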