---
license: bsd
---
# InteractDiffusion Diffuser Implementation
[Project Page](https://jiuntian.github.io/interactdiffusion) |
[Paper](https://arxiv.org/abs/2312.05849) |
[WebUI](https://github.com/jiuntian/sd-webui-interactdiffusion) |
[Demo](https://huggingface.co/spaces/interactdiffusion/interactdiffusion) |
[Video](https://www.youtube.com/watch?v=Uunzufq8m6Y) |
[Diffuser](https://huggingface.co/interactdiffusion/diffusers-v1-2) |
[Colab](https://colab.research.google.com/drive/1Bh9PjfTylxI2rbME5mQJtFqNTGvaghJq?usp=sharing)
## How to Use
```python
from diffusers import DiffusionPipeline
import torch

# trust_remote_code=True lets diffusers run the custom pipeline code shipped in the model repo.
pipeline = DiffusionPipeline.from_pretrained(
    "interactdiffusion/diffusers-v1-2",
    trust_remote_code=True,
    variant="fp16",
    torch_dtype=torch.float16,
)
pipeline = pipeline.to("cuda")

# Each interaction is one (subject, action, object) triplet;
# the i-th entry of every list describes the same interaction.
images = pipeline(
    prompt="a person is feeding a cat",
    interactdiffusion_subject_phrases=["person"],
    interactdiffusion_object_phrases=["cat"],
    interactdiffusion_action_phrases=["feeding"],
    interactdiffusion_subject_boxes=[[0.0332, 0.1660, 0.3359, 0.7305]],
    interactdiffusion_object_boxes=[[0.2891, 0.4766, 0.6680, 0.7930]],
    interactdiffusion_scheduled_sampling_beta=1,
    output_type="pil",
    num_inference_steps=50,
).images

images[0].save("out.jpg")
```
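The box arguments in the example are fractions of the image size rather than pixel values. The sketch below shows one way to produce them from pixel coordinates. It assumes the boxes follow the `[x_min, y_min, x_max, y_max]` order used by GLIGEN-style pipelines; this README does not state the order, so treat it as an assumption. `normalize_box` is a hypothetical helper, not part of the pipeline. The pixel values were chosen so that, on a 512×512 canvas, they reproduce the boxes used above.

```python
def normalize_box(box_px, width, height):
    """Convert a pixel-space box to fractions of the image size.

    Assumes (not confirmed by the README) the order [x_min, y_min, x_max, y_max].
    """
    x0, y0, x1, y1 = box_px
    if not (0 <= x0 < x1 <= width and 0 <= y0 < y1 <= height):
        raise ValueError(f"box {box_px} does not fit a {width}x{height} image")
    # Round to four decimals, matching the precision used in the example above.
    return [
        round(x0 / width, 4),
        round(y0 / height, 4),
        round(x1 / width, 4),
        round(y1 / height, 4),
    ]


# Pixel boxes on a 512x512 canvas for the "person" and the "cat".
subject_box = normalize_box([17, 85, 172, 374], 512, 512)
object_box = normalize_box([148, 244, 342, 406], 512, 512)

print(subject_box)
print(object_box)
```

The resulting lists can be passed as single entries of `interactdiffusion_subject_boxes` and `interactdiffusion_object_boxes`.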
For more information, please check the [project homepage](https://jiuntian.github.io/interactdiffusion/).
## Citation
```bibtex
@inproceedings{hoe2023interactdiffusion,
  title={InteractDiffusion: Interaction Control in Text-to-Image Diffusion Models},
  author={Jiun Tian Hoe and Xudong Jiang and Chee Seng Chan and Yap-Peng Tan and Weipeng Hu},
  year={2024},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
}
```
## Acknowledgement
This work builds on the codebases of [GLIGEN](https://github.com/gligen/GLIGEN) and [LDM](https://github.com/CompVis/latent-diffusion).