---
license: cc-by-nc-nd-4.0
---

# RealCustom Series

<div style="display:flex;justify-content: center">
<a href="https://corleone-huang.github.io/RealCustom_plus_plus/"><img alt="Build" src="https://img.shields.io/badge/Project%20Page-RealCustom-yellow"></a> 
<a href="https://arxiv.org/pdf/2408.09744"><img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2408.09744-b31b1b.svg"></a>
<a href="https://github.com/bytedance/RealCustom"><img src="https://img.shields.io/static/v1?label=GitHub&message=Code&color=green&logo=github"></a>
</div>

![teaser of RealCustom](./assets/teaser.svg)

## 📖 Introduction

Existing text-to-image customization methods (i.e., subject-driven generation) face a fundamental challenge: the influence of the visual and textual conditions is entangled. This inherent conflict forces a trade-off between subject fidelity and textual controllability, preventing both objectives from being optimized simultaneously. We present RealCustom, which disentangles subject similarity from text controllability and thereby allows both to be optimized simultaneously without conflict. The core idea of RealCustom is to represent given subjects as real words that can be seamlessly integrated with given texts, and to further leverage the relevance between real words and image regions to disentangle the visual condition from the text condition.

![process of RealCustom](./assets/process.svg)
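
To make the core idea more concrete, the following toy sketch shows one way a relevance score between a word and image regions could be turned into a mask that limits where the subject condition applies. It is an illustration only, not the RealCustom implementation: the function names `relevance_mask` and `blend`, the dot-product relevance, the `keep_ratio` parameter, and the tiny hand-written vectors are all assumptions made for this example.

```python
from typing import List


def dot(a: List[float], b: List[float]) -> float:
    """Plain dot product, used here as a stand-in relevance score (assumption)."""
    return sum(x * y for x, y in zip(a, b))


def relevance_mask(word: List[float], regions: List[List[float]],
                   keep_ratio: float = 0.5) -> List[int]:
    """Mark the regions most relevant to the word (1 = subject region)."""
    scores = [dot(word, r) for r in regions]
    k = max(1, round(keep_ratio * len(regions)))
    # Indices of the k highest-scoring regions.
    top = set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])
    return [1 if i in top else 0 for i in range(len(regions))]


def blend(text_only: List[float], with_subject: List[float],
          mask: List[int]) -> List[float]:
    """Use the subject-conditioned value inside the mask, the text-only value elsewhere."""
    return [s if m else t for t, s, m in zip(text_only, with_subject, mask)]


# Hypothetical 2-D embeddings: one word vector and four image-region features.
word = [1.0, 0.0]
regions = [[0.9, 0.1], [0.1, 0.9], [0.8, 0.3], [0.0, 1.0]]

mask = relevance_mask(word, regions, keep_ratio=0.5)
mixed = blend([0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0], mask)
```

In this sketch the subject condition only affects the regions selected by the mask, while the remaining regions follow the text condition alone, which is the sense in which the two conditions are kept from interfering with each other.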

## ⚡️ Quick Start

### 🔧 Requirements and Installation

Install the requirements:
```bash
bash envs/init.sh
```

### ✍️ Inference
```bash
bash inference/inference_single_image.sh
```

### 🌟 Gradio Demo
```bash
python inference/app.py
```

## Citation
If you find this project useful for your research, please consider citing our papers:
```bibtex
@inproceedings{huang2024realcustom,
  title={RealCustom: narrowing real text word for real-time open-domain text-to-image customization},
  author={Huang, Mengqi and Mao, Zhendong and Liu, Mingcong and He, Qian and Zhang, Yongdong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7476--7485},
  year={2024}
}
@article{mao2024realcustom++,
  title={RealCustom++: Representing images as real-word for real-time customization},
  author={Mao, Zhendong and Huang, Mengqi and Ding, Fei and Liu, Mingcong and He, Qian and Zhang, Yongdong},
  journal={arXiv preprint arXiv:2408.09744},
  year={2024}
}
```