Update README.md

---
license: mit
tags:
- image-generation
- HiDream.ai
language:
- en
pipeline_tag: text-to-image
library_name: diffusers
---

`HiDream-I1` is a series of state-of-the-art open-source image generation models featuring a 16-billion-parameter rectified flow transformer with a Mixture-of-Experts architecture, designed to create high-quality images from text prompts.

## Key Features

- ✨ **Superior Image Quality** - Produces exceptional results across multiple styles, including photorealistic, cartoon, artistic, and more
- 🎯 **Best-in-Class Prompt Following** - Achieves industry-leading scores on the GenEval and DPG benchmarks, outperforming all other open-source models
- 🔓 **Open Source** - Released under the MIT license to foster scientific advancement and enable creative innovation
- 💼 **Commercial-Friendly** - Generated images can be freely used for personal projects, scientific research, and commercial applications

## Quick Start

Please make sure you have installed [Flash Attention](https://github.com/Dao-AILab/flash-attention). We recommend CUDA 12.4 for the manual installation.

```
pip install -r requirements.txt
```

Clone the GitHub repo:

```
git clone https://github.com/HiDream-ai/HiDream-I1
```

Then you can run the inference scripts to generate images:

```bash
# For full model inference
python ./inference.py

# For distilled dev model inference
INFERENCE_STEP=28 PRETRAINED_MODEL_NAME_OR_PATH=XXX python inference_distilled.py

# For distilled fast model inference
INFERENCE_STEP=16 PRETRAINED_MODEL_NAME_OR_PATH=XXX python inference_distilled.py
```

> **Note:** The inference script will automatically download the `meta-llama/Meta-Llama-3.1-8B-Instruct` model files. If you encounter network issues, you can download these files ahead of time and place them in the appropriate cache directory to avoid download failures during inference.
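
If you want to avoid the runtime download entirely, one option is to pre-fetch the text encoder with `huggingface_hub` before running the scripts. The snippet below is a minimal sketch, assuming you have been granted access to the gated `meta-llama/Meta-Llama-3.1-8B-Instruct` repository and are logged in (for example via `huggingface-cli login`); the cache path mentioned in the comment is the library default, not something specific to this repository.

```python
# Sketch: pre-download the Llama-3.1 text encoder into the local Hugging Face
# cache so inference does not depend on network access at run time.
# Assumes the meta-llama license has been accepted and a token is available.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="meta-llama/Meta-Llama-3.1-8B-Instruct",
    # With cache_dir left unset, files land in the default cache
    # (~/.cache/huggingface/hub unless HF_HOME / HF_HUB_CACHE is overridden),
    # which is where downstream loaders look by default.
)
```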
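
The model card lists `diffusers` as the library and `text-to-image` as the pipeline tag, so loading through the generic `DiffusionPipeline` interface should also be possible once you point it at the published weights. The snippet below is only a rough sketch under that assumption: the repository id is an illustrative placeholder, and the step count is borrowed from the distilled dev setting above rather than an official recommendation.

```python
# Rough sketch: text-to-image generation through the generic diffusers API.
# The repo id below is an illustrative placeholder; replace it with this
# model's actual Hugging Face repository id.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "HiDream-ai/HiDream-I1",      # placeholder repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

image = pipe(
    prompt="A cat holding a sign that reads 'HiDream.ai'",
    num_inference_steps=28,       # borrowed from the distilled dev setting above
).images[0]
image.save("output.png")
```

If this loading path is not yet supported for your checkpoint, fall back to the repository scripts shown above, which are the documented inference path.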

## Evaluation Metrics

### DPG-Bench

| Model           | Overall   | Global | Entity | Attribute | Relation | Other |
|-----------------|-----------|--------|--------|-----------|----------|-------|
| PixArt-alpha    | 71.11     | 74.97  | 79.32  | 78.60     | 82.57    | 76.96 |
| SDXL            | 74.65     | 83.27  | 82.43  | 80.91     | 86.76    | 80.41 |
| DALL-E 3        | 83.50     | 90.97  | 89.61  | 88.39     | 90.58    | 89.83 |
| Flux.1-dev      | 83.79     | 85.80  | 86.79  | 89.98     | 90.04    | 89.90 |
| SD3-Medium      | 84.08     | 87.90  | 91.01  | 88.83     | 80.70    | 88.68 |
| Janus-Pro-7B    | 84.19     | 86.90  | 88.90  | 89.40     | 89.32    | 89.48 |
| CogView4-6B     | 85.13     | 83.85  | 90.35  | 91.17     | 91.14    | 87.29 |
| **HiDream-I1**  | **85.89** | 76.44  | 90.22  | 89.48     | 93.74    | 91.83 |

### GenEval

| Model           | Overall  | Single Obj. | Two Obj. | Counting | Colors | Position | Color attribution |
|-----------------|----------|-------------|----------|----------|--------|----------|-------------------|
| SDXL            | 0.55     | 0.98        | 0.74     | 0.39     | 0.85   | 0.15     | 0.23              |
| PixArt-alpha    | 0.48     | 0.98        | 0.50     | 0.44     | 0.80   | 0.08     | 0.07              |
| Flux.1-dev      | 0.66     | 0.98        | 0.79     | 0.73     | 0.77   | 0.22     | 0.45              |
| DALL-E 3        | 0.67     | 0.96        | 0.87     | 0.47     | 0.83   | 0.43     | 0.45              |
| CogView4-6B     | 0.73     | 0.99        | 0.86     | 0.66     | 0.79   | 0.48     | 0.58              |
| SD3-Medium      | 0.74     | 0.99        | 0.94     | 0.72     | 0.89   | 0.33     | 0.60              |
| Janus-Pro-7B    | 0.80     | 0.99        | 0.89     | 0.59     | 0.90   | 0.79     | 0.66              |
| **HiDream-I1**  | **0.83** | 1.00        | 0.98     | 0.79     | 0.91   | 0.60     | 0.72              |

### HPSv2.1 Benchmark

| Model                 | Averaged  | Animation | Concept-art | Painting | Photo |
|-----------------------|-----------|-----------|-------------|----------|-------|
| Stable Diffusion v2.0 | 26.38     | 27.09     | 26.02       | 25.68    | 26.73 |
| Midjourney V6         | 30.29     | 32.02     | 30.29       | 29.74    | 29.10 |
| SDXL                  | 30.64     | 32.84     | 31.36       | 30.86    | 27.48 |
| DALL-E 3              | 31.44     | 32.39     | 31.09       | 31.18    | 31.09 |
| SD3                   | 31.53     | 32.60     | 31.82       | 32.06    | 29.62 |
| Midjourney V5         | 32.33     | 34.05     | 32.47       | 32.24    | 30.56 |
| CogView4-6B           | 32.31     | 33.23     | 32.60       | 32.89    | 30.52 |
| Flux.1-dev            | 32.47     | 33.87     | 32.27       | 32.62    | 31.11 |
| Stable Cascade        | 32.95     | 34.58     | 33.13       | 33.29    | 30.78 |
| **HiDream-I1**        | **33.82** | 35.05     | 33.74       | 33.88    | 32.61 |

## License Agreement

The Transformer models in this repository are licensed under the MIT License. The VAE is from `FLUX.1 [dev]`, and the text encoders are from `google/t5-v1_1-xxl` and `meta-llama/Meta-Llama-3.1-8B-Instruct`; please follow the license terms specified for those components. You own all content you create with this model and may use it freely, provided you comply with this license agreement. You are responsible for how you use the models: do not create illegal content, harmful material, personal information that could harm others, false information, or content targeting vulnerable groups.

## Acknowledgements

- The VAE component is from the `FLUX.1 [dev]` model, licensed under the `FLUX.1 [dev]` Non-Commercial License by Black Forest Labs, Inc.
- The text encoders are from `google/t5-v1_1-xxl` (licensed under Apache 2.0) and `meta-llama/Meta-Llama-3.1-8B-Instruct` (licensed under the Llama 3.1 Community License Agreement).