# Control

Native control module for the SD.Next Diffusers backend  
Can be used for Control generation as well as Image and Text workflows  

For a guide on the options and settings, as well as explanations for the controls themselves, see the [Control Guide](https://github.com/vladmandic/automatic/wiki/Control-Guide) page.

## Supported Control Models  

- [lllyasviel ControlNet](https://github.com/lllyasviel/ControlNet) for **SD 1.5** and **SD-XL** models  
  Includes ControlNets as well as Reference-only mode and any compatible 3rd party models  
  Original ControlNets for SD15 are 1.4GB each, while SDXL ControlNets are a massive 4.9GB each  
- [VisLearn ControlNet XS](https://vislearn.github.io/ControlNet-XS/) for **SD-XL** models  
  Lightweight ControlNet models for SDXL at only 165MB with near-identical results  
- [TencentARC T2I-Adapter](https://github.com/TencentARC/T2I-Adapter) for **SD 1.5** and **SD-XL** models  
  T2I-Adapters provide similar functionality at a much lower resource cost of only 300MB each  
- [Kohya Control LLite](https://huggingface.co/kohya-ss/controlnet-lllite) for **SD-XL** models  
  LLLite models for SDXL provide lightweight image control at only 46MB each  
- [Tencent AI Lab IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) for **SD 1.5** and **SD-XL** models  
  IP-Adapters provide excellent style transfer at a much lower resource cost: below 100MB for SD15 and 700MB for SDXL  
  IP-Adapters can be combined with ControlNet for more stable results, especially when doing batch/video processing (a minimal loading sketch follows this list)  
- [CiaraRowles TemporalNet](https://huggingface.co/CiaraRowles/TemporalNet) for **SD 1.5** models  
  ControlNet model designed to enhance temporal consistency and reduce flickering for batch/video processing  
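
SD.Next wires all of this up through the UI, but as orientation, here is a minimal sketch of how an IP-Adapter is typically attached to an SD15 pipeline with the `diffusers` API directly; the model ids (`runwayml/stable-diffusion-v1-5`, `h94/IP-Adapter`), weight file name and scale value are illustrative assumptions, not necessarily what SD.Next uses internally:

```python
import torch
from PIL import Image
from diffusers import StableDiffusionPipeline

# Illustrative sketch only: load an SD15 pipeline and attach an IP-Adapter to it
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16  # assumed base model
).to("cuda")
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")
pipe.set_ip_adapter_scale(0.6)  # strength of the style/image prompt

style_image = Image.open("style.png")  # hypothetical style reference image
image = pipe(prompt="a watercolor landscape", ip_adapter_image=style_image).images[0]
```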

All built-in models are downloaded upon first use and stored in:  
  `/models/controlnet`, `/models/adapter`, `/models/xs`, `/models/lite`, `/models/processor`

Listed below are all models that are supported out-of-the-box:

### ControlNet  

- **SD15**:  
  Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,  
  Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile  
- **SDXL**:  
  Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL

*Note*: Only models compatible with the currently loaded base model are listed  
Additional ControlNet models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/controlnet`  
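
For reference, this is roughly how a single SD15 ControlNet is loaded and applied with the `diffusers` API; the model ids, file names and parameters below are assumptions for illustration, not the exact code SD.Next runs:

```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Illustrative sketch: load a Canny ControlNet and attach it to an SD15 base model
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16  # assumed ControlNet repo
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

canny_image = Image.open("canny.png")  # hypothetical pre-processed edge map
image = pipe(prompt="a futuristic city", image=canny_image, num_inference_steps=20).images[0]
```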

### ControlNet XS

- **SDXL**:  
  Canny, Depth  

### ControlNet LLLite

- **SDXL**:  
  Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime

*Note*: Control-LLLite support is based on an unofficial implementation and is considered experimental  
Additional ControlNet models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/lite`  

### T2I-Adapter

- **SD15**:  
  Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2  
- **SDXL**:  
  Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL  

SD15 adapters are downloaded from the following Hugging Face repositories:  

- Segment: `TencentARC/t2iadapter_seg_sd14v1`
- Zoe Depth: `TencentARC/t2iadapter_zoedepth_sd15v1`
- OpenPose: `TencentARC/t2iadapter_openpose_sd14v1`
- KeyPose: `TencentARC/t2iadapter_keypose_sd14v1`
- Color: `TencentARC/t2iadapter_color_sd14v1`
- Depth v1: `TencentARC/t2iadapter_depth_sd14v1`
- Depth v2: `TencentARC/t2iadapter_depth_sd15v2`
- Canny v1: `TencentARC/t2iadapter_canny_sd14v1`
- Canny v2: `TencentARC/t2iadapter_canny_sd15v2`
- Sketch v1: `TencentARC/t2iadapter_sketch_sd14v1`
- Sketch v2: `TencentARC/t2iadapter_sketch_sd15v2`

*Note*: Only models compatible with the currently loaded base model are listed
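
As a rough illustration, a T2I-Adapter can be loaded through the `diffusers` adapter pipeline; the model ids and input file below are assumptions for the sketch:

```python
import torch
from PIL import Image
from diffusers import T2IAdapter, StableDiffusionAdapterPipeline

# Illustrative sketch: attach a Canny T2I-Adapter to an SD15 base model
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", adapter=adapter, torch_dtype=torch.float16  # assumed base model
).to("cuda")

edge_map = Image.open("canny.png")  # hypothetical pre-processed edge map
image = pipe(prompt="a castle on a hill", image=edge_map).images[0]
```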

### Processors

- **Pose style**: OpenPose, DWPose, MediaPipe Face
- **Outline style**: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
- **Depth style**: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
- **Segmentation style**: SegmentAnything
- **Other**: MLSD, Shuffle

*Note*: Processor download sizes vary from none for the built-in ones to anywhere between 200MB and 4.2GB for ZoeDepth Large
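
SD.Next ships its own processor implementations, but conceptually they behave like the detectors in the `controlnet_aux` package; a minimal sketch under that assumption, with file names chosen for illustration:

```python
from PIL import Image
from controlnet_aux import CannyDetector, OpenposeDetector

source = Image.open("input.png")  # hypothetical input image

# Outline-style processor: extracts an edge map from the input
canny = CannyDetector()
edge_map = canny(source)

# Pose-style processor: downloads its annotator weights on first use
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose_map = openpose(source)
```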

### Segmentation Models

There are 8 Auto-segmentation models available (a minimal usage sketch follows this list):  
  
- Facebook SAM ViT Base (357MB)  
- Facebook SAM ViT Large (1.16GB) 
- Facebook SAM ViT Huge (2.56GB) 
- SlimSAM Uniform (106MB)
- SlimSAM Uniform Tiny (37MB)
- Rembg Silueta
- Rembg U2Net  
- Rembg ISNet
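
The Rembg entries above map onto the `rembg` package; a minimal usage sketch under that assumption (the `rembg` model names may differ slightly from the labels shown in the UI):

```python
from PIL import Image
from rembg import new_session, remove

# Illustrative sketch: background removal with a Rembg model
session = new_session("u2net")            # alternatives include "silueta" and "isnet-general-use"
source = Image.open("input.png")          # hypothetical input image
cutout = remove(source, session=session)  # returns the foreground on a transparent background
cutout.save("cutout.png")
```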

### Reference

Reference mode is its own pipeline, so it cannot have multiple units or processors  

## Workflows

### Inputs & Outputs

- Image -> Image
- Batch: list of images -> Gallery and/or Video
- Folder: folder with images -> Gallery and/or Video
- Video -> Gallery and/or Video

*Notes*:
- Input/Output/Preview panels can be minimized by clicking on them  
- For video output, make sure to set video options  

### Unit

- Unit is: **input** plus **process** plus **control**
- Pipeline consists of any number of configured units  
  If a unit uses control modules, all control modules inside the pipeline must be of the same type  
  e.g. **ControlNet**, **ControlNet-XS**, **T2I-Adapter** or **Reference**
- Each unit can use primary input or its own override input  
- Each unit can have no processor in which case it will run control on input directly  
  Use when you're using predefined input templates  
- Unit can have no control in which case it will run processor only  
- Any combination of input, processor and control is possible  
  For example, two enabled units with process only will produce compound processed image but without control  

### What-if?

- If no input is provided then pipeline will run in **txt2img** mode  
  Can be freely used instead of standard `txt2img`  
- If none of units have control or adapter, pipeline will run in **img2img** mode using input image  
  Can be freely used instead of standard `img2img`  
- If you have processor enabled, but no controlnet or adapter loaded,  
  pipeline will run in **img2img** mode using processed input
- If you have multiple processors enabled, but no controlnet or adapter loaded,  
  pipeline will run in **img2img** mode on *blended* processed image  
- Output resolution is by default set to the input resolution  
  Use resize settings to force any resolution  
- Resize operation can run before (on input image) or after processing (on output image)  
- Using video input will run the pipeline on each frame unless **skip frames** is set (see the sketch after this list)  
  Video output is a standard list of images (gallery) and can optionally be encoded into a video file  
  The video file can be interpolated using **RIFE** for smoother playback  
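
Conceptually, frame extraction with a skip interval looks like the OpenCV-based sketch below; the file name and skip value are illustrative, and this is not the exact code SD.Next runs:

```python
import cv2

# Illustrative sketch: pull frames from a video while honoring a skip-frames setting
skip_frames = 2  # hypothetical value: keep every 3rd frame
frames = []
capture = cv2.VideoCapture("input.mp4")  # hypothetical input file
index = 0
while True:
    ok, frame = capture.read()
    if not ok:
        break
    if index % (skip_frames + 1) == 0:
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    index += 1
capture.release()
# each kept frame would then go through the configured control pipeline
```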

### Overrides

- Control can be based on main input or each individual unit can have its own override input
- By default, control runs in control+txt2img mode
- If init image is provided, it runs in control+img2img mode  
  Init image can be same as control image or separate
- IP adapter can be applied to any workflow
- IP adapter can use same input as control input or separate

### Inpaint

- Inpaint workflow is triggered when the input image is provided in **inpaint** mode
- Inpaint mode can be used with image-to-image or controlnet workflows (a rough pipeline sketch follows this list)
- Other unit types such as T2I, XS or Lite do not support inpaint mode
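
Under the hood this corresponds roughly to an inpaint-capable ControlNet pipeline; a hedged `diffusers` sketch, with model ids, file names and the control-image construction as assumptions:

```python
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline

# Illustrative sketch: inpainting guided by an inpaint ControlNet
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16  # assumed repo
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

init_image = Image.open("input.png").convert("RGB")  # hypothetical input image
mask_image = Image.open("mask.png").convert("L")     # white = area to repaint

# Build the control image: original pixels with the masked region marked for repainting
image_np = np.array(init_image).astype(np.float32) / 255.0
mask_np = np.array(mask_image).astype(np.float32) / 255.0
image_np[mask_np > 0.5] = -1.0
control_image = torch.from_numpy(image_np).permute(2, 0, 1).unsqueeze(0)

result = pipe(
    prompt="fill in the missing area",
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
).images[0]
```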

### Outpaint

- Outpaint workflow is triggered when input image is provided in **outpaint** mode
- Outpaint mode can be used with image-to-image or controlnet workflows
- Other unit types such as T2I, XS or Lite do not support outpaint mode
- It is recommended to increase denoising strength to at least 0.8 since the outpainted area is blank and needs to be filled with noise
- How closely the outpaint follows the input image is controlled by the overlap setting: the higher the overlap, the more of the original image is included in the outpaint process

## Logging  

To enable extra logging for troubleshooting purposes,  
set the following environment variables before running **SD.Next**:

- Linux:
  > export SD_CONTROL_DEBUG=true  
  > export SD_PROCESS_DEBUG=true  
  > ./webui.sh --debug  

- Windows:
  > set SD_CONTROL_DEBUG=true  
  > set SD_PROCESS_DEBUG=true  
  > webui.bat --debug  

*Note*: Starting with debug info enabled also enables **Test** mode in Control module

## Limitations / TODO

### Known issues

- Using model offload can cause Control models to be on the wrong device at the time of execution  
  Example error message:
  > Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same  

  Workaround: Disable **model offload** in settings -> diffusers and use the **move model** option instead  

- Errors after trying to use DWPose when its dependency installation has failed  
  Example error message:
  > Control processor DWPose: DLL load failed while importing _ext  

  Workaround: Activate the venv and run the following commands to install DWPose dependencies manually:  
  `pip install -U openmim --no-deps`  
  `mim install mmengine mmcv mmpose mmdet --no-deps`

## Future

- Pose editor
- Process multiple images in batch in parallel
- ControlLora <https://huggingface.co/stabilityai/control-lora>
- Multi-frame rendering <https://xanthius.itch.io/multi-frame-rendering-for-stablediffusion>
- Deflickering and deghosting