# Control
Native control module for SD.Next on the Diffusers backend
Can be used for Control generation as well as Image and Text workflows
For a guide on the options and settings, as well as explanations for the controls themselves, see the [Control Guide](https://github.com/vladmandic/automatic/wiki/Control-Guide) page.
## Supported Control Models
- [lllyasviel ControlNet](https://github.com/lllyasviel/ControlNet) for **SD 1.5** and **SD-XL** models
Includes ControlNets as well as Reference-only mode and any compatible 3rd party models
Original ControlNets are 1.4GB each for SD15 and a massive 4.9GB each for SDXL
- [VisLearn ControlNet XS](https://vislearn.github.io/ControlNet-XS/) for **SD-XL** models
Lightweight ControlNet models for SDXL at only 165MB with near-identical results
- [TencentARC T2I-Adapter](https://github.com/TencentARC/T2I-Adapter) for **SD 1.5** and **SD-XL** models
T2I-Adapters provide similar functionality at a much lower resource cost of only 300MB each
- [Kohya Control LLite](https://huggingface.co/kohya-ss/controlnet-lllite) for **SD-XL** models
LLLite models for SDXL provide lightweight image control at only 46MB
- [Tencent AI Lab IP-Adapter](https://github.com/tencent-ailab/IP-Adapter) for **SD 1.5** and **SD-XL** models
IP-Adapters provide great style transfer functionality at a much lower resource cost: below 100MB for SD15 and 700MB for SDXL
IP-Adapters can be combined with ControlNet for more stable results, especially when doing batch/video processing
- [CiaraRowles TemporalNet](https://huggingface.co/CiaraRowles/TemporalNet) for **SD 1.5** models
ControlNet model designed to enhance temporal consistency and reduce flickering for batch/video processing
All built-in models are downloaded upon first use and stored in:
`/models/controlnet`, `/models/adapter`, `/models/xs`, `/models/lite`, `/models/processor`
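SD.Next wires all of this up automatically from the Control UI, but as a point of reference, below is a minimal sketch of what a ControlNet combined with an IP-Adapter looks like on the underlying Diffusers backend; the model IDs, file names and the `load_ip_adapter` call are assumptions based on current Diffusers releases, not what SD.Next literally executes:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# SD15 Canny ControlNet (~1.4GB); any supported ControlNet repo works the same way
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_canny",
    torch_dtype=torch.float16,
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder: any SD15 base model
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

# Optionally combine with an IP-Adapter for style transfer (more stable for batch/video work)
pipe.load_ip_adapter("h94/IP-Adapter", subfolder="models", weight_name="ip-adapter_sd15.bin")

control_image = load_image("canny-edges.png")   # pre-processed control input
style_image = load_image("style-reference.png") # IP-Adapter style reference

image = pipe(
    "a scenic mountain lake",
    image=control_image,
    ip_adapter_image=style_image,
    num_inference_steps=20,
).images[0]
image.save("output.png")
```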
Listed below are all models that are supported out-of-the-box:
### ControlNet
- **SD15**:
Canny, Depth, IP2P, LineArt, LineArt Anime, MLSD, NormalBae, OpenPose,
Scribble, Segment, Shuffle, SoftEdge, TemporalNet, HED, Tile
- **SDXL**:
Canny Small XL, Canny Mid XL, Canny XL, Depth Zoe XL, Depth Mid XL
Note: only models compatible with the currently loaded base model are listed
Additional ControlNet models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/controlnet`
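For reference, a manually downloaded safetensors ControlNet can also be loaded directly in recent Diffusers releases; a minimal sketch, assuming a version with single-file loading support and a hypothetical file name:

```python
from diffusers import ControlNetModel

# assumption: diffusers build with single-file loading for ControlNet checkpoints
controlnet = ControlNetModel.from_single_file("models/control/controlnet/my_controlnet.safetensors")
```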
### ControlNet XS
- **SDXL**:
Canny, Depth
### ControlNet LLLite
- **SDXL**:
Canny, Canny anime, Depth anime, Blur anime, Pose anime, Replicate anime
Note: Control-LLLite support is based on an unofficial implementation and is considered experimental
Additional Control LLLite models in safetensors format can be downloaded manually and placed into the corresponding folder: `/models/control/lite`
### T2I-Adapter
- **SD15**:
  Segment, Zoe Depth, OpenPose, KeyPose, Color, Depth v1, Depth v2, Canny v1, Canny v2, Sketch v1, Sketch v2
- **SDXL**:
  Canny XL, Depth Zoe XL, Depth Midas XL, LineArt XL, OpenPose XL, Sketch XL
*Note*: Only models compatible with the currently loaded base model are listed
SD15 adapters are downloaded from the following Hugging Face repositories:
- Segment: `TencentARC/t2iadapter_seg_sd14v1`
- Zoe Depth: `TencentARC/t2iadapter_zoedepth_sd15v1`
- OpenPose: `TencentARC/t2iadapter_openpose_sd14v1`
- KeyPose: `TencentARC/t2iadapter_keypose_sd14v1`
- Color: `TencentARC/t2iadapter_color_sd14v1`
- Depth v1: `TencentARC/t2iadapter_depth_sd14v1`
- Depth v2: `TencentARC/t2iadapter_depth_sd15v2`
- Canny v1: `TencentARC/t2iadapter_canny_sd14v1`
- Canny v2: `TencentARC/t2iadapter_canny_sd15v2`
- Sketch v1: `TencentARC/t2iadapter_sketch_sd14v1`
- Sketch v2: `TencentARC/t2iadapter_sketch_sd15v2`
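As with ControlNet, SD.Next builds the adapter pipeline for you; the following is only a minimal orientation sketch of a T2I-Adapter on the Diffusers backend, with placeholder model IDs and image paths:

```python
import torch
from diffusers import StableDiffusionAdapterPipeline, T2IAdapter
from diffusers.utils import load_image

# Canny v2 adapter from the SD15 list above
adapter = T2IAdapter.from_pretrained("TencentARC/t2iadapter_canny_sd15v2", torch_dtype=torch.float16)
pipe = StableDiffusionAdapterPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # placeholder: any SD15 base model
    adapter=adapter,
    torch_dtype=torch.float16,
).to("cuda")

canny_map = load_image("canny-edges.png")  # already-processed control image
image = pipe("a futuristic city at night", image=canny_map, num_inference_steps=20).images[0]
image.save("t2i-adapter-output.png")
```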
### Processors
- **Pose style**: OpenPose, DWPose, MediaPipe Face
- **Outline style**: Canny, Edge, LineArt Realistic, LineArt Anime, HED, PidiNet
- **Depth style**: Midas Depth Hybrid, Zoe Depth, Leres Depth, Normal Bae
- **Segmentation style**: SegmentAnything
- **Other**: MLSD, Shuffle
*Note*: Processor sizes vary from none for built-in processors to anywhere between 200MB and 4.2GB for ZoeDepth-Large
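These processors correspond to standard annotator models. As a rough illustration of what a processor produces, here is a hedged sketch using the standalone `controlnet_aux` package; SD.Next ships its own processor implementations, so this package and the annotator repo are assumptions, not SD.Next internals:

```python
from controlnet_aux import CannyDetector, OpenposeDetector
from diffusers.utils import load_image

source = load_image("photo.png")

# Outline-style processor: Canny edge detection (no model download needed)
canny = CannyDetector()
edges = canny(source)

# Pose-style processor: OpenPose keypoint detection (downloads annotator weights)
openpose = OpenposeDetector.from_pretrained("lllyasviel/Annotators")
pose = openpose(source)

edges.save("canny.png")
pose.save("pose.png")
```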
### Segmentation Models
There are 8 Auto-segmentation models available:
- Facebook SAM ViT Base (357MB)
- Facebook SAM ViT Large (1.16GB)
- Facebook SAM ViT Huge (2.56GB)
- SlimSAM Uniform (106MB)
- SlimSAM Uniform Tiny (37MB)
- Rembg Silueta
- Rembg U2Net
- Rembg ISNet
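For the Rembg options, background removal is the typical use; a minimal standalone sketch with the `rembg` package (the package, model names and file paths are assumptions used for illustration):

```python
from PIL import Image
from rembg import new_session, remove

# "u2net", "isnet-general-use" and "silueta" roughly correspond to the Rembg options listed above
session = new_session("u2net")

image = Image.open("photo.png")
cutout = remove(image, session=session)   # returns the image with background removed (RGBA)
cutout.save("cutout.png")
```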
### Reference
Reference mode is its own pipeline, so it cannot have multiple units or processors
## Workflows
### Inputs & Outputs
- Image -> Image
- Batch: list of images -> Gallery and/or Video
- Folder: folder with images -> Gallery and/or Video
- Video -> Gallery and/or Video
*Notes*:
- Input/Output/Preview panels can be minimized by clicking on them
- For video output, make sure to set video options
### Unit
- Unit is: **input** plus **process** plus **control**
- Pipeline consists of any number of configured units
If a unit uses control modules, all control modules inside the pipeline must be of the same type
e.g. **ControlNet**, **ControlNet-XS**, **T2I-Adapter** or **Reference**
- Each unit can use primary input or its own override input
- Each unit can have no processor, in which case it will run control on the input directly
Use this when working with predefined input templates
- Unit can have no control in which case it will run processor only
- Any combination of input, processor and control is possible
For example, two enabled units with process only will produce a compound processed image, but without control
### What-if?
- If no input is provided then pipeline will run in **txt2img** mode
Can be freely used instead of standard `txt2img`
- If none of the units have a control or adapter, the pipeline will run in **img2img** mode using the input image
Can be freely used instead of standard `img2img`
- If you have processor enabled, but no controlnet or adapter loaded,
pipeline will run in **img2img** mode using processed input
- If you have multiple processors enabled, but no controlnet or adapter loaded,
pipeline will run in **img2img** mode on *blended* processed image
- Output resolution is by default set to the input resolution
Use resize settings to force any other resolution
- Resize operation can run before (on input image) or after processing (on output image)
- Using video input will run pipeline on each frame unless **skip frames** is set
Video output is standard list of images (gallery) and can be optionally encoded into a video file
Video file can be interpolated using **RIFE** for smoother playback
### Overrides
- Control can be based on main input or each individual unit can have its own override input
- By default, control runs in control+txt2img mode
- If init image is provided, it runs in control+img2img mode
Init image can be same as control image or separate
- IP adapter can be applied to any workflow
- IP adapter can use same input as control input or separate
### Inpaint
- Inpaint workflow is triggered when input image is provided in **inpaint** mode
- Inpaint mode can be used with image-to-image or controlnet workflows
- Other unit types such as T2I, XS or Lite do not support inpaint mode
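Under the hood this maps to a ControlNet inpaint pipeline in Diffusers; a hedged sketch for orientation, where the model IDs and image paths are placeholders and the exact pipeline SD.Next constructs may differ:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetInpaintPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_inpaint", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

init_image = load_image("input.png")
mask_image = load_image("mask.png")        # white = area to repaint
control_image = load_image("control.png")  # processed conditioning image

result = pipe(
    "a red brick wall",
    image=init_image,
    mask_image=mask_image,
    control_image=control_image,
    strength=0.75,
    num_inference_steps=20,
).images[0]
result.save("inpainted.png")
```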
### Outpaint
- Outpaint workflow is triggered when input image is provided in **outpaint** mode
- Outpaint mode can be used with image-to-image or controlnet workflows
- Other unit types such as T2I, XS or Lite do not support outpaint mode
- Recommendation is to increase denoising strength to at least 0.8 since the outpainted area is blank and needs to be filled with noise
- How closely the outpaint follows the input image can be controlled by the overlap setting: the higher the overlap, the more of the original image becomes part of the outpaint process (see the sketch below)
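Conceptually, outpainting pads the canvas and then inpaints the new blank area plus an overlap band of the original image. The following is a rough, illustrative sketch of the canvas and mask preparation step with PIL; it is not SD.Next's actual implementation, and the helper name and parameters are hypothetical:

```python
from PIL import Image

def prepare_outpaint(image: Image.Image, expand: int = 256, overlap: int = 64):
    """Pad the canvas on all sides and build an inpaint mask.

    White mask pixels mark the area to be generated; the overlap band lets
    the model blend the new content into the original image.
    """
    w, h = image.size
    canvas = Image.new("RGB", (w + 2 * expand, h + 2 * expand), "gray")
    canvas.paste(image, (expand, expand))

    mask = Image.new("L", canvas.size, 255)                 # start fully masked (white)
    keep = Image.new("L", (w - 2 * overlap, h - 2 * overlap), 0)
    mask.paste(keep, (expand + overlap, expand + overlap))  # preserve the image core, minus the overlap band
    return canvas, mask

image = Image.open("input.png")
canvas, mask = prepare_outpaint(image)
# canvas and mask would then be fed into an inpaint workflow with denoising strength >= 0.8
```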
## Logging
To enable extra logging for troubleshooting purposes,
set the following environment variables before running **SD.Next**:
- Linux:
> export SD_CONTROL_DEBUG=true
> export SD_PROCESS_DEBUG=true
> ./webui.sh --debug
- Windows:
> set SD_CONTROL_DEBUG=true
> set SD_PROCESS_DEBUG=true
> webui.bat --debug
*Note*: Starting with debug info enabled also enables **Test** mode in Control module
## Limitations / TODO
### Known issues
- Using model offload can cause Control models to be on the wrong device at the time of execution
Example error message:
> Input type (torch.FloatTensor) and weight type (torch.cuda.FloatTensor) should be the same
Workaround: Disable **model offload** in settings -> diffusers and use **move model** option instead
- DWPose fails to load when its dependencies did not install correctly
Example error message:
> Control processor DWPose: DLL load failed while importing _ext
Workaround: Activate the venv and run the following commands to install the DWPose dependencies manually:
`pip install -U openmim --no-deps`
`mim install mmengine mmcv mmpose mmdet --no-deps`
## Future
- Pose editor
- Process multiple images in batch in parallel
- ControlLora <https://huggingface.co/stabilityai/control-lora>
- Multi-frame rendering <https://xanthius.itch.io/multi-frame-rendering-for-stablediffusion>
- Deflickering and deghosting