## Usage
1. Set the execution backend to Diffusers: `Execution & Models` -> `Execution backend`
2. Go into `Compute Settings`
3. Enable the `Compress Model weights with NNCF` option
4. Restart the WebUI if it's your first time using NNCF. Otherwise, just reload the model.
### Features
* Uses INT8, which halves the model size
  * Saves 3.4 GB of VRAM with SDXL
* Works in the Diffusers backend
### Disadvantages
* Uses autocast: the GPU still runs the model in 16 bit, so inference is slower
* Uses INT8, which can break ControlNet
* Using a LoRA will trigger a model reload
* Not implemented in the Original backend
* Fused projections are not compatible with NNCF
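To illustrate why INT8 halves the weight size (and why autocast still runs the math in 16 bit), here is a minimal, simplified sketch of symmetric per-tensor weight quantization. This is a hypothetical toy, not NNCF's actual implementation, which uses per-channel scales and more sophisticated schemes:

```python
# Toy symmetric INT8 weight compression (illustrative only, not NNCF's algorithm).
# Each weight is stored as a 1-byte integer plus one shared float scale,
# instead of 2 bytes per weight for FP16 -- roughly halving the model size.

def quantize_int8(weights):
    """Map float weights to int8 values plus a per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights; at inference time the runtime
    dequantizes back to 16 bit before computing, which is the autocast
    overhead mentioned above."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
# `approx` is close to `weights`, but small values lose precision --
# one reason aggressive quantization can break sensitive add-ons like ControlNet.
```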
## Options
These results compare NNCF 8 bit to 16 bit.
- Model:
  Compresses the UNet or Transformer part of the model.
  This is where most of the memory savings happen for Stable Diffusion.
  - SDXL: ~2500 MB memory savings
  - SD 1.5: ~750 MB memory savings
  - PixArt-XL-2: ~600 MB memory savings
- Text Encoder:
  Compresses the Text Encoder parts of the model.
  This is where most of the memory savings happen for PixArt.
  - PixArt-XL-2: ~4750 MB memory savings
  - SDXL: ~750 MB memory savings
  - SD 1.5: ~120 MB memory savings
- VAE:
  Compresses the VAE part of the model.
  Memory savings from compressing the VAE are fairly small.
  - SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings
- 4 Bit Compression and Quantization:
  4 bit compression modes and quantization can be used with the OpenVINO backend.
  For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
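The UNet figures above follow directly from going from 2 bytes per weight (FP16) to 1 byte (INT8). A quick back-of-the-envelope check, using an approximate public parameter count for the SDXL UNet (an assumption, not a number from this page):

```python
# Sanity-check the quoted ~2500 MB UNet saving for SDXL.
# ~2.6B parameters is an approximate public figure for the SDXL UNet.
sdxl_unet_params = 2.6e9

fp16_bytes = sdxl_unet_params * 2  # 2 bytes per weight at 16 bit
int8_bytes = sdxl_unet_params * 1  # 1 byte per weight at INT8
saved_mb = (fp16_bytes - int8_bytes) / 1024**2

print(round(saved_mb))  # in the same ballpark as the ~2500 MB figure above
```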