## Usage
0. Use the Diffusers backend: set `Execution & Models` -> `Execution backend` to Diffusers
1. Go into `Compute Settings`
2. Enable the desired `Compress Model weights with NNCF` options
3. Restart the WebUI if it's your first time using NNCF. Otherwise, just reload the model.
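For context, these options apply NNCF weight compression to the loaded model. Below is a minimal standalone sketch of the same idea outside the WebUI; the model id and the direct use of `nncf.compress_weights` (which defaults to INT8 weight-only compression) are assumptions for illustration, not the WebUI's exact code path.

```python
# Illustrative sketch only -- the WebUI applies compression for you when the
# "Compress Model weights with NNCF" options are enabled.
import torch
import nncf
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model id
    torch_dtype=torch.float16,
)

# Weight-only compression: weights are stored in INT8 and decompressed back
# to 16 bit at runtime (autocast), which is why inference gets slower.
pipe.unet = nncf.compress_weights(pipe.unet)

pipe.to("cuda")
image = pipe("a photo of a cat").images[0]
```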
### Features
* Uses INT8, halves the model size
  * Saves ~3.4 GB of VRAM with SDXL
* Works in Diffusers backend
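As a rough sanity check (the SDXL UNet parameter count here is an approximation, not a figure from this wiki): the SDXL UNet has about 2.6B parameters, so 16-bit weights take roughly 2.6B × 2 bytes ≈ 5.2 GB while INT8 takes about 2.6 GB, saving ~2.5 GB on the UNet alone; adding the Text Encoder and VAE savings listed under Options below lines up with the ~3.4 GB total.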
### Disadvantages
* Compression is autocast-based: the GPU still runs the model in 16 bit, so inference will be slower
* Uses INT8, can break ControlNet
* Using a LoRA will trigger a model reload
* Not implemented in Original backend
* Fused projections are not compatible with NNCF
## Options
These results compare NNCF 8 bit to 16 bit; a code sketch follows the list below.
- Model:
  Compresses the UNet or Transformer part of the model.
  This is where most of the memory savings happen for Stable Diffusion.
  SDXL: ~2500 MB memory savings.
  SD 1.5: ~750 MB memory savings.
  PixArt-XL-2: ~600 MB memory savings.
- Text Encoder:
  Compresses the Text Encoder parts of the model.
  This is where most of the memory savings happen for PixArt.
  PixArt-XL-2: ~4750 MB memory savings.
  SDXL: ~750 MB memory savings.
  SD 1.5: ~120 MB memory savings.
- VAE:
  Compresses the VAE part of the model.
  Memory savings from compressing the VAE are fairly small.
  SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings.
- 4 Bit Compression and Quantization:
  4 bit compression modes and quantization can be used with the OpenVINO backend.
  For more info, see https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
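The per-component options above map onto compressing the corresponding pipeline modules. The sketch below shows that idea with a plain diffusers SDXL pipeline; module names follow diffusers conventions and the WebUI's actual implementation may differ.

```python
# Sketch: compressing only the parts covered by each option.
import torch
import nncf
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed model id
    torch_dtype=torch.float16,
)

# "Model" option: the UNet (or the Transformer for models such as PixArt).
pipe.unet = nncf.compress_weights(pipe.unet)

# "Text Encoder" option: SDXL ships two text encoders.
pipe.text_encoder = nncf.compress_weights(pipe.text_encoder)
pipe.text_encoder_2 = nncf.compress_weights(pipe.text_encoder_2)

# "VAE" option: smallest savings of the three.
pipe.vae = nncf.compress_weights(pipe.vae)

pipe.to("cuda")
```

NNCF also exposes 4 bit weight-compression modes (e.g. `nncf.CompressWeightsMode.INT4_SYM`), but per the note above those are only usable here with the OpenVINO backend; see the linked wiki page for details.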