## Usage
1. Switch to the Diffusers backend: `Execution & Models` -> `Execution backend`  
2. Go into `Compute Settings`  
3. Enable the `Compress Model weights with NNCF` options  
4. Restart the WebUI if this is your first time using NNCF; otherwise, just reload the model.  

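For reference, the WebUI option is backed by NNCF's weight-compression API. Below is a minimal standalone sketch of the same idea with `diffusers`; the checkpoint name and the INT8 default are assumptions for illustration, not the exact code path SD.Next runs.

```python
# Minimal sketch of NNCF INT8 weight compression on a diffusers pipeline.
# Illustrative only: the checkpoint name is an assumed example and this is
# not the exact code path SD.Next uses when the option is enabled.
import torch
import nncf
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # assumed example checkpoint
    torch_dtype=torch.float16,
)

# compress_weights() defaults to data-free INT8 weight-only compression;
# activations stay in FP16, which is why inference still autocasts to 16 bit.
pipe.unet = nncf.compress_weights(pipe.unet)

pipe = pipe.to("cuda")
image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```
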
### Features
* Uses INT8, which roughly halves the model size since each FP16 weight takes 2 bytes and each INT8 weight takes 1  
  Saves ~3.4 GB of VRAM with SDXL  
* Works in the Diffusers backend  

### Disadvantages
* Weights are autocast back at runtime, so the GPU still runs the model in 16 bit and inference is slower  
* Uses INT8, which can break ControlNet  
* Using a LoRA will trigger a model reload  
* Not implemented in the Original backend  
* Fused projections are not compatible with NNCF  


## Options
These results compare NNCF 8-bit compression to the default 16-bit weights; a code sketch of per-component compression follows the list.  

- Model:  
  Compresses the UNet or Transformer part of the model.  
  This is where most of the memory savings come from for Stable Diffusion.  

  SDXL: ~2500 MB memory savings.  
  SD 1.5: ~750 MB memory savings.  
  PixArt-XL-2: ~600 MB memory savings.  

- Text Encoder:  
  Compresses the Text Encoder parts of the model.  
  This is where most of the memory savings come from for PixArt.  

  PixArt-XL-2: ~4750 MB memory savings.  
  SDXL: ~750 MB memory savings.  
  SD 1.5: ~120 MB memory savings.  

- VAE:  
  Compresses the VAE part of the model.  
  Memory savings from compressing the VAE are fairly small.  

  SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings.  

- 4 Bit Compression and Quantization:  
  4-bit compression modes and quantization can be used with the OpenVINO backend.  
  For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
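
For illustration, the three options above roughly map to compressing the corresponding submodules of a `diffusers` pipeline. A hedged sketch, assuming an already-loaded `pipe` object as in the earlier example (not the exact SD.Next implementation):

```python
# Hedged sketch: which pipeline submodules each option roughly corresponds to.
# Assumes `pipe` is a loaded diffusers pipeline (see the earlier sketch).
import nncf

pipe.unet = nncf.compress_weights(pipe.unet)                  # "Model" option
pipe.text_encoder = nncf.compress_weights(pipe.text_encoder)  # "Text Encoder" option
if getattr(pipe, "text_encoder_2", None) is not None:         # SDXL has a second text encoder
    pipe.text_encoder_2 = nncf.compress_weights(pipe.text_encoder_2)
pipe.vae = nncf.compress_weights(pipe.vae)                    # "VAE" option

# 4-bit modes require the OpenVINO backend. With an ov.Model the call would look
# roughly like this (mode, ratio and group_size values are illustrative assumptions):
# ov_unet = nncf.compress_weights(
#     ov_unet, mode=nncf.CompressWeightsMode.INT4_SYM, ratio=0.8, group_size=64
# )
```

As the numbers above show, compressing the UNet/Transformer gives most of the benefit for Stable Diffusion models, while PixArt benefits most from compressing its text encoder.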