# Model Compression with NNCF
## Usage
0. Use the Diffusers backend: set `Execution & Models` -> `Execution backend` to Diffusers
1. Go into `Compute Settings`
2. Enable the `Compress Model weights with NNCF` option
3. Restart the WebUI if this is your first time using NNCF; otherwise, just reload the model.
### Features
* Uses INT8, halving the model size; saves 3.4 GB of VRAM with SDXL (see the sketch after this list)
* Works in Diffusers backend
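To make the INT8 mode concrete, here is a minimal sketch using NNCF's data-free `nncf.compress_weights` API on a Diffusers pipeline. The model name and the choice of components are assumptions for illustration; SD.Next applies the compression internally when the option is enabled, so none of this is needed in normal use:

```python
# Minimal sketch: data-free INT8 weight compression of Diffusers pipeline
# components with NNCF. On PyTorch models, nncf.compress_weights targets
# Linear and Embedding layers: weights are stored in 8 bit and decompressed
# back to 16 bit on the fly at runtime, which is where the slowdown comes from.
import nncf
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # example model
    torch_dtype=torch.float16,
)

# Compress components individually; these map to the options listed below.
pipe.unet = nncf.compress_weights(pipe.unet)                  # "Model" option
pipe.text_encoder = nncf.compress_weights(pipe.text_encoder)  # "Text Encoder"
pipe.text_encoder_2 = nncf.compress_weights(pipe.text_encoder_2)
```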
### Disadvantages
* Uses autocast: the GPU still runs the model in 16 bit, so inference is slower
* Uses INT8, which can break ControlNet
* Using a LoRA triggers a model reload
* Not implemented in the Original backend
* Fused projections are not compatible with NNCF
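In practice this means Diffusers' optional QKV projection fusing has to be skipped while compression is active. A minimal sketch, continuing from the pipeline above (the `use_nncf` flag is hypothetical):

```python
# fuse_qkv_projections() replaces the attention q/k/v Linear layers with a
# single fused layer, which conflicts with NNCF-compressed weights.
use_nncf = True  # hypothetical flag: is NNCF compression enabled?
if not use_nncf:
    pipe.fuse_qkv_projections()
```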
## Options
The following results compare NNCF 8 bit to 16 bit.
- Model:
  Compresses the UNet or Transformer part of the model.
  This is where most of the memory savings happen for Stable Diffusion.
  - SDXL: ~2500 MB memory savings
  - SD 1.5: ~750 MB memory savings
  - PixArt-XL-2: ~600 MB memory savings
- Text Encoder:
  Compresses the Text Encoder parts of the model.
  This is where most of the memory savings happen for PixArt.
  - PixArt-XL-2: ~4750 MB memory savings
  - SDXL: ~750 MB memory savings
  - SD 1.5: ~120 MB memory savings
- VAE:
  Compresses the VAE part of the model.
  Memory savings from compressing the VAE are pretty small.
  - SD 1.5 / SDXL / PixArt-XL-2: ~75 MB memory savings
- 4 Bit Compression and Quantization:
  4-bit compression modes and quantization can be used with the OpenVINO backend, as sketched below.
  For more info: https://github.com/vladmandic/automatic/wiki/OpenVINO#quantization
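A minimal sketch of 4-bit weight compression with NNCF on an OpenVINO IR model; the file paths and the `mode`, `ratio`, and `group_size` values are illustrative assumptions, not SD.Next defaults:

```python
# Minimal sketch: 4-bit weight compression of an OpenVINO IR model with NNCF.
import nncf
import openvino as ov

core = ov.Core()
model = core.read_model("unet.xml")  # example path to an exported model

compressed = nncf.compress_weights(
    model,
    mode=nncf.CompressWeightsMode.INT4_SYM,  # 4-bit symmetric weight quantization
    ratio=0.8,       # compress 80% of eligible layers to 4 bit; the rest stay 8 bit
    group_size=128,  # group-wise quantization granularity
)
ov.save_model(compressed, "unet_int4.xml")
```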