# ZLUDA (CUDA Wrapper) for AMD GPUs in Windows
### Warning
ZLUDA does not fully support PyTorch in its official build, so ZLUDA support is tricky and unstable. Support is limited at this time.
Please don't create issues regarding ZLUDA on GitHub. Feel free to reach out via the ZLUDA thread in the help channel on Discord.
## Installing ZLUDA for AMD GPUs in Windows
#### Note
_This guide assumes you have [git and python](https://github.com/vladmandic/automatic/wiki/Installation#install-python-and-git) installed, have used SD.Next before, and are comfortable using the command prompt, navigating Windows Explorer, renaming files and folders, and working with zip files._
#### Compatible GPUs
A list of compatible GPUs can be found [here](https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html).
If your GPU is not on the list (this includes integrated GPUs), you may need to build your own rocBLAS libraries; please follow the instructions in the [ROCm Support guide](https://github.com/vladmandic/automatic/wiki/Rocm-Support).
Note: If you have an integrated GPU (iGPU), you may need to disable it, or use the `HIP_VISIBLE_DEVICES` environment variable, as shown below. Learn more [here](https://github.com/vosen/ZLUDA?tab=readme-ov-file#hardware).
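A minimal sketch of the environment-variable approach, assuming the discrete GPU is HIP device 0 on your system (the index may differ; check which device is which if generation runs on the wrong GPU):

```bat
:: Hide the iGPU from HIP by exposing only device 0 (assumed here to be the discrete GPU).
:: Run this in the same command prompt before launching SD.Next.
set HIP_VISIBLE_DEVICES=0
webui.bat --use-zluda --debug --autolaunch
```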
### Install Visual C++ Runtime
_Note: Most systems already have this, since it ships with a lot of games, but there's no harm in trying to install it._
Grab the latest version of the Visual C++ Runtime from https://aka.ms/vs/17/release/vc_redist.x64.exe (this is a direct download link) and then run it.
If you get the options to Repair or Uninstall, you already have it installed and can click Close. Otherwise, install it.
### Install ZLUDA
ZLUDA is now auto-installed, and automatically added to PATH, when starting webui.bat with `--use-zluda`.
### Install HIP SDK
Install HIP SDK 5.7 from https://www.amd.com/en/developer/resources/rocm-hub/hip-sdk.html
As long as your regular AMD GPU driver is up to date, you don't need to install the PRO driver that the HIP SDK installer suggests.
### Add folders to PATH
Add `%HIP_PATH%bin` to your PATH.
https://github.com/brknsoul/ROCmLibs/wiki/Adding-folders-to-PATH
_Note: `%HIP_PATH%bin` typically expands to `C:\Program Files\AMD\ROCm\5.7\bin`, assuming Windows is installed on C:._
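To confirm the PATH change took effect, a quick check from a freshly opened command prompt might look like this (the expected value assumes the default HIP SDK 5.7 install location):

```bat
:: HIP_PATH is set by the HIP SDK installer; expected value (assumption): C:\Program Files\AMD\ROCm\5.7\
echo %HIP_PATH%
:: hipinfo.exe ships in the HIP SDK bin folder; if PATH is correct, this prints its location
where hipinfo
```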
### Replace HIP SDK library files for GPU architectures gfx1031 and gfx1032
Go to https://rocm.docs.amd.com/projects/install-on-windows/en/develop/reference/system-requirements.html and find your GPU model.
If your GPU model has a ✅ in both columns, skip to [Compilation, Settings, and First Generation](https://github.com/vladmandic/automatic/wiki/ZLUDA#compilation-settings-and-first-generation).
If your GPU model has an ❌ in the HIP SDK column (LLVM targets gfx1031 and gfx1032), follow the instructions below:
1. Go to `%HIP_PATH%bin\rocblas`
2. Rename `library` to something else, like `origlibrary`
3. Download [ROCmLibs.zip](https://github.com/brknsoul/ROCmLibs/raw/main/ROCmLibs.zip?download=)
   i. Alternative: If you have a 6600 or 6600 XT (gfx1032) GPU, give [Optimised_ROCmLibs_gfx1032.7z](https://github.com/brknsoul/ROCmLibs/raw/main/Optimised_ROCmLibs_gfx1032.7z?download=) a go. It seems to be about 50% faster. Thanks FremontDango!
4. Open the zip file.
5. Drag and drop the `library` folder from ROCmLibs.zip into `%HIP_PATH%bin\rocblas` (the folder you opened in step 1). A command-line sketch of steps 1, 2, and 5 is shown after this list.
6. Reboot your PC.
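For reference, a rough command-prompt equivalent of steps 1, 2, and 5 (this assumes ROCmLibs.zip landed in your Downloads folder and uses the `tar` tool built into Windows 10+; the drag-and-drop method above works just as well):

```bat
:: Run from an elevated (Administrator) command prompt, since %HIP_PATH% lives under Program Files
cd /d "%HIP_PATH%bin\rocblas"
:: keep the original library folder as a backup
ren library origlibrary
:: extract the replacement library folder from the downloaded archive (download path is an assumption)
tar -xf "%USERPROFILE%\Downloads\ROCmLibs.zip"
```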
If your GPU model is not marked as supported in the HIP SDK column, or is not covered by the libraries above, follow the instructions in the [ROCm Support guide](https://github.com/vladmandic/automatic/wiki/Rocm-Support) to build your own rocBLAS libraries.
### Install or Update SD.Next
Install SD.Next:
`git clone https://github.com/vladmandic/automatic`
then
`cd automatic`
then
`webui.bat --use-zluda --debug --autolaunch`
<br /><br />
or update SD.Next
(from your current SD.Next install folder)
`venv\scripts\activate`
`pip uninstall -y torch-directml torch`
`deactivate`
`git pull`
`webui.bat --use-zluda --debug --autolaunch --reinstall`
_(after running successfully once, you can remove `--reinstall`)_
_Note: ZLUDA functions best with the Diffusers backend, where certain Diffusers-only options are available._
### Compilation, Settings, and First Generation
After the UI starts, head on over to System Tab > Compute Settings.
Set "Attention optimization method" to "Dynamic Attention BMM".
Now, try to generate something.
The first generation will take a fair while (10-15 minutes, or even longer; some reports state over an hour) while ZLUDA compiles, but this compilation should only need to be done once.
Note: There will be no progress bar, as the compilation is done by ZLUDA and not SD.Next. Eventually your image will start generating.
## Comparison (DirectML)
| | DirectML | ZLUDA |
|-----------------|----------|--------------|
| Speed | Slower | Faster |
| VRAM usage | More | Less |
| VRAM GC | ❌ | ✅ |
| Training | * | ✅ |
| Flash Attention | ❌ | ❌ |
| FFT | ✅ | ✅ |
| FFTW | ✅ | ❌ |
| DNN | ✅ | 🚧 |
| RTC | ✅ | ❌ |
| Source Code | Closed | Open |
| Python | <=3.10 | Same as CUDA |

*: Known to be possible, but uses too much VRAM to train stable diffusion models/LoRAs/etc.
## Compatibility
| DTYPE | Supported |
|-------|-----------|
| FP64 | ✅ |
| FP32 | ✅ |
| FP16 | ✅ |
| BF16 | ✅ |
| LONG | ✅ |
| INT8 | ✅ * |
| UINT8 | ✅ * |
| INT4 | ❓ |
| FP8 | ❌ |
| BF8 | ❌ |

*: Not tested.
***
## Experimental Settings
#### The sections below are _optional_ and _highly experimental_, and aren't required to start generating images. Ensure you can generate images first _before_ trying these.
### Experimental Speed Increase Using DeepCache (optional)
Start SD.Next and head on over to System Tab > Compute Settings.
Scroll down to "Model Compile" and tick the 'Model', 'VAE', and 'Text Encoder' boxes.
Select "deep-cache" as your Model compile backend.
Apply and Shutdown, then restart SD.Next.
### Enabling ZLUDA DNN (partial support)
This is PARTIAL and INCOMPLETE support for a performance library, ZLUDA DNN. In most cases, trying this will waste your time.
1. Check out the `dev` branch.
2. Download `ZLUDA.zip` from [v3.7-pre5-dnn](https://github.com/lshqqytiger/ZLUDA/releases/tag/v3.7-pre5-dnn), and unpack it over your ZLUDA folder. (default: `path/to/sd.next/.zluda`)
3. Replace `path/to/sd.next/venv/Lib/site-packages/torch/lib/cudnn64_8.dll` with `path/to/sd.next/.zluda/cudnn.dll` (i.e. rename `cudnn.dll` and overwrite `torch/lib/cudnn64_8.dll`). A command-line sketch is shown after this list.
4. Download `5.7.zip` from the same release, and unpack it over your HIP SDK folder. (`%HIP_PATH%`)
5. Tick `Enable ZLUDA DNN`, and restart the webui.
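A rough command-prompt sketch of step 3, run from the SD.Next install folder (the paths follow the defaults mentioned above; adjust them if your install differs):

```bat
:: optional: keep a backup of the original DLL
copy venv\Lib\site-packages\torch\lib\cudnn64_8.dll venv\Lib\site-packages\torch\lib\cudnn64_8.dll.bak
:: copy ZLUDA's cudnn.dll over torch's cudnn64_8.dll (the copy handles the rename)
copy /y .zluda\cudnn.dll venv\Lib\site-packages\torch\lib\cudnn64_8.dll
```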
#### Simple Comparison
RX 7900 XTX, not optimal settings:
- without DNN: 16.66 ~ 16.9 it/s
- with DNN: 17.1 ~ 17.4 it/s

---
### If you get a "ZLUDA: failed to automatically patch torch" error
- Manually download the ZLUDA release marked with a green "latest" tag from https://github.com/lshqqytiger/ZLUDA/releases/
- Unzip it somewhere, like `C:\ZLUDA`
- Add `C:\ZLUDA` to your PATH following [this](https://github.com/brknsoul/ROCmLibs/wiki/Adding-folders-to-PATH) guide.
  - *If there's a ZLUDA path there already, be sure to remove it before closing the dialog box.*
- Manually copy and rename the following files from `C:\ZLUDA` to `[SD.Next Install Folder]\venv\Lib\site-packages\torch\lib`, overwriting the originals (a command sketch follows the list):
      cublas.dll -> cublas64_11.dll
      cusparse.dll -> cusparse64_11.dll
      nvrtc.dll -> nvrtc64_112_0.dll
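For reference, a rough command-prompt equivalent of that copy-and-rename step (`C:\ZLUDA` and the relative `venv` path are the example locations used above; substitute your own, and run this from the SD.Next install folder):

```bat
:: copy each ZLUDA DLL over the corresponding CUDA-named DLL in torch\lib
copy /y C:\ZLUDA\cublas.dll venv\Lib\site-packages\torch\lib\cublas64_11.dll
copy /y C:\ZLUDA\cusparse.dll venv\Lib\site-packages\torch\lib\cusparse64_11.dll
copy /y C:\ZLUDA\nvrtc.dll venv\Lib\site-packages\torch\lib\nvrtc64_112_0.dll
```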