Stable Cascade

Vladimir Mandic edited this page Mar 7, 2024 · 9 revisions


Original repo: https://github.com/Stability-AI/StableCascade

Install: Experimental only

Note: These steps will be removed for the release version

  • install SD.Next as usual and start it so all requirements are installed
  • Install diffusers dev:

    pip uninstall diffusers
    pip install git+https://github.com/huggingface/diffusers/

  • switch SD.Next to the dev branch:

    git checkout dev

  • start with --experimental command line flag

    ./webui --experimental --debug

Use

  1. Set your compute precision in Settings -> Compute -> Precision
    to either BF16 (if supported) or FP32 (if not supported)
  2. Enable model offloading in Settings -> Diffusers -> Model CPU offload
    without this, Stable Cascade will use ~16GB of VRAM
  3. Select model from Networks -> Models -> Reference
    and it will automatically be downloaded on first use and loaded into SD.Next
    attempting to load a manually downloaded safetensors file is not supported, as the model requires special handling

Resources

With correct tuning, it is possible to run Stable Cascade on an 8GB VRAM GPU, but performance leaves much to be desired: 1024x1024 on an RTX 4090 using BF16 barely reaches 4 it/s

Params

  • Prompt & Negative prompt: as usual
  • Width & Height: as usual
  • CFG scale: used to condition the prior model; reference value is ~4
  • Secondary CFG scale: used to condition the decoder model; reference value is ~1
  • Steps: controls the number of steps of the prior model
  • Refiner steps: controls the number of steps of the decoder model
  • Sampler: set to Default before loading a model
    Stable Cascade has its own sampler, and results with standard samplers will look suboptimal
    The built-in sampler is DDIM/DDPM based, so if you want to experiment, at least use a similar sampler
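The parameter names above are UI labels; under the hood they map onto arguments of the two diffusers pipeline calls. A rough sketch of that mapping is below — the argument names are the standard diffusers ones, the CFG values are the reference values from this page, and the step counts are placeholder examples, not recommendations:

```python
# How the SD.Next UI parameters roughly map onto the two diffusers
# pipeline calls (argument names are the diffusers ones).
PRIOR_ARGS = {
    "guidance_scale": 4.0,      # CFG scale: conditions the prior, ~4
    "num_inference_steps": 20,  # Steps: prior model (example value)
}
DECODER_ARGS = {
    "guidance_scale": 1.0,      # Secondary CFG scale: conditions the decoder, ~1
    "num_inference_steps": 10,  # Refiner steps: decoder model (example value)
}

# Usage sketch, assuming `prior` and `decoder` pipelines are already loaded:
#   out = prior(prompt=prompt, negative_prompt=negative,
#               width=1024, height=1024, **PRIOR_ARGS)
#   images = decoder(image_embeddings=out.image_embeddings,
#                    prompt=prompt, **DECODER_ARGS).images
```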

Notes

  • The default variation that will be downloaded and loaded is the Full model with BF16 precision
  • If the model download fails, simply retry it; the download will continue from where it left off
  • The model consists of 3 stages split into 2 pipelines, which are executed as C -> B -> A:
    • Prior pipeline: 8.9GB total = 1.3GB TextEncoder + 0.5GB ImageEncoder + 7GB Stage C UNet
    • Decoder pipeline: 4.4GB total = 3.0GB Stage B Decoder + 1.4GB Stage A VQGAN VAE

Variations

Overview

Note: this is included for reference only, as loading different variations is currently not supported

Stable Cascade is a 3-stage model split into two pipelines (so-called prior and decoder) and comes in two main variations: Full and Lite
You can select which one to use from Networks -> Models -> Reference

Additionally, each variation comes in 3 different precisions: FP32, BF16, and FP16
Note: FP16 is an unofficial version of the model by @KohakuBlueleaf, fixed to work with FP16, and may produce slightly different output

Which precision gets loaded depends on:

  • your user preference in Settings -> Compute -> Precision
  • GPU compatibility, as not all GPUs support all precision types

Sizes

Stage A and auxiliary model sizes are fixed and noted above
Stage B and Stage C model sizes depend on the variation and precision used

Variation  Precision  Stage B  Stage C
Full       FP32       6.2GB    14GB
Full       BF16       3.1GB    7GB
Full       FP16       N/A      7GB
Lite       FP32       2.8GB    4GB
Lite       BF16       1.4GB    2GB
Lite       FP16       N/A      N/A
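For estimating download sizes, the table above can be encoded as a simple lookup. This is a sketch using the sizes from the table; N/A combinations are represented as None:

```python
# Stage B / Stage C sizes in GB, keyed by (variation, precision);
# None marks combinations that are not available per the table above
SIZES_GB = {
    ("Full", "FP32"): (6.2, 14.0),
    ("Full", "BF16"): (3.1, 7.0),
    ("Full", "FP16"): (None, 7.0),
    ("Lite", "FP32"): (2.8, 4.0),
    ("Lite", "BF16"): (1.4, 2.0),
    ("Lite", "FP16"): (None, None),
}

def total_size_gb(variation, precision):
    """Combined Stage B + Stage C size in GB, or None if unavailable."""
    stage_b, stage_c = SIZES_GB[(variation, precision)]
    if stage_b is None or stage_c is None:
        return None
    return stage_b + stage_c
```

For example, the default Full/BF16 variation needs roughly 10GB for Stages B and C, on top of the fixed-size TextEncoder, ImageEncoder, and Stage A models.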