Stable Cascade

Vladimir Mandic edited this page Mar 7, 2024 · 9 revisions


Original repo: https://github.com/Stability-AI/StableCascade

Install: Experimental only

Note: These steps will be removed for the release version

  • install SD.Next as usual and start it so all requirements are installed
  • Install diffusers dev:

    pip uninstall diffusers
    pip install git+https://github.com/huggingface/diffusers/

  • switch SD.Next to the dev branch:

    git checkout dev

  • start with --experimental command line flag

    ./webui --experimental --debug

Use

  1. Set your compute precision in Settings -> Compute -> Precision
    to either BF16 (if supported) or FP32 (if not supported)
  2. Enable model offloading in Settings -> Diffusers -> Model CPU offload
    without this, Stable Cascade will use ~16GB of VRAM
  3. Select model from Networks -> Models -> Reference
    and it will automatically be downloaded on first use and loaded into SD.Next
    attempting to load a manually downloaded safetensors file is not supported, as the model requires special handling

Resources

With correct tuning, it is possible to run Stable Cascade on an 8GB VRAM GPU, but performance leaves much to be desired: 1024x1024 on an RTX 4090 using BF16 barely reaches 4 it/s

Params

  • Prompt & Negative prompt: as usual
  • Width & Height: as usual
  • CFG scale: used to condition the prior model; reference value is ~4
  • Secondary CFG scale: used to condition the decoder model; reference value is ~1
  • Steps: controls the number of steps of the prior model
  • Refiner steps: controls the number of steps of the decoder model
  • Sampler: set to Default before loading a model
    Stable Cascade has its own sampler, and results with standard samplers will look suboptimal
    The built-in sampler is DDIM/DDPM based, so if you want to experiment, at least use a similar sampler
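The parameter names above are UI labels; under the hood they map onto arguments of the two diffusers pipeline calls. A rough sketch of that mapping is below — the argument names are the standard diffusers ones, the CFG values are the reference values from this page, and the step counts are placeholder examples, not recommendations:

```python
# How the SD.Next UI parameters roughly map onto the two diffusers
# pipeline calls (argument names are the diffusers ones).
PRIOR_ARGS = {
    "guidance_scale": 4.0,      # CFG scale: conditions the prior, ~4
    "num_inference_steps": 20,  # Steps: prior model (example value)
}
DECODER_ARGS = {
    "guidance_scale": 1.0,      # Secondary CFG scale: conditions the decoder, ~1
    "num_inference_steps": 10,  # Refiner steps: decoder model (example value)
}

# Usage sketch, assuming `prior` and `decoder` pipelines are already loaded:
#   out = prior(prompt=prompt, negative_prompt=negative,
#               width=1024, height=1024, **PRIOR_ARGS)
#   images = decoder(image_embeddings=out.image_embeddings,
#                    prompt=prompt, **DECODER_ARGS).images
```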

Notes

  • The default variation that will be downloaded and loaded is the Full model with BF16 precision
  • If the model download fails, simply retry it; the download will continue from where it left off
  • The model consists of 3 stages split into 2 pipelines, which are executed as C -> B -> A:
    • Prior pipeline: 8.9GB total = 1.3GB TextEncoder + 0.5GB ImageEncoder + 7GB Stage C UNet
    • Decoder pipeline: 4.4GB total = 3.0GB Stage B Decoder + 1.4GB Stage A VQGAN VAE

Variations

Overview

Note: this is included for reference only, as loading different variations is currently not supported

Stable Cascade is a 3-stage model split into two pipelines (so-called prior and decoder) and comes in two main variations: Full and Lite
You can select which one to use from Networks -> Models -> Reference

Additionally, each variation comes in 3 different precisions: FP32, BF16, and FP16
Note: FP16 is an unofficial version of the model by @KohakuBlueleaf, fixed to work with FP16, and may produce slightly different output

Which precision gets loaded depends on:

  • your user preference in Settings -> Compute -> Precision
  • GPU compatibility, as not all GPUs support all precision types

Sizes

Stage A and auxiliary model sizes are fixed and noted above
Stage B and Stage C model sizes depend on the variation and precision used

Variation  Precision  Stage B  Stage C
Full       FP32       6.2GB    14GB
Full       BF16       3.1GB    7GB
Full       FP16       N/A      7GB
Lite       FP32       2.8GB    4GB
Lite       BF16       1.4GB    2GB
Lite       FP16       N/A      N/A
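For estimating download sizes, the table above can be encoded as a simple lookup. This is a sketch using the sizes from the table; N/A combinations are represented as None:

```python
# Stage B / Stage C sizes in GB, keyed by (variation, precision);
# None marks combinations that are not available per the table above
SIZES_GB = {
    ("Full", "FP32"): (6.2, 14.0),
    ("Full", "BF16"): (3.1, 7.0),
    ("Full", "FP16"): (None, 7.0),
    ("Lite", "FP32"): (2.8, 4.0),
    ("Lite", "BF16"): (1.4, 2.0),
    ("Lite", "FP16"): (None, None),
}

def total_size_gb(variation, precision):
    """Combined Stage B + Stage C size in GB, or None if unavailable."""
    stage_b, stage_c = SIZES_GB[(variation, precision)]
    if stage_b is None or stage_c is None:
        return None
    return stage_b + stage_c
```

For example, the default Full/BF16 variation needs roughly 10GB for Stages B and C, on top of the fixed-size TextEncoder, ImageEncoder, and Stage A models.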