fp8-quantized Z-Image for ComfyUI, using its "TensorCoreFP8Layout" quantization feature.
Scaled fp8 weights: higher precision than plain fp8.
Uses hardware fp8 on supported GPUs (Turbo only, see below).
Also uses "mixed precision": important layers remain in bf16.
There is no "official" fp8 version for z-image from ComfyUI, so I made my own.
All credit belongs to the original model author. License is the same as the original model.
Note: Those features are officially supported by ComfyUI. This file is just a weight file.
Use ComfyUI built-in loader nodes to load.
If you got error, report to ComfyUI repo. Not here.
Base
Quantized Z-Image, i.e. the "base" version of Z-Image.
https://huggingface.co/Tongyi-MAI/Z-Image
Note: no hardware fp8 here; all calculations still run in bf16. This is intentional: hardware fp8/fp4 compute does not work well with LoRA (see the sketch below).
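A minimal sketch of why the bf16 path plays nicer with LoRA: the LoRA delta is added in bf16 on top of the dequantized weight, so nothing extra has to be squeezed through fp8 (shapes, values, and the scale here are made up for illustration):

```python
import torch

out_f, in_f, rank = 128, 64, 8
w_fp8   = torch.randn(out_f, in_f).to(torch.float8_e4m3fn)   # stored weight
w_scale = 0.02                                               # its per-tensor scale
lora_A  = torch.randn(rank, in_f,  dtype=torch.bfloat16) * 0.01
lora_B  = torch.randn(out_f, rank, dtype=torch.bfloat16) * 0.01

# bf16 path: dequantize, add the LoRA delta, run the linear in bf16.
w_eff = w_fp8.to(torch.bfloat16) * w_scale + lora_B @ lora_A
x = torch.randn(4, in_f, dtype=torch.bfloat16)
y = x @ w_eff.t()

# A hardware fp8 path would have to push w_eff (or the LoRA output) back
# through fp8 every step, which costs either quality or speed.
```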
Turbo
Quantized Z-Image-Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
This one supports hardware fp8. If your GPU has hardware fp8, ComfyUI will use it automatically, and this model will be faster than bf16 and the other fp8 models.
GPUs with hardware fp8 (a quick capability check is sketched after this list):
Nvidia: RTX 4xxx and later (i.e. Ada, SM ≥ 8.9)
AMD: gfx1200, gfx1201, gfx950 (according to the ComfyUI code)
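If you are unsure, a rough check on Nvidia GPUs looks like this (CUDA only; the AMD gfx check is not covered here):

```python
import torch

def has_hardware_fp8() -> bool:
    # Ada (SM 8.9) and newer expose fp8 tensor cores; older GPUs fall back.
    if not torch.cuda.is_available():
        return False
    major, minor = torch.cuda.get_device_capability()
    return (major, minor) >= (8, 9)

print("hardware fp8:", has_hardware_fp8())
```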
Comparisons:
On an RTX 4090:
GGUF Q4_K model: -26% it/s (dequantization overhead)
old weight-only fp8 model: -8% it/s (dequantization overhead)
bf16 model: baseline.
this model: +31% it/s
this model + torch.compile: +60% it/s
RTX 5xxx: not tested. It should be faster than 4xxx thanks to newer tensor cores and better fp8 support.
AMD: not tested.
Feel free to share your results in the comments.
Why is this fp8 model faster than the old fp8 models, GGUF Q8, etc.?
Old way: fp8 weight -> dequantize to a bf16 weight -> run the linear in bf16 -> discard the bf16 weight
New way: fp8 weight -> run the linear directly in fp8 (see the sketch below)
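In PyTorch terms, the difference looks roughly like this. Note that torch._scaled_mm is a private API whose signature has changed between releases; the call below matches recent 2.x versions and needs a GPU with hardware fp8:

```python
import torch

M, K, N = 16, 64, 32
x_bf16 = torch.randn(M, K, device="cuda", dtype=torch.bfloat16)   # activations
w_bf16 = torch.randn(N, K, device="cuda", dtype=torch.bfloat16)   # weight

# Per-tensor absmax scales (448 = max finite value of float8_e4m3fn).
x_scale = x_bf16.abs().amax().float() / 448.0
w_scale = w_bf16.abs().amax().float() / 448.0
x_fp8 = (x_bf16 / x_scale).to(torch.float8_e4m3fn)
w_fp8 = (w_bf16 / w_scale).to(torch.float8_e4m3fn)

# Old way: dequantize the fp8 weight back to bf16, then do a normal bf16 matmul.
w_deq = w_fp8.to(torch.bfloat16) * w_scale
y_old = x_bf16 @ w_deq.t()

# New way: hand the fp8 tensors straight to the fp8 tensor cores.
# torch._scaled_mm is a private API; arguments shown are for recent PyTorch 2.x.
y_new = torch._scaled_mm(
    x_fp8, w_fp8.t(),                     # second operand must be column-major
    scale_a=x_scale, scale_b=w_scale,
    out_dtype=torch.bfloat16,
)
```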
What if my GPU does not have hardware fp8?
ComfyUI will fall back to the "old way". It will not be faster, but it is still better than the old fp8 models because it uses scaled fp8 and mixed precision.
Qwen3 4B
Update: not recommended.
ComfyUI-GGUF now supports Qwen3, so use a GGUF instead. Recommended:
https://huggingface.co/unsloth/Qwen3-4B-GGUF/blob/main/Qwen3-4B-UD-Q8_K_XL.gguf
Why GGUF? GGUF Q8 has slightly higher precision than ComfyUI's built-in scaled fp8.
===
Quantized Qwen3 4B.
https://huggingface.co/Qwen/Qwen3-4B
Scaled fp8 + mixed precision, same as the DiT model.
Early (embed_tokens, layers.[0-1]) and final (layers.[34-35]) layers are still in BF16 (see the sketch below).
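For reference, that layer split can be expressed roughly like this (layer names follow the usual Hugging Face Qwen3 naming; the key used to store the scale is illustrative, not ComfyUI's actual key format):

```python
import torch

KEEP_BF16 = (
    "model.embed_tokens.",
    "model.layers.0.", "model.layers.1.",     # early layers
    "model.layers.34.", "model.layers.35.",   # final layers
)

def quantize_qwen3(state_dict: dict) -> dict:
    out = {}
    for name, w in state_dict.items():
        if name.startswith(KEEP_BF16) or w.ndim != 2:
            out[name] = w.to(torch.bfloat16)              # stays in bf16
        else:
            scale = w.abs().amax().float() / 448.0        # per-tensor absmax
            out[name] = (w / scale).to(torch.float8_e4m3fn)
            out[name + ".scale"] = scale                  # illustrative key name
    return out
```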