    Z-Image [fp8] - Base

    fp8-quantized Z-Image for ComfyUI, using its "TensorCoreFP8Layout" quantization feature.

    • Scaled fp8 weights: higher precision than plain fp8.

    • Uses hardware fp8 on supported GPUs (Turbo only, see below).

    Also uses "mixed precision": important layers remain in bf16 (see the sketch below).
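
    For reference, here is a minimal sketch of what "scaled fp8 + mixed precision" means in practice. The key filter and the ".scale_weight" naming below are illustrative assumptions, not the exact layout of this file:

        import torch

        FP8_MAX = torch.finfo(torch.float8_e4m3fn).max

        def quantize_scaled_fp8(state_dict, keep_bf16=("embed", "norm", "final")):
            # Hypothetical key filter; the real list of "important layers" differs per model.
            out = {}
            for name, w in state_dict.items():
                if w.ndim < 2 or any(k in name for k in keep_bf16):
                    out[name] = w.to(torch.bfloat16)             # mixed precision: keep in bf16
                    continue
                scale = (w.abs().amax().float() / FP8_MAX).clamp(min=1e-12)
                out[name] = (w / scale).to(torch.float8_e4m3fn)  # scaled fp8 weight
                out[name + ".scale_weight"] = scale              # per-tensor scale (name illustrative)
            return out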

    There is no "official" fp8 version of Z-Image from ComfyUI, so I made my own.

    All credit belongs to the original model author. License is the same as the original model.

    Note: these features are officially supported by ComfyUI; this file is just a weight file.

    • Load it with ComfyUI's built-in loader nodes.

    • If you get an error, report it to the ComfyUI repo, not here.


    Base

    Quantized Z-Image, a.k.a. the "base" version of Z-Image.

    https://huggingface.co/Tongyi-MAI/Z-Image

    Note: no hardware fp8 here; all calculations still run in bf16. This is intentional: hardware fp8/fp4 etc. do not work well with LoRAs.


    Turbo

    Quantized Z-Image-Turbo

    https://huggingface.co/Tongyi-MAI/Z-Image-Turbo

    This one supports hardware fp8. If your GPU has it, ComfyUI will use it automatically, and you will find this model is faster than bf16 and other fp8 models.

    GPUs with hardware fp8 (a quick capability check is sketched after this list):

    • Nvidia: RTX 4xxx and later (i.e. SM ≥ 8.9, Ada and newer)

    • AMD: gfx1200, gfx1201, gfx950 (according to the ComfyUI code)
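
    A rough way to check the Nvidia side of this list (a sketch, not the exact logic ComfyUI uses):

        import torch

        def has_hardware_fp8(device=0):
            # Nvidia only: Ada (SM 8.9, i.e. RTX 4xxx) and newer expose fp8 tensor cores.
            # For AMD, ComfyUI instead matches the arch name (gfx1200/gfx1201/gfx950).
            if not torch.cuda.is_available():
                return False
            return torch.cuda.get_device_capability(device) >= (8, 9)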

    Comparisons (on an RTX 4090):

    • gguf q4_K model: -26% it/s (dequantization overhead)

    • old weight-only fp8 model: -8% it/s (dequantization overhead)

    • bf16 model: baseline.

    • this model: +31% it/s

    • this model + torch.compile: +60% it/s

    RTX 5xxx not tested; it should be faster than 4xxx thanks to newer tensor cores and better fp8 support.

    AMD not tested.

    Feel free to share your results in the comment section.

    Why is this fp8 model faster than the old fp8 models, gguf q8, etc.?

    Old way: fp8 weight -> dequantize to bf16 -> run the linear in bf16 -> discard the bf16 weight

    New way: fp8 weight -> run the linear directly in fp8
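
    In PyTorch terms, the difference looks roughly like this. It is a sketch using the private torch._scaled_mm op (its exact signature varies across PyTorch versions, and matrix dimensions must be multiples of 16); ComfyUI's actual implementation may differ:

        import torch

        fp8 = torch.float8_e4m3fn
        x = torch.randn(32, 64, device="cuda", dtype=torch.bfloat16)   # activations
        w = torch.randn(128, 64, device="cuda", dtype=torch.bfloat16)  # linear weight

        # Scaled fp8: a per-tensor scale stored next to the fp8 tensor.
        x_scale = x.abs().amax().float() / torch.finfo(fp8).max
        w_scale = w.abs().amax().float() / torch.finfo(fp8).max
        x_fp8 = (x / x_scale).to(fp8)
        w_fp8 = (w / w_scale).to(fp8)

        # Old way: dequantize the weight back to bf16, then matmul in bf16.
        y_old = x @ (w_fp8.to(torch.bfloat16) * w_scale).t()

        # New way: keep the operands in fp8 and let the fp8 tensor cores do the matmul.
        y_new = torch._scaled_mm(x_fp8, w_fp8.t(), scale_a=x_scale, scale_b=w_scale,
                                 out_dtype=torch.bfloat16)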

    What if my GPU does not have hardware fp8?

    ComfyUI will fall back to the "old way" (the dequantize path shown above). It will not be faster, but it is still better than the old fp8 models because it uses scaled fp8 and mixed precision.


    Qwen3 4b

    Update: not recommended.

    ComfyUI-GGUF now supports Qwen3, so use gguf instead. Recommended:

    https://huggingface.co/unsloth/Qwen3-4B-GGUF/blob/main/Qwen3-4B-UD-Q8_K_XL.gguf

    Why gguf? gguf q8 has slightly higher precision than ComfyUI's built-in scaled fp8.

    ===

    Quantized Qwen3 4b.

    https://huggingface.co/Qwen/Qwen3-4B

    Scaled fp8 + mixed precision.

    Early (embed_tokens, layers.[0-1]) and final (layers.[34-35]) layers remain in bf16; a sketch of that split is below.
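
    As a sketch, that bf16/fp8 split can be expressed as a key filter like the one below. The key names follow the Hugging Face Qwen3 layout; the exact rule used for this file may differ:

        import re

        BF16_PATTERNS = (
            r"embed_tokens",        # token embedding
            r"layers\.(0|1)\.",     # early layers
            r"layers\.(34|35)\.",   # final layers
        )

        def keep_in_bf16(key: str) -> bool:
            """True if this Qwen3-4B weight stays in bf16 instead of scaled fp8."""
            return any(re.search(p, key) for p in BF16_PATTERNS)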

    Checkpoint
    ZImageBase

    Details

    Downloads
    1,034
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/28/2026
    Updated
    2/1/2026
    Deleted
    -

    Files

    zImageFp8_base.safetensors

    Mirrors

    Huggingface (1 mirror)
    CivitAI (1 mirror)