fp8-quantized Z-Image for ComfyUI, using its "TensorCoreFP8Layout" quantization feature.
Scaled fp8 weights: higher precision than pure fp8.
Also uses "mixed precision": important layers remain in bf16.
There is no "official" fp8 version for z-image from ComfyUI, so I made my own.
All credit belongs to the original model author. License is the same as the original model.
FYI: many people think that an fp8 model has huge quality loss. That's because an "fp8 model" saved by ComfyUI is just a model with its weights cast to fp8, and many creators made their fp8 models that way.
Normally, when people talk about an "fp8 model", they mean a quantized fp8 model, like scaled fp8 or gguf q8, where the weights are actually "compressed".
If you see creators complaining about the poor quality of fp8 models saved by ComfyUI, send them this link, or make your own quantized fp8 model from bf16.
https://github.com/silveroxides/ComfyUI-QuantOps
I'm just sharing the tool; I'm not using it myself. I use my own old script.
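For anyone curious what "scaled fp8" actually means, here is a minimal sketch of the difference between a plain fp8 cast and scaled fp8, assuming PyTorch >= 2.1 for the float8 dtypes (this is not my quantization script, just an illustration of the idea):

```python
import torch

def naive_fp8(w: torch.Tensor) -> torch.Tensor:
    # The plain "fp8 model saved by ComfyUI": just cast. Small weights fall into
    # fp8's tiny subnormal range and lose most of their precision.
    return w.to(torch.float8_e4m3fn)

def scaled_fp8(w: torch.Tensor):
    # Rescale the tensor so its max fills the e4m3 range (~448), store the fp8
    # weights plus one scale factor; dequantize later as w_fp8.float() * scale.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    scale = (w.abs().max().float() / fp8_max).clamp(min=1e-12)
    w_fp8 = (w.float() / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)
    return w_fp8, scale

# Small-magnitude weights (common in real checkpoints) are where the plain cast
# hurts most; the gap grows as the weights get smaller.
w = 0.005 * torch.randn(4096, 4096)
w_fp8, scale = scaled_fp8(w)
err_naive = (naive_fp8(w).float() - w).abs().mean()
err_scaled = (w_fp8.float() * scale - w).abs().mean()
print(f"naive: {err_naive:.2e}  scaled: {err_scaled:.2e}")
```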
Base
Quantized Z-Image, aka the "base" version of Z-Image.
https://huggingface.co/Tongyi-MAI/Z-Image
Note: no hardware fp8; all calculations still run in bf16. This is intentional.
Rev 1.1: An updated version with better "mixed precision". More bf16 layers, so the file is bigger. Previous version will be deleted.
Turbo
Quantized Z-Image-Turbo
https://huggingface.co/Tongyi-MAI/Z-Image-Turbo
Rev 1.1: An updated version with better "mixed precision". More bf16 layers, so the file is bigger. No hardware fp8. Previous version will be deleted.
v1: Contains calibrated metadata for hardware fp8 linear. If your GPU supports it, ComfyUI will use hardware fp8 automatically, which should be a little faster. For more about hardware fp8 and its hardware requirements, see ComfyUI TensorCoreFP8Layout.
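To illustrate what "hardware fp8 linear" and the calibrated metadata are for (a hedged sketch, not ComfyUI's actual code): on GPUs with fp8 tensor cores, the matmul itself can run on fp8 inputs through PyTorch's private torch._scaled_mm, whose signature has changed between releases (shown here for roughly >= 2.4). The calibrated metadata supplies a pre-computed activation scale, so nothing has to be measured at inference time.

```python
import torch

def fp8_linear(x: torch.Tensor,        # activations, bf16, shape (M, K)
               w_fp8: torch.Tensor,    # weight, float8_e4m3fn, shape (N, K)
               w_scale: torch.Tensor,  # per-tensor weight scale, float32 scalar
               x_scale: torch.Tensor): # calibrated activation scale, float32 scalar
    # Quantize the activations with the pre-calibrated scale.
    x_fp8 = (x.float() / x_scale).clamp(-448.0, 448.0).to(torch.float8_e4m3fn)
    # _scaled_mm wants the second operand column-major and dims divisible by 16;
    # scale_a / scale_b undo both quantizations inside the fp8 tensor-core matmul.
    return torch._scaled_mm(
        x_fp8, w_fp8.t(),
        scale_a=x_scale, scale_b=w_scale,
        out_dtype=torch.bfloat16,
    )
```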
Qwen3 4b
Quantized Qwen3 4b. Scaled fp8 + mixed precision. Early (embed_tokens, layers.[0-1]) and final (layers.[34-35]) layers are still in BF16.
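For reference, a hypothetical name filter expressing that split (the module names follow the standard Qwen3-4B layout with 36 blocks; this is an assumption, not the exact script used):

```python
import re

# Keep the token embedding and the first/last two transformer blocks in bf16,
# quantize everything else to scaled fp8.
KEEP_BF16 = re.compile(r"(embed_tokens|layers\.(0|1|34|35)\.)")

def should_keep_bf16(param_name: str) -> bool:
    return bool(KEEP_BF16.search(param_name))

# "model.layers.1.self_attn.q_proj.weight"  -> True  (stays bf16)
# "model.layers.17.mlp.gate_proj.weight"    -> False (quantized to fp8)
```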
Comments
Thank you for sharing the quantized Qwen3 4b, works flawlessly with the base CLIP loader on CPU. Old 5800X still holding strong :)
(12/9/2025): files reuploaded for ComfyUI v0.4.
ComfyUI v0.4 changed how it handles calibrated metadata. Turbo and Qwen3 files are reuploaded with updated metadata. Please redownload them to prevent quality regression.
Could you post a workflow, please? :)
Hey, can you share the script you're using to make this? I'd really like to try it on the Josiefied Qwen3 I'm using with JoZiMagic, and compare results. [more details if you want, just ask]
No, it's not a simple script; it's a bunch of messy scripts, because I also need to sample the model to get the calibrated metadata. Not easy to share, at least for now.
If you just want to quantize the text encoder, I recommend gguf. It's much more "efficient and precise" than ComfyUI's built-in fp8, and it can be smaller.
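For those wondering what "sampling the model to get calibrated metadata" involves, here is a rough sketch of the usual approach (forward hooks recording activation ranges); the function and variable names are illustrative, not the author's actual scripts:

```python
import torch

def collect_activation_scales(model: torch.nn.Module, sample_batches) -> dict:
    """Run a few sample inputs through the bf16 model, record the absolute
    maximum seen at the input of each Linear, and turn it into an fp8 scale."""
    amax: dict[str, float] = {}
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            x = inputs[0]
            amax[name] = max(amax.get(name, 0.0), x.detach().abs().max().item())
        return hook

    for name, module in model.named_modules():
        if isinstance(module, torch.nn.Linear):
            hooks.append(module.register_forward_hook(make_hook(name)))

    with torch.no_grad():
        for batch in sample_batches:   # whatever the model's forward expects
            model(batch)

    for h in hooks:
        h.remove()

    # Per-layer e4m3 scale, stored alongside the weights as calibrated metadata.
    fp8_max = torch.finfo(torch.float8_e4m3fn).max
    return {name: v / fp8_max for name, v in amax.items()}
```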
tbh, simply replacing the original text encoder with another version is counterproductive, because the DiT was not trained with it.
@reakaakasky JoziMagic proves that your "tbh" is wrong; it actually works amazingly well. JoziMagic makes different images from stock Z-Image, and in my extensive testing (including blind surveys of others on Discord), it's sometimes better than stock [often tied, occasionally losing (10% or so of the time)]. Try it: grab a Josiefied Qwen3 4b GGUF q8, and you'll see for yourself.
@reakaakasky Having messy scripts is better than not having them. It would be great if you posted them
With Z-IMG Anime Fine-tune and NEWBIE IMG (referencing Lumina) both likely releasing soon, it's hard to imagine that just six months ago I was still getting headaches over SDXL's rigid prompt requirements. Now, it feels like everything has been solved
I don't think there will be a z-image anime fine-tune. It's too problematic for a big company.
They will most likely use the anime data to train the editing version, for style consistency.
It's a good thing. There will be no legal issues.
And teaching the model to keep art style consistency from reference images is, theoretically, way easier than memorizing thousands of art styles.
If the editing version has perfect art style consistency, then say goodbye to style loras and finetunes. The model will be able to generate any art style, no longer limited to the training dataset. All we need is to give the model style reference images.
I am very optimistic that this will happen.
I know this isn't exactly on topic, but is there any page / discussion / resource for torch compile? I can't get it to work with either the Comfy native node or KJ advanced.
Honestly, really glad I found this checkpoint. It loads quite fast and makes great realism pics, on par with the OG bf16 version; thank you for making this checkpoint variant. I'm so used to A1111 that I gave up on Comfy, but now with ZIT I'm learning the workflows as I go, and it seems really nice with this fp8 variant.
Is it possible to release an INT8 quantized version? This is necessary for my Tesla V100 and 2080 Ti setups. Thank you very much.
Can you please add a ComfyUI workflow? I tried loading all the submitted generations into ComfyUI, but none of them seem to have a workflow included.
https://civitai.com/models/2170134/z-image-turbo-workflow - this one works, just be sure to set the textencoder to the tensorcore one.
Stupid question: ComfyUI does an automatic "manual cast: torch.bfloat16" when it detects fp8 models. To circumvent that, one need only add --fast as a runtime argument. My stupid question is whether this is actually faster than other fp8 models, or whether it's just formatted in such a way that Comfy no longer does its auto "manual" cast, making it as fast as any other fp8 model run on hardware that supports fp8 precision natively, like the 40- and 50-series cards from Nvidia.
And before you ask why I don't test it myself: space is at a premium at the moment. My SSD is a bit too full at present, and I need to purge a few models before I download a new model or checkpoint.
@reakaakasky ah, so this would be activation quantized as opposed to a simple weight quant. thank you.
Share a workflow? None of the stuff I've tried seems to be faster. 50xx card here
https://civitai.com/models/2170134/z-image-turbo-workflow - this one works, just be sure to set the text encoder to the tensorcore one.
Last time I tried fp8 on Blackwell, ComfyUI wasn't doing the maths right, and the output was far worse than GGUF Q8. This was some months ago. For fp8 to have a benefit, the maths needs to be done in fp8, not fp16, and then the code has to be careful to handle the accumulated precision correctly. I don't think the ComfyUI coders are competent enough to handle issues of numerical analysis. Personally, as a Blackwell user, I see no current reason to move from GGUF: GGUF is well understood and well supported. These other compressed formats are not!
I can't get any speed-up. What am I doing wrong? I'm on a 4060 Ti, using the default ZIT workflow on the latest ComfyUI nightly. I tried --fast, --fp8_e4m3fn-unet, and every weight_dtype setting, but the console always says manual cast: torch.bfloat16. No matter what, the speed stays the same as with other fp8 or bf16 models.
Have you tried the --fast fp16_accumulation setting?
@MeMakeStuff Tried that; ~16s for 1024, 8 steps, res_multistep, exactly the same speed as bf16 with no flags or anything. Still getting "model weight dtype torch.float8_e4m3fn, manual cast: torch.float16" when selecting the fp8_fast dtype, or "model weight dtype torch.float16, manual cast: torch.float16" on default.
CUDA: 12.8
Torch: 2.7.1+cu128
Triton: 3.3.1
SageAttention: 2.2.0+cu128torch2.7.1.post3
Enabled fp16 accumulation
What am I missing?
The fp8 model works without any flags; ComfyUI will use it automatically.
"manual cast: torch.bfloat16": iirc this log doesn't matter, as long as you see something like "detecting/using mixed precision layers".
"fp8 exactly the same speed as bf16": normally an fp8 model is ~10% slower, so if you're sure the speed is the same, then fp8 is working, just not fast enough. You might need torch.compile.
@reakaakasky I'm seeing the "Detected/Using" and "Found quantization metadata version 1" messages, but the speed improvement is only ~30% compared to other FP8 models, and it matches BF16 performance. Is this expected? I thought this should give a 30%-80% speedup.
“It will not be faster, but it is still better than the old FP8 model” - I’m somewhat new to all of this, so I could have made some mistakes, but after testing it on my 3070 Ti (8GB) + 16GB, it actually was faster, and not by small margins. https://i.imgur.com/NzJiTyF.png - Nunchaku r256 (worst visual result of all), Q8, Q6, FP8 (yours), and random FP8 from HF. So big thanks! I might actually switch from Q8 to this; I need to do more visual comparisons.
Anyway, about TorchCompileModelAdvanced: it should be after LoRA but before ModelSamplingAuraFlow, correct? And should I use the default settings (inductor / false / default / auto / true / 64 / false)? If so, then I think it unfortunately doesn't work in my case (meaning the speeds are all the same as before). RIP!
Thanks once again!
I mean the sampling speed will not be faster. It might load faster, but that depends on your CPU etc.
I'm not sure about RTX 30xx, so I don't know which model is faster.
If you prefer the best quality, gguf q8 is the best.
gguf q8 (more complicated) > scaled fp8 >> pure fp8 >> Nunchaku (which is hardware q4, should be 3x faster, it supports 30xx iirc)
@reakaakasky there is the Imgur link in my main comment with the speed comparisons; Nunchaku r256 loses against all of them while having the worst visual quality too. Either something is broken, or it isn't beneficial for my setup.
About Torch: so the 1st gen should be faster; well, it isn't at all in my case, but that could be my PC. Thanks anyway!
In my FACE editing test it lost about 25% quality. When I went back to BF16, the faces got very, very good quality again, so I can verify that this does not work well with a FACE DETAILER workflow.



