Z-Image is a powerful and highly efficient image generation model with 6B parameters. It currently has three variants:
🚀 Z-Image-Turbo – A distilled version of Z-Image that matches or exceeds leading competitors with only 8 NFEs (Number of Function Evaluations). It offers ⚡️sub-second inference latency⚡️ on enterprise-grade H800 GPUs and fits comfortably within 16 GB of VRAM on consumer devices. It excels in photorealistic image generation, bilingual text rendering (English & Chinese), and robust instruction adherence.
🧱 Z-Image-Base (this model) – The non-distilled foundation model. By releasing this checkpoint, we aim to unlock the full potential for community-driven fine-tuning and custom development.
✍️ Z-Image-Edit – A variant fine-tuned from Z-Image specifically for image editing tasks. It supports creative image-to-image generation with impressive instruction-following capabilities, allowing precise edits based on natural language prompts.
Original ComfyUI Models: https://huggingface.co/Comfy-Org/z_image
Original HF Repo: https://huggingface.co/Tongyi-MAI/Z-Image
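For quick local testing, here is a minimal text-to-image sketch. It assumes your diffusers build has Z-Image support; the pipeline class, argument names, and defaults may differ, so check the HF repo above for the official snippet:

```python
import torch
from diffusers import DiffusionPipeline  # Z-Image support assumed; check your diffusers version

# Load the Base checkpoint; bf16 keeps the 6B model within consumer VRAM.
pipe = DiffusionPipeline.from_pretrained("Tongyi-MAI/Z-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

# Unlike Turbo (8 NFEs, no CFG), the non-distilled base wants real CFG and ~30 steps.
image = pipe(
    prompt="a photorealistic portrait of an elderly fisherman, soft morning light",
    negative_prompt="blurry, low quality, deformed limbs, watermark",
    num_inference_steps=30,
    guidance_scale=4.0,
).images[0]
image.save("zib_sample.png")
```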
Comments
I am guessing this one will not work in Forge Neo?
Yes I can confirm it works with forge neo. I just downloaded the checkpoint file and placed it into my base model folder. It works for me using the same text encoder and vae I used for ZIT. Also I used fp8 quantization to fit it on my gpu.
@RisingV I'll try again, it gave me an error but we will see
Works fine. But I updated torch and sage first.
@makoshark1975 ok, I hope you can get it working
It loaded and ran after I did a clean install of forge neo, but I only get black images now
@Volnovik Just giving me solid black images
@makoshark1975 check sampler and schedule, some give blacks. Just drag and drop my last images to check config
@Volnovik I fixed it, I uninstalled sage and it works fine now.
@makoshark1975 did you update torch? https://civitai.com/articles/25427/updating-torch-on-forge-neo-in-stability-matrix
@Volnovik Yes, but still didn't work until I got rid of sage, all good now
Generation will be available shortly, but what about training?
Edit: Also the license (under user and social links) ends on a ":" without anything being listed.
Training also. And the license will be updated at some point, yes.
License is apache-2.0 according to tongyi huggingface repo.
Also ai-toolkit already implemented training with z-image-base.
Thank you! Wan Z Image Base category :)
It generates okay photos, but is terrible at anything else as the base model stands.
Still, it is twice as slow as Flux.2 Dev, which is stunningly weird.
See https://old.reddit.com/r/StableDiffusion/comments/1qojw11/zimage_base_vs_zimage_turbo/ for comparisons to the turbo model.
Why does Z-Image Turbo keep generating the same Asian and Chinese faces despite asking for other ethnic subjects? How do I tackle this issue?
It's probably overtrained on Chinese people. That's a common complaint with the turbo model. You should enter the nationality of the person that you want your subject to look like, such as: Swiss, Nigerian, Mexican, etc.
Not having this issue at all
I don't have this issue
z-image base solves this, way more variety in seeds and looks.
https://www.reddit.com/r/StableDiffusion/comments/1qozyms/a_quick_test_showing_the_image_variety_of_zimage/
@BlackBear31 Thanks
Finally base has been released! I need to do some finetunes asap
Yeah it needs it, the images remind me a bit of Pony V7, but quite a bit better.
It is not training Anime worth a damn.
Is anyone else getting black output when using Sage Attention with Z-Image Base?
that's not a Sage issue, that's a quant issue. Are you using qwen_3b or some abliterated version? Sage is just the catalyst. Also, put sage right up against your sampler
I am having the same issue, I am using forge neo. If I try to use the listed text encoder I get some kind of error about can't multiply mat1 and mat2 or something, so I used the qwen3b and only get black images.
@makoshark1975 it's because of the text encoder; I'm using Qwen3 4B without any problems on Neo
@makoshark1975 are you using a conditioning zero out node or an empty clip text encoder?
Mat1 and mat2 errors are weight or model mismatches. Make sure that you have the dtype weight set to default in your diffusion model loader. Also, make sure you are using Lumina2 as the clip base. Also, make sure CFG is at 4.0 and you are doing at least 25 steps (not the issue or it'd be white, but check anyway)
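For anyone curious what's behind those errors, here is a tiny reproduction. Shape mismatches and dtype mismatches raise two different PyTorch errors (the tensor sizes here are made up for illustration):

```python
import torch

# Shape mismatch -> "mat1 and mat2 shapes cannot be multiplied (1x1280 and 160x10)".
# This is what a wrong text encoder / model pairing typically triggers.
try:
    torch.randn(1, 1280) @ torch.randn(160, 10)
except RuntimeError as e:
    print(e)

# Dtype mismatch (e.g. fp16 weights fed fp32 activations) raises a different error.
try:
    torch.randn(1, 4, dtype=torch.float16) @ torch.randn(4, 3, dtype=torch.float32)
except RuntimeError as e:
    print(e)
```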
@DevilLady I am not sure, I select lumina, set the model to the new base model, use the same vae I was using and the Qwen3 4B encoder, cfg at 4.5, and all I get are solid black images
@lonecatone23 I'm not sure what you mean by conditioning zero or anything about the diffusion model loader or Lumina2 as the clip base, I am using forge neo not comfyui
@makoshark1975 ah.
I uninstalled sage and black images went away in forge neo
@lonecatone23 I'm using Qwen_3_4b.safetensors. I usually run my comfy with Sage running globally but I also tested KJ sage patch node while running comfy with pytorch attention and I still get black outputs.
@patrekt when was the last time you updated, or are you on (cringe) desktop?
@lonecatone23 Nah I'm on portable (v0.11.0-1). I updated comfy last night after the release
@patrekt odd. do you have it on auto, or set to a certain attention? I wouldn't be surprised if 2 or 3 are giving it issues (especially 3). Also, a lot of head scratching last night made me realize that you need an actual clip text encoder hooked up to the negative and not a conditioning zero out node.
@lonecatone23 Tried on auto and all the attentions, still getting the same black output. I'm running a 3090 so Sage3 and FP8 variants don't work for me, and no, I'm not using conditioning zero out. I really have no clue why it works for some people and not for others like me.
@patrekt make a simple workflow: just diffusion model, clip, vae, clip text encoder, model flow, latent, ksampler, decoder. Then plug it after the AuraFlow node and before the sampler and see if you still have issues
@lonecatone23 still getting black output. maybe it has something to do with my torch or python version. I'm running Python 3.12.10 - Torch 2.8.0 - Cu128
Run it through Grok or Claude and see what it says. If you had Sage attention working before there is no reason why it shouldn't work now.
Do you have issues with other models with Sage Attention?
@lonecatone23 It works completely fine with other models. I also did a post on reddit and seems like other people are also facing the same issue with sage.
@patrekt Yeah, it's an epidemic. There are a ton of requests over it
Same problem here. Launching Comfy without --fast fixed it for me.
Ostris Toolkit is updated.
We'll need to 're'train our ZIT dataset for this again? :(
@TribalDiffusion The LoRAs are unstable: some work, some don't. Every one trained on the site is crap. Plus, it does not merge with ANYTHING, even when forced.
Welcome Flux 2 Klein all over again
@lonecatone23 Sigh! I had created a ZIT LoRA for faces of people from my native place, which seems to work. I'll test that with this Base model. If it doesn't work, back to ZIT :-/
It's quite slow, and it seems to generate much worse images at 30 steps than ZIT does at 8. As far as diversity of imagery goes, I have yet to find out because I'm still rendering. An M1 mac is not an optimal AI image generator.
Yeah, diversity is way better, less generic people and poses, e.g.:
https://www.reddit.com/r/StableDiffusion/comments/1qozyms/a_quick_test_showing_the_image_variety_of_zimage/
It is more for training than using directly. It isn't supposed to be more realistic than Z-Turbo today. You're going to need to wait for loras and fine tunes.
Has anyone figured out the all-black images problem in forge neo? Turbo works fine; I just switched the model to this and changed the steps and cfg and still only get black images. Is there something I need to update in my forge neo venv or something? I am a linux newb
Try disabling sage attention; when I had this problem I changed to pytorch attention and it worked.
@CRAZYAI4U That is what I did, just uninstalled sage from the venv and it seems to be working now. Crazy
@makoshark1975 which text encoder and vae are you using in neo? Since the base is fp16 and the provided text encoder is fp8, the error message is that they cannot be mixed. I've tried using stuff from the turbo version but the quality is worse than turbo.
IT WORKS WITH NEO BUT IT'S SLOW AND VERY INACCURATE
@Melodic_Possible_582589 I downloaded the VAE and Text encoder and model from https://huggingface.co/Comfy-Org/z_image/tree/main/split_files Seems ok so far
I don't recommend this base model locally, it's very slow, especially if you use a "normal" gpu:
turbo Q6.gguf | 8 steps = 21.23 seconds
base Q6.gguf | 30 steps = 2.76 minutes!
For me there are pros to waiting: actual pose/looks variety.
https://www.reddit.com/r/StableDiffusion/comments/1qozyms/a_quick_test_showing_the_image_variety_of_zimage/
Nope, the variety and creativity of ZIB outweighs the speed of ZIT. I'm using a 3070ti and can manage 1 image every 2 mins.
Quick test showing how the base, while slower, has much better seed variety: women actually looking different in different poses, while turbo tends to stick to the same pose and look. It is also much better at following prompts; check out the image in the comments showing this.
https://www.reddit.com/r/StableDiffusion/comments/1qozyms/a_quick_test_showing_the_image_variety_of_zimage/
I’ve already tried making a LoRA for this model (check the post on my profile). The results were expected for a 6B model and weren't particularly impressive. I think for complex concepts, it's better to use fine-tuning and larger datasets than what I used for this LoRA. That said, the model is decent for simpler concepts. Qwen Image 2512 is still the best base for training solid LoRAs. Regarding performance on my 5090: 40 steps take 38 seconds in fp16, so it’s probably better to wait for fp8 or use a turbo version.
Works in SD.Next (on the dev branch)
Low VRAM workflows for base up and running. Still slooow 🐌🐢
https://civitai.com/models/2184844
Base really isn't that bad in terms of quality, you just have to spend some time tweaking the settings. I really can't tell the difference between Base's quality and Turbo's quality.
There are only two major differences: speed of generation (turbo being faster) and prompt coherence and diversity (Base being better).
So if you use base over turbo, you get better prompt adherence, better anatomy, and more diversity/creativity.
If you use turbo over base, you get slightly better image quality, and speed.
I find that if I use any number of steps below 40, I get these weird blotches or grid patterns in my images, as if the image did not cook long enough to converge. 40 steps seems to be the right spot for me, as those problems went away at anything beyond 40 steps.
I'm using full models, not even fp8 or ggufs, and getting 120 second generation times on my 4090 with a basic ZIB workflow. It's not bad for speed, not like QWEN bad. QWEN image absolutely melts my PC and makes it scream for mommy. My ZIT workflow gets around 80 second generation times, because it's a complicated workflow. So the speed isn't so painful that I'd abandon it like QWEN. I think ZIB will be my new work horse as I plan on dumping ZIT; hopefully we get some sweet fine tunes in the coming months, can't wait. Why am I dumping ZIT? Two problems: extreme lack of diversity/creativity and frequent body horrors, aka 6 toes, 6 fingers, mutated limbs. From my experiments so far, ZIB has far less body horror, you just need to use the proper negatives.
BTW, I highly recommend you pair your ZIB workflow with SeedVR. I know I trashed it earlier with ZIT, but for some reason it seems to work exceptionally well with ZIB. ZIT paired with SeedVR seems to just make the images far too soft, but for some weird reason it doesn't with ZIB. Maybe it was because I was using sage attention; I'm no longer using it, just pytorch attention. I heard Sage causes images to shift in quality and style.
One big problem I have noticed with base though is that it takes prompts far too literally. If you are lazy and have two prompts inside the positive prompt box, ZIT will take them both and combine them into one cohesive prompt, while ZIB will literally try to run both prompts and smash two different images into a single image. I think it's because ZIT is only 8 steps so it doesn't have enough time to overthink a prompt, while ZIB is 40 steps and it just overthinks the prompt like crazy.
Same blotches/compression artifacts here. It seems absolutely random. The safest way I've avoided them so far is sticking to 1024x1024. I went to turbo's 1088x1920 and it just comes up every time; even at 40~70 steps it'll do it and start to mutate.
Yeah I have similar thoughts, still gonna keep turbo for some random quick fun, and then use base when I want to have more specific images that listen to my specific prompts, with more creative images that have variety. Might start with low res to get ideas and then go full res when I'm happy.
Has anyone been getting outputs with what looks like jpeg compression artifacts in random places? I can't explain it any other way. I've got everything loaded correctly but it's still happening
Read my comment next to yours; it's the steps, I think.
Do you use sageattention? Try it without it.
I saw a comparison with sage attention doing that, and the one with it removed looked fine.
@HugMeIntoFace No sage attention for me here. such a mystery
@jj43797771 are you using 25-50 steps? and a CFG around 4?
@J1B Yep, using all the suggested settings. Oddly, lowering the CFG brings down the occurrence of it happening.
Currently using Forge Neo and getting the error "The size of tensor a (1280) must match the size of tensor b (160) at non-singleton dimension 1"
Using the provided VAE and text encoders (tried both fp4 and fp8 separately), and no loras being used
you need this TE, not the fp8 one: https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/text_encoders/qwen_3_4b.safetensors
I remember having a similar error while training a lora. It was due to incompatible training settings, I think. This should have nothing to do with numeric precision (fp4, fp8 or fp16). If you are using a recent version of forge, it is probably because of some wrong settings or wrong dependencies.
Is this the original basic version? The one that came out recently?
Getting this - Exception Message: Cannot execute because a node is missing the class_type property.: Node ID '#271'
Do turbo loras work with Base?
The more important question is: do z-image base loras work on ZIT, and can you mix them successfully... I am just testing that now.
But yes, ZIT loras do kind of work on base, though you have to set them to strength 3.0+
Guess it depends on what you mean by working. You can use them with base, but they may not work as intended. I used one I trained with ZIT here, but at strength 0.1 it doesn't do much. Near full weight it looks pretty bad.
@RisingV I obviously mean
1) work as intended
2) look good
@Ses_AI ok, then no, at least for me
may be different for loras trained on realistic style images though
@theally when are you going to add a "Z-Image Base" lora category so we can upload loras in the right place?
I unironically keep reloading the models page for this lmao!
Will hopefully go in today!
Any info on how to finetune it?
bear with me, but can anyone do me this favor?
can someone test to see if z-image base respects the tag/prompt "flat chest"?
z-image refused to generate a woman with a flat chest or small breasts.
Yo, I tried it and the tag "flat chest" does not seem to work; "small breasts" seems to somewhat work, but the best you can do is use "large breasts" etc. in the negative prompt.
I scheduled an image post with different prompts for this (PG and PG-13 images only):
https://civitai.com/posts/26216103
@RisingV thank you so much!
dammit, this one thing prevents me from training my art using z-image.
it just won't adhere to or recognize flat chests and different chest sizes, regardless of the tags or captions. 😒
people laugh at me when I ask this, but I have flat-chested grown-up women characters who z-image INSISTS on giving large breasts to, regardless of how I set up the training.
No problem. I guess you have to wait for a finetuned checkpoint then. Or you can maybe try text encoder training? Don't know if it will work with the civitai trainer though.
Also, training with danbooru-style captioning seems to work with z-image, but maybe it is better to use natural language? Have not tried that yet.
Well, no love for flatties out there ;)
Something like this will almost certainly be solved by finetunes, but those will take some weeks/months to shake out.
Hang on for a little while longer. Now that Base is out, Z-Image stuff is definitely going to kick into higher gear.
sometimes a nipple-based LoRA can fix this automatically since it's giving more attention to a specific detail than making sure there is always big booba
super random but has someone tried to do embeddings for this yet? does it even allow embeddings? 🤔
I've tried it over and over, but the results are pretty bad. The fingers and feet aren't rendering well at all. Honestly, it’s worse than Turbo and way behind Qwen Image.
Are you using ComfyUI with Sage Attention by any chance? If you are, don't use Sage Attention. It's a well-known problem that Z-Image base introduces image artifacts with Sage Attention active, and sometimes it doesn't even work at all (gives a black image).
@mmdd2543 Thank you for telling me that. I'm heading off to try again!
D@ng, Z-Image base is good! Much better than Z-Image Turbo in my tests (apart from being slower ofc). BTW, try out the following sampler/scheduler combo: sa_solver_pece/beta. You might be pleasantly surprised. I really like the image quality it provides. Very crisp detail and textures.
I'm just putting this solution here. Got weird image artifacts? Turn off Sage Attention in your launch parameters; that completely removes them.
Bad fix, and it shouldn't be a problem with the model, but here we are
I'm having this problem even though I never installed Sage Attention. I even reinstalled the latest ComfyUI portable and I still have this problem
Yep. Took me forever to figure it out, almost gave up. Felt like a dumbass when I realized how simple the solution was.
Can we expect a "DMD" style version of this for lower steps?
you mean zimage turbo?
Ostris is working on it
If you want a distilled low-step model, that’s literally what Z-Image Turbo is.
This release is specifically for everything that Turbo cannot do — LoRA, diversity, stylistic control.
Asking for a DMD version of the base model is like asking the devs to remove 80% of its capabilities again 😄.
The turbo version is based on DMD. So, hopefully a DMD LoRA, which can be stacked on any finetuned non-distilled model.
Turbo is an 8 step model. It's going to result in a more realistic and higher quality image that is far less malleable.
@aising23 The student model parameters from DMD training catastrophically forget the teacher model parameters. So unless you do brand-new CFG & step distillation training, or use a full-rank LoRA to refine Z-Image (non-turbo) back toward ZiT... ;(
@whateverr Yes, a full-rank LoRA can change Z-Image (non-turbo) back to ZiT, but...
It is not possible to make this LoRA part effective by just modifying the weights. Currently, the only acceleration option is Z-Image-Turbo-DistillPatch, but the acceleration effect is limited due to the inability to fully align with the ZI "Base". We are trying to communicate with the Lightx2v team and the HyperSD team, and perhaps we can try training a distillation adapter?
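The "distillation patch" idea above is essentially weight-difference extraction: take Turbo-minus-Base and compress the delta to low rank with an SVD. A toy sketch of that approach with stand-in tensors (not real checkpoint weights):

```python
import torch

# Stand-ins for one matching weight matrix from Base and Turbo.
w_base = torch.randn(1024, 1024)
w_turbo = w_base + 0.01 * torch.randn(1024, 1024)

# Low-rank approximation of the delta = one "turbo patch" LoRA pair.
delta = w_turbo - w_base
U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
rank = 64
lora_up = U[:, :rank] * S[:rank]   # (1024, rank)
lora_down = Vh[:rank]              # (rank, 1024)

# The pair reproduces the top-`rank` components of the delta. Random noise like this
# toy delta is not low-rank; real distillation deltas are far more compressible.
err = (w_base + lora_up @ lora_down - w_turbo).norm() / w_turbo.norm()
print(f"relative error at rank {rank}: {err:.4f}")
```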
Training loras with the base model and then using them with the turbo model works great. It enhances the turbo model’s quality, eliminating those weird artifacts when a lora is applied.
Here's an example:
Both trained using the same data, same training config.
Trained with the distilled turbo model:
https://civitai.com/images/117782541
Trained with the base model:
https://civitai.com/images/119239695
The second image looks almost exactly like my training data—the color and angle make it seem more like a movie scene than a photo.
Did you use the same lora weight for generation?
I've observed that I have to double the weight of LoRAs trained on Base when using them with ZIT, compared to a LoRA trained on ZIT, to get the same impact.
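If you want to test that scaling in code rather than a UI, here is a sketch using diffusers' PEFT-backed LoRA loading. This assumes your diffusers build supports Z-Image LoRAs; the file name is hypothetical:

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo", torch_dtype=torch.bfloat16  # repo name assumed
).to("cuda")

# A LoRA trained on Base, applied to Turbo at roughly double the usual weight.
pipe.load_lora_weights("my_zib_lora.safetensors", adapter_name="zib_lora")  # hypothetical file
pipe.set_adapters(["zib_lora"], adapter_weights=[2.0])  # ~2x vs. a ZIT-trained LoRA

image = pipe("portrait in my trained style", num_inference_steps=8, guidance_scale=1.0).images[0]
image.save("lora_test.png")
```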
I tried training with this but got an error
Why not just use the LoRA you trained on Base with Base itself?
@RisingV No, a LoRA trained with the base model doesn’t need any weight adjustments when using ZIT. I discovered that training beyond 2,399 steps doesn’t require any weight change when using the turbo model. Fewer than 2000 steps will require weight change.
@stewi0001 What app are you using? I use onetrainer.
@Wingo It's slower, and I don't get good results from the base model. It's just a good model to train loras on.
@Mayer2003 how do you get good results from a Base Z-Image LoRA on Z-Turbo?
I trained on AI-Toolkit with 3k steps, linear LR 0.0002 (Z-Base).
When I try it on Z-Turbo it looks like my LoRA isn't working; the face doesn't look like the dataset.
@Mayer2003 I was trying a workflow I have in ComfyUI. It was saying that the model was missing parts needed for training. I might have been using the wrong node. I will give onetrainer a try. Thank you
@pathAi Try LR 0.0003 or 0.0005. Also try a higher lora rank like 64/64, 64/128, 128/128.
@Mayer2003 tried onetrainer and still getting errors. Any chance you can share your preset?
@stewi0001
{ "__version": 10, "training_method": "LORA", "model_type": "Z_IMAGE", "debug_mode": false, "debug_dir": "debug", "workspace_dir": "workspace/run", "cache_dir": "workspace-cache/run", "tensorboard": false, "tensorboard_expose": false, "tensorboard_always_on": false, "tensorboard_port": 6006, "validation": false, "validate_after": 1, "validate_after_unit": "EPOCH", "continue_last_backup": false, "include_train_config": "NONE", "multi_gpu": false, "device_indexes": "", "gradient_reduce_precision": "FLOAT_32_STOCHASTIC", "fused_gradient_reduce": true, "async_gradient_reduce": true, "async_gradient_reduce_buffer": 100, "base_model_name": "Tongyi-MAI/Z-Image", "output_dtype": "BFLOAT_16", "output_model_format": "SAFETENSORS", "output_model_destination": "models/base.safetensors", "gradient_checkpointing": "ON", "enable_async_offloading": true, "enable_activation_offloading": true, "layer_offload_fraction": 0.0, "force_circular_padding": false, "compile": false, "concept_file_name": "training_concepts/linux.json", "concepts": null, "aspect_ratio_bucketing": true, "latent_caching": true, "clear_cache_before_training": true, "learning_rate_scheduler": "CONSTANT", "custom_learning_rate_scheduler": null, "scheduler_params": [], "learning_rate": 0.0003, "learning_rate_warmup_steps": 200.0, "learning_rate_cycles": 1.0, "learning_rate_min_factor": 0.0, "epochs": 100, "batch_size": 2, "gradient_accumulation_steps": 1, "ema": "OFF", "ema_decay": 0.999, "ema_update_step_interval": 5, "dataloader_threads": 1, "train_device": "cuda", "temp_device": "cpu", "train_dtype": "BFLOAT_16", "fallback_train_dtype": "BFLOAT_16", "enable_autocast_cache": true, "only_cache": false, "resolution": "512", "frames": "25", "mse_strength": 1.0, "mae_strength": 0.0, "log_cosh_strength": 0.0, "huber_strength": 0.0, "huber_delta": 1.0, "vb_loss_strength": 1.0, "loss_weight_fn": "CONSTANT", "loss_weight_strength": 5.0, "dropout_probability": 0.0, "loss_scaler": "NONE", "learning_rate_scaler": "NONE", "clip_grad_norm": 1.0, "offset_noise_weight": 0.0, "generalized_offset_noise": false, "perturbation_noise_weight": 0.0, "rescale_noise_scheduler_to_zero_terminal_snr": false, "force_v_prediction": false, "force_epsilon_prediction": false, "min_noising_strength": 0.0, "max_noising_strength": 1.0, "timestep_distribution": "LOGIT_NORMAL", "noising_weight": 0.0, "noising_bias": 0.0, "timestep_shift": 1.0, "dynamic_timestep_shifting": false, "unet": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "prior": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "transformer": { "__version": 0, "model_name": "https://huggingface.co/Comfy-Org/z_image/blob/main/split_files/diffusion_models/z_image_bf16.safetensors", "include": true, "train": true, "stop_training_after": 0, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "BFLOAT_16", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "quantization": { "__version": 0, "layer_filter": "layers", "layer_filter_preset": "blocks", "layer_filter_regex": false, 
"svd_dtype": "NONE", "svd_rank": 16, "cache_dir": "workspace-cache/run/quantization" }, "text_encoder": { "__version": 0, "model_name": "", "include": true, "train": false, "stop_training_after": 30, "stop_training_after_unit": "EPOCH", "learning_rate": null, "weight_dtype": "BFLOAT_16", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "text_encoder_layer_skip": 0, "text_encoder_2": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": 30, "stop_training_after_unit": "EPOCH", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "text_encoder_2_layer_skip": 0, "text_encoder_2_sequence_length": 77, "text_encoder_3": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": 30, "stop_training_after_unit": "EPOCH", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "text_encoder_3_layer_skip": 0, "text_encoder_4": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": 30, "stop_training_after_unit": "EPOCH", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "text_encoder_4_layer_skip": 0, "vae": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "effnet_encoder": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "decoder": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "decoder_text_encoder": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "decoder_vqgan": { "__version": 0, "model_name": "", "include": true, "train": true, "stop_training_after": null, "stop_training_after_unit": "NEVER", "learning_rate": null, "weight_dtype": "FLOAT_32", "dropout_probability": 0.0, "train_embedding": true, "attention_mask": false, "guidance_scale": 1.0 }, "masked_training": false, "unmasked_probability": 0.8, "unmasked_weight": 0.5, "normalize_masked_area_loss": false, "masked_prior_preservation_weight": 0.0, "custom_conditioning_image": false, "layer_filter": "^(?=.*attention)(?!.*refiner).*,^(?=.*feed_forward)(?!.*refiner).*", "layer_filter_preset": "attn-mlp", "layer_filter_regex": true, "embedding_learning_rate": null, "preserve_embedding_norm": false, "embedding": { "__version": 0, "uuid": "1476e0f1-f64f-45e8-8cf5-c030d4734d25", "model_name": "", "placeholder": "<embedding>", "train": true, 
"stop_training_after": null, "stop_training_after_unit": "NEVER", "token_count": 1, "initial_embedding_text": "*", "is_output_embedding": false }, "additional_embeddings": [], "embedding_weight_dtype": "FLOAT_32", "cloud": { "__version": 0, "enabled": false, "type": "RUNPOD", "file_sync": "NATIVE_SCP", "create": true, "name": "OneTrainer", "tensorboard_tunnel": true, "sub_type": "", "gpu_type": "", "volume_size": 100, "min_download": 0, "remote_dir": "/workspace", "huggingface_cache_dir": "/workspace/huggingface_cache", "onetrainer_dir": "/workspace/OneTrainer", "install_cmd": "git clone https://github.com/Nerogar/OneTrainer", "install_onetrainer": true, "update_onetrainer": true, "detach_trainer": false, "run_id": "job1", "download_samples": true, "download_output_model": true, "download_saves": true, "download_backups": false, "download_tensorboard": false, "delete_workspace": false, "on_finish": "NONE", "on_error": "NONE", "on_detached_finish": "NONE", "on_detached_error": "NONE" }, "peft_type": "LORA", "lora_model_name": "", "lora_rank": 128, "lora_alpha": 128.0, "lora_decompose": false, "lora_decompose_norm_epsilon": true, "lora_decompose_output_axis": false, "lora_weight_dtype": "FLOAT_32", "bundle_additional_embeddings": true, "oft_block_size": 32, "oft_coft": false, "coft_eps": 0.0001, "oft_block_share": false, "optimizer": { "__version": 0, "optimizer": "ADAM_8BIT", "adam_w_mode": false, "alpha": null, "amsgrad": false, "beta1": 0.9, "beta2": 0.999, "beta3": null, "bias_correction": false, "block_wise": true, "capturable": false, "centered": false, "clip_threshold": null, "d0": null, "d_coef": null, "dampening": null, "decay_rate": null, "decouple": false, "differentiable": false, "eps": 1e-08, "eps2": null, "foreach": false, "fsdp_in_use": false, "fused": false, "fused_back_pass": false, "growth_rate": null, "initial_accumulator_value": null, "initial_accumulator": null, "is_paged": false, "log_every": null, "lr_decay": null, "max_unorm": null, "maximize": false, "min_8bit_size": 4096, "quant_block_size": null, "momentum": null, "nesterov": false, "no_prox": false, "optim_bits": 32, "percentile_clipping": 100, "r": null, "relative_step": false, "safeguard_warmup": false, "scale_parameter": false, "stochastic_rounding": true, "use_bias_correction": false, "use_triton": false, "warmup_init": false, "weight_decay": 0.0, "weight_lr_power": null, "decoupled_decay": false, "fixed_decay": false, "rectify": false, "degenerated_to_sgd": false, "k": null, "xi": null, "n_sma_threshold": null, "ams_bound": false, "adanorm": false, "adam_debias": false, "slice_p": null, "cautious": false, "weight_decay_by_lr": true, "prodigy_steps": null, "use_speed": false, "split_groups": true, "split_groups_mean": true, "factored": true, "factored_fp32": true, "use_stableadamw": true, "use_cautious": false, "use_grams": false, "use_adopt": false, "d_limiter": true, "use_schedulefree": true, "use_orthograd": false, "nnmf_factor": false, "orthogonal_gradient": false, "use_atan2": false, "use_AdEMAMix": false, "beta3_ema": null, "alpha_grad": null, "beta1_warmup": null, "min_beta1": null, "Simplified_AdEMAMix": false, "cautious_mask": false, "grams_moment": false, "kourkoutas_beta": false, "k_warmup_steps": null, "schedulefree_c": null, "ns_steps": null, "MuonWithAuxAdam": false, "muon_hidden_layers": null, "muon_adam_regex": false, "muon_adam_lr": null, "muon_te1_adam_lr": null, "muon_te2_adam_lr": null, "muon_adam_config": null, "rms_rescaling": true, "normuon_variant": false, "beta2_normuon": null, 
"normuon_eps": null, "low_rank_ortho": false, "ortho_rank": null, "accelerated_ns": false, "cautious_wd": false, "approx_mars": false, "kappa_p": null, "auto_kappa_p": false, "compile": false }, "optimizer_defaults": {}, "sample_definition_file_name": "training_samples/samples.json", "samples": null, "sample_after": 200, "sample_after_unit": "STEP", "sample_skip_first": 0, "sample_image_format": "PNG", "sample_video_format": "MP4", "sample_audio_format": "MP3", "samples_to_tensorboard": true, "non_ema_sampling": true, "backup_after": 200, "backup_after_unit": "STEP", "rolling_backup": false, "rolling_backup_count": 3, "backup_before_save": true, "save_every": 200, "save_every_unit": "STEP", "save_skip_first": 0, "save_filename_prefix": "base_" }@Mayer2003 ok, thanks
Maybe it's because I trained on drawings or used different settings
Did not know that onetrainer supports it. Does onetrainer support low-VRAM settings (like quantization and layer offloading)?
EDIT: seems like it does, from what I can see on the GitHub/DeepWiki page.
It gives much worse results than its turbo version.
That's usually how things go with base models. Wait for finetunes.
If you use a Z-image-turbo LoRA on Z-image-base, the results are bound to be worse! For good results, you need to train a Z-image-base LoRA and use it with Z-image-base.
This model requires expertise. If you're a noob use ZIT.
how is that? xD
@RPxGoon I meant it's not plug and play. You need to understand how to train loras/lokrs. You also need to put effort into prompting to lock it into a specific look. Negative prompting can no longer be ignored like before, etc.
@cobaltpixiv520 What you can't get from Base, you won't get from ZIT either; it will be even worse. By comparison, Turbo's biggest advantages are speed and low resource usage.
@cobaltpixiv520 ,let me describe my own impressions.
ZIT prompting is like FLUX prompting: what I prompt is what I get, with good results. Personally I expected from ZIB the same quality as ZIT but with some fixes, something like a de-Turbo'd ZIT. ZIT is a powerful thing with modest hardware requirements and output quality comparable with SDXL despite all its problems.
ZIB prompting is closer to SDXL's, with higher system requirements due to using an LLM as the text encoder and needing 30-50 steps to get good quality (for vanilla SDXL, 25-30 was often enough), and some things available in ZIT out of the box are hidden in ZIB beneath more complicated prompting. For people with 8 gigs or less it's rather painful.
There shouldn't be a bigger difference between ZIT and ZIB than between Flux Schnell and regular Flux, but there is. Even loras are not completely interchangeable between ZIB and ZIT.
That's the worst argument I've ever seen about a model. What you're saying is actually bad for the model because if you need to be an expert to set up a model and train images or prompts, then this model is just the worst thing ever.
@Fadoo2077 My point is simple. You want good generation? Go with ZIT. You want better generation? Put effort into learning ZIB, and use both.
Ignore the people with skill issues saying this is worse than the release of ZIT... it's crazy good, even in comparison to most ZIT merges I've seen.
It is a lot harder to use than ZIT; you need a good solid negative prompt and more careful prompting. I struggled for days before reaching out for help from the Reddit r/StableDiffusion community. Some good tips here: https://www.reddit.com/r/StableDiffusion/comments/1qr60ja/how_are_people_getting_good_photorealism_out_of/
Agreed. Don't be lazy with the prompting and your results will improve. Lora use tho.. eeeh. Stick with prompting if you can.
@TheP3NGU1N Maybe the best negative prompt for image quality I have tested so far is:
"mutated, mutation, deformed, elongated, low quality, malformed, alien, patch, dwarf, midget, patch, logo, print, stretched, skewed, painting, illustration, drawing, cartoon, anime, 2d, 3d, video game, deviantart, fanart"
I have also tested a longer one, but I am not sure if it is better
"mutated, mutation, deformed, elongated, low quality, malformed, alien, patch, dwarf, midget, patch, logo, print, stretched, skewed, painting, illustration, drawing, cartoon, anime, 2d, 3d, video game, deviantart, fanart,noisy, blurry, soft, deformed, ugly, drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly, bokeh, Deviantart, jpeg , worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art, watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name, blur, blurry, grainy, morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, 3D ,3D Game, 3D Game Scene, 3D Character, bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities, bokeh Deviantart, bokeh, Deviantart, jpeg , worst quality, low quality, normal quality, lowres, low details, oversaturated, undersaturated, overexposed, underexposed, grayscale, bw, bad photo, bad photography, bad art, watermark, signature, text font, username, error, logo, words, letters, digits, autograph, trademark, name, blur, blurry, grainy, morbid, ugly, asymmetrical, mutated malformed, mutilated, poorly lit, bad shadow, draft, cropped, out of frame, cut off, censored, jpeg artifacts, out of focus, glitch, duplicate, airbrushed, cartoon, anime, semi-realistic, cgi, render, blender, digital art, manga, 3D ,3D Game, 3D Game Scene, 3D Character, bad hands, bad anatomy, bad body, bad face, bad teeth, bad arms, bad legs, deformities, bokeh , Deviantart"
These things take so long to test in Z-image!
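Side note: that longer negative repeats a lot of tags ("patch", "bokeh", "Deviantart", "blurry"...), which only wastes token budget. A quick order-preserving dedupe pass helps when assembling these lists:

```python
def dedupe_tags(prompt: str) -> str:
    """Remove duplicate comma-separated tags, keeping first-occurrence order."""
    seen = set()
    out = []
    for tag in (t.strip() for t in prompt.split(",")):
        key = tag.lower()
        if tag and key not in seen:
            seen.add(key)
            out.append(tag)
    return ", ".join(out)

long_negative = "mutated, deformed, patch, logo, print, patch, bokeh, Deviantart, bokeh"
print(dedupe_tags(long_negative))
# mutated, deformed, patch, logo, print, bokeh, Deviantart
```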
The standard negative, provided by Tongyi, is
"泛黄,发绿,模糊,低分辨率,低质量图像,扭曲的肢体,诡异的外观,丑陋,AI感,噪点,网格感,JPEG压缩条纹,异常的肢体,水印,乱码,意义不明的字符"
You can translate it to english, but it's worse compared to the chinese results.
Tongyi's defaults are euler + simple + 30 steps, whereas comfy is res_multistep.
Did people struggle? I tried running prompts that ZIT couldn't quite get right, and Base was nailing them and looked better. Maybe users prefer the more efficient, snappier tag format.
@TheP3NGU1N What's the issue with Loras? Does Z-Image just not inherently support it like SD/Lumina?
The Z-Image architecture requires natural language, not tag-based prompting (1-3 words, then a comma, and repeat, the way prompting works for danbooru-based data).
There are extensions that can do this for you
https://github.com/KohakuBlueleaf/z-tipo-extension
Will take you a while to get used to, but then it's time to jam.
@Fatbuns Ah thanks, that explains why my images were coming out with a yellow tint before, as the first part of that default negative translates as "Yellowed/discoloured, greenish tint,"
now it seems mostly fixed, great find!
@J1B If you're on Forge, try reducing the "Shift" parameter (a tip from somewhere on the Forge-Neo GitHub). Forge's default is 6; I reduced it to 4 and it started working.
Definitely work: euler, euler_a, DPM2, res_multistep; scheduler: simple.
Here is my post on FP8 with shift=4.
My settings are Res_multistep + beta, shift 4, 30 steps, cfg 4
@mphobbit I use ComfyUI, but perhaps the ModelSamplingAuraFlow node in ComfyUI is the equivalent shift value, I will have to test that out and see if it is compatible with Z-Image. It never really changed much in Qwen-Image models for me, so I stopped using it.
Perhaps the ModelSamplingAuraFlow node in ComfyUI is the equivalent shift value
Just checked Comfy. Yes, you're right, that's the shift value. It's 3.0 by default, so it should work.
> It never really changed much in Qwen-Image models for me, so I stopped using it.
Qwen uses its own architecture; Z-Image is based on Lumina according to the model card.
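For anyone wondering what that shift value actually does: in Lumina/AuraFlow-style flow models it rescales the sampling timesteps toward the high-noise end. A sketch of the usual SD3-style mapping, assuming Z-Image uses the same formula given its Lumina lineage:

```python
def shift_timestep(t: float, shift: float) -> float:
    """SD3/AuraFlow-style timestep shift: t=1 is pure noise, t=0 is the image."""
    return shift * t / (1 + (shift - 1) * t)

for shift in (1.0, 3.0, 4.0, 6.0):
    # Where the midpoint of the schedule lands for each shift value.
    print(f"shift={shift}: t=0.5 -> {shift_timestep(0.5, shift):.3f}")
# Higher shift spends more of the schedule on high-noise (composition) steps.
```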
After using it for several days, I can certainly say that the model beats even Chroma when it comes to stylistic variety. I'd say it's overkill if you purely want realism, turbo is more than enough for that, but if you like experimenting with art styles, it's a godsend.
It works great as a base for a 2nd ZIT refiner pass, as it gives so much more varied and interesting faces and poses.
I have a ComfyUI workflow posted here: https://civitai.com/models/2231351?modelVersionId=2644538
@J1B +1👍
The fp32 version leaked and is on huggingface. Everyone should train on it instead, especially if finetuning. Any finetune runs going on bf16 should be stopped and switched to it with full precision on, if you can. Use the --fp32-unet launch arg to access the full precision when using it. Extremely slow, but the outputs are night and day against bf16. They should've released it if they wanted to claim tunability; the bf16 version clearly has issues with that.
I also recommend using it in combination with the UltraFlux VAE.
Seems like slightly worse image quality in my testing: https://civitai.com/posts/26326998
About the same quality as the fp8 version, though that is just one image, and it took over 20 mins to make on my 3090.
@J1B If you use your mega negative, try something shorter. In my tests, the result is an order of magnitude better than the BF16 version. Also, the pure model gives better results than using the LoRAs that are currently available. I also use a standard loader. In terms of speed, it doesn't take much longer to generate than BF16.
@lizirgin2012227 idk about the speed. It depends on your overhead. I was doing 1280x1600, and with the bigger model my overhead shrank way down, and those can take 2-4 minutes per image lol. In terms of quality though it's insane. All those weird hands, weird stuff in the background, random small errors that would happen, all go away with the full fp32 model.
@tenstrip I don't know; I have been using that fp32 version most of the day and am still seeing lots of broken hands. The only thing that fixes them is a 2nd pass with a ZIT model, it is just so much better at that.
Very powerful and versatile model that is a huge breakthrough in image generation.
Small tips:
1. When in doubt, write longer.
2. If it's still bad, check if your prompt is self-contradictory and adjust.
3. Finally, if there is something you really don't like, put it in the negative prompt.
Any recommended workflow, or does the turbo one work? I can't seem to drag images into my workflow to copy what others have used. It was a nice functionality.
Z-Image Base: "Teacher" (FP32) vs. Official (BF16). Why you should keep the leaked file.
There is a lot of confusion right now regarding the massive FP32 version of Z-Image Base floating around versus the official BF16 release.
I have been testing both versions extensively on an RTX 5090, pushing them to their limits without VRAM constraints. The difference is real, and it’s not just about file size. Here is the technical breakdown of why the "Teacher" version is superior and why you might want to keep it.
1. The "Teacher" vs. "Student" Difference
The official release is a Distilled (Student) model. The leaked FP32 version is the Teacher model.
The Teacher (FP32): This is the raw "source" brain. It contains the full probability distribution and the original, uncompressed weights.
The Student (Official): This version was taught to mimic the Teacher but optimized for speed and size. In this process, the model learns to "average out" results to play it safe.
The Reality: The Student version effectively underwent a "lobotomy." It mimics the style but loses the subtle nuances, micro-textures, and the "creative noise" that makes generation look organic.
2. The "Accumulated Error" Problem
Generative AI relies on 30–50 steps of complex math.
In BF16 (Official): Tiny rounding errors happen at every single step. By step 50, these errors accumulate, smoothing out high-frequency details like skin pores, asphalt texture, or fabric weave.
In FP32 (Teacher): The math remains precise until the very end. The result is a "denser," sharper image with better texture fidelity. (See the toy precision demo after this post.)
3. Censorship & Alignment
This is a big one. The official release likely went through a stronger alignment phase (RLHF) to make it "safe" for the public. The Teacher model represents the weights before this aggressive filtering.
My testing shows: The FP32 version is much more obedient to prompts and significantly less resistant to complex or anatomy-focused concepts (NSFW/Artistic Nudity) compared to the official release.
4. The Holy Grail for Training (LoRA)
If you plan to train LoRAs, the FP32 Teacher model is non-negotiable. Training on the distilled/quantized version is like making a photocopy of a photocopy. You introduce artifacts into your training data. For the best quality LoRAs, you must train against the FP32 Base.
🏁 Summary: Which one should you use?
❌ Stick to the Official (BF16) / GGUF if:
You have <24GB VRAM.
You prioritize generation speed over pixel-perfect fidelity.
You are a casual user just prompting for fun.
✅ Switch to the Teacher (FP32) if:
You have an RTX 3090 / 4090 / 5090 (24GB+).
You are training LoRAs (Critical!).
You want the raw, unfiltered output with maximum texture detail.
You hate the "plastic/smooth" look of distilled models.
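On point 2 above: whatever you make of the leak claims, the rounding-accumulation effect itself is easy to demonstrate in isolation. A toy sketch in PyTorch (this shows dtype rounding behavior, not that FP32 inference looks better):

```python
import torch

# Naively accumulate 10,000 small increments in each dtype.
for dtype in (torch.bfloat16, torch.float32):
    acc = torch.tensor(0.0, dtype=dtype)
    step = torch.tensor(0.001, dtype=dtype)
    for _ in range(10_000):
        acc = acc + step
    # fp32 ends near the true value of 10.0; bf16 stalls far short once the
    # increment falls below the accumulator's rounding step (~8-bit mantissa).
    print(dtype, float(acc))
```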
So, I have tested the FP32 next to the BF16.
On my peasant 4070 Ti with 12GB VRAM (2.3GB RAM used, 11.3/12.0GB VRAM used), the speed is exactly the same on the latest Comfy according to the console (2.45s on a new prompt, 2.3s on a reused prompt).
In terms of quality, it may be slightly better on creative content, but not on realistic content.
HOWEVER.
NSFW is still botched. I get drastically more limbs than I used to, with the same prompts.
It also seems to produce outputs closer to the same seeds, like turbo does, which I find suspicious and strange.
I'm not using the CacheDiT extension to speed up my creations.
@Fatbuns if you're getting the same rendering speed, that tells me everything I need to know about your test lol.
And you just wouldn't care to elaborate, nope. Waste of your time I bet.
120% AI generated bs, and from a terrible AI too
Downloaded the leaked fp32 base version, and oh man is it great. Way slower, but the detail and prompt adherence is insane.
Would you care to elaborate with side-by-side seeds using the same settings, and provide the metadata for them?
It is not any slower in my testing, apart from the first load.
@J1B that's because you're not running it in fp32, which defeats the whole purpose of using it. But I still find QWEN to be better and way faster in fp8
@delta45424155 No, I always run it with default weights. It is probably because I have 24GB of VRAM (or 32GB if I'm using it on RunPod), so it doesn't have to offload to RAM.
I never use Qwen-Image in fp8; the lora compatibility is way worse.
@J1B lol you have to use runpod to achieve 32gb VRAM.
@robotrobotbeepboop386 "As of early 2026, approximately 0.37% of Steam users own an NVIDIA GeForce RTX 5090."
@J1B if the fp32 renders at the same speed as whichever one you're testing, it is not using fp32 weights... Even on my new 5090 the fp32 version takes roughly 3x longer vs this model. The only advantage I have found, tho, with the fp32, is the detail you can go into with your prompt and not have the output messed up or fail to follow the prompting.
@delta45424155 I just tested it again, and FP32 is apparently running at the same speed as the FP16 (doesn't really make sense). I'm not sure why; I am not selecting quantisation on the model loader, and it is the FP32 model as it is 22.9GB.
I know how to load models at different precisions, I have published over 50 models.
It could be a bug in ComfyUI or my workflow I guess, but the outputs look different, so I doubt it is just running at fp16: https://civitai.com/images/119697733
Could be a fake or broken model, as it was never officially released, just "leaked". Although I don't really buy that story, something dodgy is going on with the people that have been posting it here from brand-new accounts with zero posts.
@J1B when you launch comfy, use the flag that forces fp32 when testing. But you'll also want to build your text encoder yourself from shards. I do not recommend it unless you have 64 gigs+ of RAM and at least a 5090. You'll notice the results with a heavy prompt, but imo it isn't worth it, especially since loras will be trained and tested on what works for way lesser hardware.
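On "building the text encoder from shards": if the shards are plain safetensors files, merging them into a single file is straightforward. A minimal sketch (the shard filenames here are hypothetical):

```python
import glob
from safetensors.torch import load_file, save_file

tensors = {}
for shard in sorted(glob.glob("model-0000*-of-*.safetensors")):  # hypothetical shard names
    tensors.update(load_file(shard))  # each shard holds a disjoint set of tensors

save_file(tensors, "qwen_3_4b_merged.safetensors")
print(f"merged {len(tensors)} tensors")
```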
Not related to z-image, but something suspicious:
We have three people in this discussion saying that we should download the fp32 version.
They don't provide any evidence. Two of them are random accounts with numbers.
Relax with the conspiracy theories. :) Not everyone treats Civitai as a social network. Some of us just use the tools.
I tested the FP32 version on high-end hardware because I need the best possible source for training, not for 'hype'. If providing a technical breakdown of why 32-bit float retains more detail than 16-bit quantized makes me 'suspicious', then so be it. Use whatever works for your workflow.
@lizirgin2012227 Why not share your testing then? I shared my testing and it looks worse than bf16: https://civitai.com/images/119697733
@J1B To be honest, I’m already quite busy managing my current social channels for my fanbase, so I can barely keep up as it is. If I find the time and motivation, I might make a comparison post later.
I admit I’m a bit surprised by the divide regarding the FP32 version—it’s strange that some see a difference while others don’t. Like everyone else, I’m confused about this leak, but for me personally, the difference is obvious.
"To be honest, I’m already quite busy managing my current social channels for my fanbase, so I can barely keep up as it is. If I find the time and motivation, I might make a comparison post later.
I admit I’m a bit surprised by the divide regarding the FP32 version—it’s strange that some see a difference while others don’t. Like everyone else, I’m confused about this leak, but for me personally, the difference is obvious."
Another em-dash..
@J1B What startup flags did you use, if you're using comfyui?
@delta45424155 As expected, you don't care about elaborating topkek. You are asking for the evidence that you don't provide yourself. Hypocrite.
@delta45424155 I have one .bat file with --use-sage-attention and one without, that's about it. But I have 24GB of VRAM.
Those are just bs, afaict. The so-called fp32, saved by diffusers v0.36, is just an older version of the final released model, which is saved by newer mainline diffusers (not released, so tagged as v0.37.dev). Seems the z-image team decided to postpone the release at the final moment for some fine-tuning.
And if those bs accounts can prove fp32 is drastically different from bf16, don't be shy, share the evidence; it's gonna shake the foundation of the entire AI world.
Those accounts just want people to run an old z-image model in fp32, which requires 24gb vram and is extremely slow, for no reason. Even with AI-generated bs. sus
I fixed the bad artifact issues when using the ComfyUI Desktop version. More details: https://civitai.com/articles/25717/z-image-base-artifacts-fixed-comfyui-desktop-solved-my-random-color-block-issues. Hopefully it can be helpful.
Wait no longer, rejoice! An 8-step Turbo LoRA is released: https://civitai.com/models/2362961?modelVersionId=2657506
If you are using ZIB as a pre-ZIT stage for extra variability, this allows you to run as little as 1-2 steps of ZIB!
👍😁
The main problem with this 8-step lora is that it kills the image variation in Z-image, which was the main reason I have been using it. I am trying to make a ComfyUI workflow that brings it back.
@J1B +1 but good for photorealism.
Why are my images coming out like wet paper??
ComfyUI: 30 steps, Euler - ddim_uniform, cfg 3.5
Edit: My problem was 100% solved by removing the sage-attention launch flag from the cli. A special thanks to @Protagonist_NL
Try Euler - simple.
Are you using a good Negative prompt?
This model needs a negative even more than a positive!
remove the sage-attention launch flag from the cli; also, don't use the sage attention node
Simple/beta/sgm_uniform schedulers typically work.
I'll make the adjustments and try using it again. Thanks to everyone for the tips!
Well, I found the reason so many hate on the fp32 version, but they deleted their responses. They only use sage attention or worse... --fast fp16_accumulation... And then they wonder why the results are worse and claim to have run a valid comparison test. Probably didn't even bother to use a proper heavy prompt to let the fp32 version shine.
Nice story.
First, sage attention + z-image is unusable; nobody is using it.
Second, z-image is a bf16 model. fp16_accumulation has nothing to do with it.
I'm really curious, why did you do this? Misleading people into thinking the devs are hiding a "better version", and that the model requires 24gb vram to run and is extremely slow?
Z-Base is pretty bad IMO, at least when using the on-site generation for me.
I remixed several images which were fantastic with ZIT (also added a negative prompt) and they look awful..
Is this normal? Do I have to change the prompt drastically to get good results with a prompt that worked well for ZIT?
You have to remember that it's a Base Model. Its primary function is for training, fine-tuning, and LoRA development, not inference. If you look back at "Base Models from History", most of them have been pretty rocky at launch, and only through community effort have they become fantastic derivative resources.
@theally I hope this is true.
I only wonder how good ZIT is compared to its base. Okay, we don't know what the makers actually used to make ZIT from, but I wonder if it really was this base model or something else.
Some of the quality loss comes down to image resolution as well. With the on-site generator, Civitai are pretty mean with their max resolution choices (apart from PonyV7, which gets special treatment).
Here is an image made locally at 1280x1568: https://civitai.com/images/120083106
And the same prompt on the generator at 832x1216: https://civitai.com/images/120070564
It is not a huge difference in this example, but the image does feel less pixelated and more finely detailed at the higher resolution.
"Proper" prompting does help. With that said, you will need the help of an LLM for that.
For instance, you can try and throw a random image into one and have it describe the image in a detailed natural language format. Depending on its answer, you will get 6 to 16 lines. You can start out with the full prompt, then shrink it down depending on what you want out of it - if it's a good prompt :P
@Fatbuns I will try that, thanks
Check your settings on your local machine.
If you're a Forge user, first reduce the Lumina shift parameter. I got into trouble when I kept the default value of 6; I reduced it to 4 and ZIB started working. In Comfy, the ModelSamplingAuraFlow node is responsible for shift. You have to unlock it if you use the default ZIT workflow, and do nothing else (it's already set to a value of 3).
Try:
Sampler: Euler/Euler_a/DPM2/res_multistep
Scheduler: beta/simple/sgm_uniform
Cfg: 3.5-4.5
Negatives: very long list, use your typical load from SDXL.
Here are some of my outputs with FP8 to give you ideas.
Online ZIB is rather poor (I tried it on TA)
On Civitai I find a cfg of 1 is pretty much mandatory, if that helps.
Is there any official tutorial on how to do a full finetune of ZIB, not just LoRA? Also if I do a full finetune, can I still use the 8 step turbo LoRA?
This is probably super obvious, but I just discovered I've been using the (full?) text encoder from Z-Image Turbo, the "qwen_3_4b.safetensors" 8 GB file from https://civitai.com/models/2168935?modelVersionId=2442540
I'm on a 16 GB GPU, works great, but I'm going to try some of these smaller ones too now.
Any updates on Z-Image-Edit?
Anything? even just a little?
(scratches compulsively)
Amazing
I'm using an RTX 2080 Ti 22GB (modified card); why is my VRAM usage above 22GB? Is it because the RTX 20 series doesn't support BF16?
Forge Neo encoder settings? I stopped generating for a long time and am catching up on the new models and encoders
nice