CivArchive
    Z-Image Spice (Experimental) - v1.0
    NSFW

    Update 2/11/2026

    V1 is currently in training on a significantly improved dataset: hundreds of thousands of realistic, anime, and illustration images with both NSFW and SFW samples. It will substantially surpass the quality of this experimental checkpoint, and early results are showing success. Please do not base any development or training on this model, as it is quickly becoming obsolete, with the new release hopefully coming within the next two weeks. Yes, rather than a vague "at some point in the future", I am putting myself on the line to give you a commitment 😄

    The purpose of this finetune is to provide realistic NSFW results that are completely uncensored, which is difficult to achieve in Z-Image-Base. On its own, it is not 100% there yet on genital anatomy due to the small dataset and limited training time, but in my experience, combining it with existing community LoRAs for Z-Image-Base produces excellent results, better than either alone.

    This is very much a proof-of-concept finetune and requires significant enhancements and cleanup of the dataset. A future goal is to make an NSFW model that stands on its own feet, but I expect it would need at least a 50,000-image dataset to avoid over-fitting while training deeply, and for longer, on garbled concepts such as genitalia.

    For those who are interested, below are all the relevant statistics and configuration for this training:

    • Number of images: 7458

    • Number of steps: 46000

    • GPU: 2x B200

    • Max VRAM Usage during Training: 85.2GB

    • Iteration speed: between 1.00 s/it and 1.10 s/it

    • Training Suite: DiffSynth-Studio

    • Total Training Time: 13 hours
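
    As a quick sanity check, the stats above are self-consistent: 46,000 steps at the midpoint of the quoted iteration speed (~1.05 s/it) works out to roughly 13.4 hours, matching the stated total.

```shell
# Sanity-check the training stats above: 46,000 steps at ~1.05 s/it
# (midpoint of the 1.00-1.10 s/it range), converted to hours.
awk 'BEGIN { printf "%.1f hours\n", 46000 * 1.05 / 3600 }'
# prints "13.4 hours"
```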

    My DiffSynth-Studio command for running training (disregard num_epochs; I was just going to shut down after a certain number of hours for budgeting reasons):

    accelerate launch --config_file examples/z_image/model_training/full/accelerate_config.yaml examples/z_image/model_training/train.py \
      --dataset_base_path /workspace/data/flatpack \
      --dataset_metadata_path /workspace/data/flatpack.csv \
      --max_pixels 1638400 \
      --dataset_repeat 50 \
      --save_steps 2000 \
      --model_id_with_origin_paths "Tongyi-MAI/Z-Image:transformer/*.safetensors,Tongyi-MAI/Z-Image-Turbo:text_encoder/*.safetensors,Tongyi-MAI/Z-Image-Turbo:vae/diffusion_pytorch_model.safetensors" \
      --learning_rate 1e-5 \
      --num_epochs 32 \
      --remove_prefix_in_ckpt "pipe.dit." \
      --output_path "./models/train/Z-Image_full" \
      --trainable_models "dit" \
      --use_gradient_checkpointing \
      --weight_decay 0.01 \
      --dataset_num_workers 8
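
    For reference, `--dataset_metadata_path` points DiffSynth-Studio at a CSV mapping each training image to its caption. I believe the expected layout is a simple image/prompt table (the column names below are what I recall from the repo's examples, so verify them against your DiffSynth-Studio version), meaning `flatpack.csv` would look something like this, with illustrative filenames:

```csv
image,prompt
0001.jpg,"a detailed natural-language caption for the first image"
0002.jpg,"a detailed natural-language caption for the second image"
```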

    I have also bundled various helper scripts I made to help with preparing the dataset and importantly, for fixing the model post-training to work within ComfyUI and other inference tools. You can check them out here:

    https://github.com/zetaneko/Z-Image-Training-Handy-Pack


    Comments (20)

    Kierkegaard420 · Jan 30, 2026 · 9 reactions

    A strategy to consider would be using multiple captions per image. The base model was trained on word tags plus short and long natural-language captions. I'm not sure whether the software you use supports it, but OneTrainer does, because that's how I trained my finetune. Convergence happens more quickly, and it helps the model understand concepts more broadly. Their research paper talks about this in detail.

    It would probably be a good idea to use different LLMs for the short and long natural language captions so that the style of the caption is different.

    Awesome job on the finetune!

    xenexia (Author) · Jan 31, 2026

    Thanks for the tips! Yes, I'm planning to go away and really develop a proper dataset for this, so I'll take these suggestions on board. This is a quick-and-dirty test, but I want to get to a stage where this is like Pony V6, but for Z-Image. NovelAI, especially for anime, has been reigning supreme since 4.5, and it's long overdue for some competition. So I'm weighing whether to do an anime or a realistic finetune going forward. Then again, Z-Image feels so much more robust that maybe a single finetune can go in both directions, and then help make realistic versions of anime characters it learnt from the anime dataset, etc.

    Kierkegaard420 · Jan 31, 2026

    @xenexia Yeah, the flexibility with Z-Image is awesome. This model has a lot of potential to replace SDXL. It's really trainable but may need a big dataset to address the NSFW issue that you mentioned.

    xenexia (Author) · Jan 31, 2026 · 2 reactions

    @Kierkegaard420 Yeah, exactly. Right now Z-Image-Base easily gives you body horror; this model reduces that somewhat, and LoRAs on top can make it a lot better, but it doesn't yet compete with uncensored SDXL models like CyberRealistic Pony, etc.

    xenexia (Author) · Feb 11, 2026

    @Kierkegaard420 Update for you: I'm trying this out, and I am now training on a 250,000-image dataset; early checkpoints are showing amazing results, and hopefully I'll release it in a few weeks. I rewrote almost all of DiffSynth-Studio's training framework to support consumer-grade hardware: the system requirements are now the same as what LoRA training currently needs, and training on the single 5070 Ti I own is only 4x slower than on the two B200s I rented at $5/hr, with no quality compromise. Thank Claude for its hard work implementing my radical layer-group-chunking theory and making it work. And don't just take my word for it: I have open-sourced it in my Z-Image Training Handy Pack, linked in the model description. I'm planning to refactor it into a proper suite with more streamlined tools soon, and to support other models with the same technique.

    Unicom · Jan 31, 2026

    How can we make this checkpoint turbo again?

    xenexia (Author) · Jan 31, 2026 · 2 reactions

    I'm not sure at this point. I haven't found any public scripts available for Z-Image distillation. It would be good to have, though, so we could distill a custom turbo checkpoint. We'll have to wait and see what developments come out.

    roberto_baggio · Feb 3, 2026 · 2 reactions

    @xenexia @Unicom
    Hi! I've made these LoRAs as a bit of an experiment to speed up your finetuned checkpoint. It's been performing quite well in my early tests, so give it a try :)
    https://huggingface.co/Vinzou/Z-Image-distilled-Loras-Test

    xenexia (Author) · Feb 3, 2026

    @roberto_baggio Awesome work! I'm keen to check this out. The results look promising, and since the LoRA is based on the diff between the non-distilled and distilled weights, most general finetunes should work well with it. Unless someone did something like a million-step finetune that changed the weights drastically, it should be very portable. Love it!

    roberto_baggio · Feb 3, 2026 · 1 reaction

    @xenexia Thanks :)
    I'm already obsolete 😅; an official version of the distilled LoRA was released today (I haven't tested it yet).
    https://huggingface.co/alibaba-pai/Z-Image-Fun-Lora-Distill

    jayhartford · Feb 6, 2026 · 1 reaction

    Thanks for this. I see so much potential here. Hoping you will continue to push on another realism version!

    xenexia (Author) · Feb 11, 2026 · 2 reactions

    I'm currently training the first full release on a dataset 43 times bigger, supporting realism, anime screencap style, and illustrations. Early results are showing success, with far fewer artifacts and better conformity to human genitalia across all styles. Z-Image has really robust capability to support multi-style checkpoints without losing quality between them.

    Laugur3 · Feb 18, 2026 · 1 reaction

    I will be watching you with great interest. Good luck!

    xenexia (Author) · Feb 18, 2026 · 1 reaction

    Thanks! Copying my response from another thread: the next version is going under a different name, distancing it from Z-Image directly, but I'll link it to this model. Training is taking longer than I thought, but it's well worth it: NSFW imagery is getting crisp in detail, and the model is now learning the finer anatomical details. Around 30% of images still produce some weird body horror like the base model, but they're getting less and less demented, so I'll wait until that drops further and the percentage of correct anatomy rises. It seems like a lot of training, but I'm using a slow learning rate so the model retains all its other capabilities and avoids the forgetting and degradation you see when stacking LoRAs. I'll see where it is in a week's time, but if there is still room for improvement, I'll hold the release and let it cook a little longer.

    I've also abliterated the text encoder, which has improved the output now that the refusal layer is nerfed.

    Laugur3 · Feb 20, 2026

    @xenexia Interesting... my training also got better, but I never touched the text encoder. I'll keep pushing with the stock one and we'll see; it could be interesting to watch these two experiments develop at the same time.

    On the finer details: I just realised full BF16 is not enough for refining, so I will try to push a V2 that finishes training in mixed BF16/FP32 precision.
    I also play a lot with the flow shift to target various phases of image generation during training.
    Keep pushing! With the 4-step LoRA we've got a banger on our hands!

    Still rough but soon a banger

    Laugur3 · Feb 20, 2026

    @xenexia Something to think about... Qwen3 is used here as a text encoder, not as an LLM; abliterated or not, it can't refuse. It doesn't process context or produce text, it just projects tokens into the latent space.

    In a sense, an abliterated model is just a Qwen that has been tweaked to not have refusal reflexes, but that behaviour isn't exercised in our case.

    On a side note, maybe the changed structure of the abliterated model had an impact, but perhaps not in the way we might expect.

    kunde2 · Feb 18, 2026 · 1 reaction

    Amazing work! Works super well already, can't wait for the update!

    xenexia (Author) · Feb 18, 2026

    Thanks! The next version is going under a different name, distancing it from Z-Image directly, but I'll link it to this model. Training is taking longer than I thought, but it's well worth it: NSFW imagery is getting crisp in detail, and the model is now learning the finer anatomical details. Around 30% of images still produce some weird body horror like the base model, but they're getting less and less demented, so I'll wait until that drops further and the percentage of correct anatomy rises. I'll see where it is in a week's time, but if there is still room for improvement, I'll hold the release and let it cook a little longer.

    I've also abliterated the text encoder, which has improved the output now that the refusal layer is nerfed.

    kunde2 · Feb 20, 2026

    @xenexia Definitely take your time, and great news on that text encoder work!

    xenexia (Author) · Feb 27, 2026

    @kunde2 Turns out the text encoder abliteration did not meaningfully change anything; it was a placebo effect. This is actually better, because it means the model stays fully compatible for people to merge and train however they want, and it saves disk space. I completely uncensored it. Releasing the new model in the next 24 hours!

    Checkpoint
    ZImageBase

    Details

    Downloads
    980
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/30/2026
    Updated
    5/1/2026
    Deleted
    -

    Files

    zImageSpice_v10.safetensors

    Mirrors

    CivitAI (1 mirror)