CivArchive
    Z-Image Spice (Experimental) - v1.0
    NSFW

    Update 2/11/2026

    V1 is currently in training on a significantly improved dataset: hundreds of thousands of realistic, anime, and illustration images with both NSFW and SFW samples. It will substantially surpass the quality of this experimental checkpoint, and early results are showing success. Please do not base any development or training on this model, as it is quickly becoming obsolete, with the new release hopefully coming within the next two weeks. Yes, rather than a vague "at some point in the future", I am putting myself on the line to give you a commitment 😄

    The purpose of this finetune is to provide realistic NSFW results that are completely uncensored, which is difficult to achieve in Z-Image-Base. On its own, it is not 100% there yet on genital anatomy due to the small dataset and limited training time, but in my experience, combining it with existing community LoRAs for Z-Image-Base produces excellent results, better than either alone.

    This is very much a proof-of-concept finetune and requires significant enhancements and cleanup of the dataset. A future goal is to make an NSFW model that stands on its own feet, but I expect it would need at least a 50,000-image dataset to avoid over-fitting while training deeply, and for longer, on garbled concepts such as genitalia.

    For those who are interested, below are all the relevant statistics and configuration for this training:

    • Number of images: 7458

    • Number of steps: 46000

    • GPU: 2x B200

    • Max VRAM Usage during Training: 85.2GB

    • Iteration speed: between 1.00 s/it and 1.10 s/it

    • Training Suite: DiffSynth-Studio

    • Total Training Time: 13 hours
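
    As a quick sanity check, the stats above are self-consistent: 46,000 steps at the midpoint of the quoted iteration speed (~1.05 s/it) works out to roughly 13.4 hours, matching the stated total.

```shell
# Sanity-check the training stats above: 46,000 steps at ~1.05 s/it
# (midpoint of the 1.00-1.10 s/it range), converted to hours.
awk 'BEGIN { printf "%.1f hours\n", 46000 * 1.05 / 3600 }'
# prints "13.4 hours"
```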

    My DiffSynth-Studio command for running training (disregard num_epochs; I was just going to shut down after a certain number of hours for budgeting reasons):

    accelerate launch --config_file examples/z_image/model_training/full/accelerate_config.yaml examples/z_image/model_training/train.py \
      --dataset_base_path /workspace/data/flatpack \
      --dataset_metadata_path /workspace/data/flatpack.csv \
      --max_pixels 1638400 \
      --dataset_repeat 50 \
      --save_steps 2000 \
      --model_id_with_origin_paths "Tongyi-MAI/Z-Image:transformer/*.safetensors,Tongyi-MAI/Z-Image-Turbo:text_encoder/*.safetensors,Tongyi-MAI/Z-Image-Turbo:vae/diffusion_pytorch_model.safetensors" \
      --learning_rate 1e-5 \
      --num_epochs 32 \
      --remove_prefix_in_ckpt "pipe.dit." \
      --output_path "./models/train/Z-Image_full" \
      --trainable_models "dit" \
      --use_gradient_checkpointing \
      --weight_decay 0.01 \
      --dataset_num_workers 8
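
    For reference, `--dataset_metadata_path` points DiffSynth-Studio at a CSV mapping each training image to its caption. I believe the expected layout is a simple image/prompt table (the column names below are what I recall from the repo's examples, so verify them against your DiffSynth-Studio version), meaning `flatpack.csv` would look something like this, with illustrative filenames:

```csv
image,prompt
0001.jpg,"a detailed natural-language caption for the first image"
0002.jpg,"a detailed natural-language caption for the second image"
```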

    I have also bundled various helper scripts I made to help with preparing the dataset and importantly, for fixing the model post-training to work within ComfyUI and other inference tools. You can check them out here:

    https://github.com/zetaneko/Z-Image-Training-Handy-Pack


    Comments (20)

    Kierkegaard420 · Jan 30, 2026 · 9 reactions

    A strategy to consider would be using multiple captions per image. The base model was trained on word tags plus short and long natural-language captions. I'm not sure whether the software you use supports it, but OneTrainer does, because that's how I trained my finetune. Convergence happens more quickly, and it helps the model understand concepts more broadly. Their research paper talks about this in detail.

    It would probably be a good idea to use different LLMs for the short and long natural language captions so that the style of the caption is different.

    Awesome job on the finetune!

    xenexia (Author) · Jan 31, 2026

    Thanks for the tips! Yes, I'm planning to go away and really develop a proper dataset for this, so I'll take these suggestions on board. This is a quick-and-dirty test, but I want to get to a stage where this is like Pony V6, but for Z-Image. NovelAI, especially for anime, has been reigning supreme since 4.5, and it's long overdue for some competition. So I'm weighing whether to do an anime or a realistic finetune going forward. Then again, Z-Image feels so much more robust that maybe a single finetune can go in both directions, and then help make realistic versions of anime characters it learnt from the anime dataset, etc.

    Kierkegaard420 · Jan 31, 2026

    @xenexia Yeah, the flexibility with Z-Image is awesome. This model has a lot of potential to replace SDXL. It's really trainable but may need a big dataset to address the NSFW issue that you mentioned.

    xenexia (Author) · Jan 31, 2026 · 2 reactions

    @Kierkegaard420 Yeah, exactly. Right now Z-Image-Base easily gives you body horror; this model reduces that somewhat, and LoRAs on top can make it a lot better, but it doesn't yet compete with uncensored SDXL models like CyberRealistic Pony, etc.

    xenexia (Author) · Feb 11, 2026

    @Kierkegaard420 Update for you: I'm trying this out, and I am now training on a 250,000-image dataset; early checkpoints are showing amazing results, and hopefully I'll release it in a few weeks. I rewrote almost all of DiffSynth-Studio's training framework to support consumer-grade hardware: the system requirements are now the same as what LoRA training currently needs, and training on the single 5070 Ti I own is only 4x slower than on the two B200s I rented at $5/hr, with no quality compromise. Thank Claude for its hard work implementing my radical layer-group-chunking theory and making it work. And don't just take my word for it: I have open-sourced it in my Z-Image Training Handy Pack, linked in the model description. I'm planning to refactor it into a proper suite with more streamlined tools soon, and to support other models with the same technique.

    Unicom · Jan 31, 2026

    How can we make this checkpoint turbo again?

    xenexia (Author) · Jan 31, 2026 · 2 reactions

    I'm not sure at this point. I haven't found any public scripts available for Z-Image distillation. It would be good to have, though, so we could distill a custom turbo checkpoint. We'll have to wait and see what developments come out.

    roberto_baggio · Feb 3, 2026 · 2 reactions

    @xenexia @Unicom
    Hi! I've made these LoRAs as a bit of an experiment to speed up your finetuned checkpoint. It's been performing quite well in my early tests, so give it a try :)
    https://huggingface.co/Vinzou/Z-Image-distilled-Loras-Test

    xenexia (Author) · Feb 3, 2026

    @roberto_baggio Awesome work! I'm keen to check this out. The results look promising, and since the LoRA is based on the diff between the non-distilled and distilled weights, most general finetunes should work well with it. Unless someone did something like a million-step finetune that changed the weights drastically, it should be very portable. Love it!

    roberto_baggio · Feb 3, 2026 · 1 reaction

    @xenexia Thanks :)
    I'm already obsolete 😅; an official version of the distilled LoRA was released today (I haven't tested it yet).
    https://huggingface.co/alibaba-pai/Z-Image-Fun-Lora-Distill

    jayhartford · Feb 6, 2026 · 1 reaction

    Thanks for this. I see so much potential here. Hoping you will continue to push on another realism version!

    xenexia (Author) · Feb 11, 2026 · 2 reactions

    I'm currently training the first full release on a dataset 43 times bigger, supporting realism, anime screencap style, and illustrations. Early results are showing success, with far fewer artifacts and better conformity to human genitalia across all styles. Z-Image has really robust capability to support multi-style checkpoints without losing quality between them.

    Laugur3 · Feb 18, 2026 · 1 reaction

    I will be watching you with great interest. Good luck!

    xenexia (Author) · Feb 18, 2026 · 1 reaction

    Thanks! Copying my response from another thread: the next version is going under a different name, distancing it from Z-Image directly, but I'll link it to this model. Training is taking longer than I thought, but it's well worth it: NSFW imagery is getting crisp in detail, and the model is now learning the finer anatomical details. Around 30% of images still produce some weird body horror like the base model, but they're getting less and less demented, so I'll wait until that drops further and the percentage of correct anatomy rises. It seems like a lot of training, but I'm using a slow learning rate so the model retains all its other capabilities and avoids the forgetting and degradation you see when stacking LoRAs. I'll see where it is in a week's time, but if there is still room for improvement, I'll hold the release and let it cook a little longer.

    I've also abliterated the text encoder, which has improved the output now that the refusal layer is nerfed.

    Laugur3 · Feb 20, 2026

    @xenexia Interesting... my training also got better, but I never touched the text encoder. I'll keep pushing with the stock one and we'll see; it could be interesting to watch these two experiments develop at the same time.

    On the finer details: I just realised full BF16 is not enough for refining, so I will try to push a V2 that finishes training in mixed BF16/FP32 precision.
    I also play a lot with the flow shift to target various phases of image generation during training.
    Keep pushing! With the 4-step LoRA we've got a banger on our hands!

    Still rough but soon a banger

    Laugur3 · Feb 20, 2026

    @xenexia Something to think about... Qwen3 is used here as a text encoder, not as an LLM; abliterated or not, it can't refuse. It doesn't process context or produce text, it just projects tokens into the latent space.

    In a sense, an abliterated model is just a Qwen that has been tweaked to not have refusal reflexes, but that behaviour isn't exercised in our case.

    On a side note, maybe the changed structure of the abliterated model had an impact, but perhaps not in the way we might expect.

    kunde2 · Feb 18, 2026 · 1 reaction

    Amazing work! Works super well already, can't wait for the update!

    xenexia (Author) · Feb 18, 2026

    Thanks! The next version is going under a different name, distancing it from Z-Image directly, but I'll link it to this model. Training is taking longer than I thought, but it's well worth it: NSFW imagery is getting crisp in detail, and the model is now learning the finer anatomical details. Around 30% of images still produce some weird body horror like the base model, but they're getting less and less demented, so I'll wait until that drops further and the percentage of correct anatomy rises. I'll see where it is in a week's time, but if there is still room for improvement, I'll hold the release and let it cook a little longer.

    I've also abliterated the text encoder, which has improved the output now that the refusal layer is nerfed.

    kunde2 · Feb 20, 2026

    @xenexia Definitely take your time, and great news on that text encoder work!

    xenexia (Author) · Feb 27, 2026

    @kunde2 Turns out the text encoder abliteration did not meaningfully change anything; it was a placebo effect. This is actually better, because it means the model stays fully compatible for people to merge and train however they want, and it saves disk space. I completely uncensored it. Releasing the new model in the next 24 hours!

    Checkpoint
    ZImageBase

    Details

    Downloads
    980
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/30/2026
    Updated
    5/1/2026
    Deleted
    -

    Files

    zImageSpice_v10.safetensors

    Mirrors

    CivitAI (1 mirror)