Qwen 3_4_B Trained Text Encoder for Z-Image
FP32
Full Finetune at FP32 (Full Model Finetune - All Parameters & All layers)
FP32 finetune of QWEN3_4b, focused on describing human features in SFW/NSFW captions.
Can be run in FP32 with no time loss on most machines that use CPU offloading.
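As a rough back-of-the-envelope check on why CPU offloading comes up for the FP32 version, the sketch below estimates the raw weight size at each precision. It assumes the "4B" in the model name means about 4 billion parameters (the exact count is not stated here) and it ignores activations and any runtime overhead.

```python
# Rough weight-memory estimate for the two precisions listed on this page.
# Assumption: "4B" is taken literally as 4 billion parameters.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2}
NOMINAL_PARAMS = 4_000_000_000


def weight_size_gib(n_params: int, dtype: str) -> float:
    """Return the size of the raw weights in GiB for the given dtype."""
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    for dtype in ("fp32", "bf16"):
        size = weight_size_gib(NOMINAL_PARAMS, dtype)
        print(f"{dtype}: {size:.1f} GiB")
```

Under that assumption the FP32 weights take twice the space of the BF16 weights, which is why the FP32 file is the one more likely to depend on offloading.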
BF16
Full Finetune at BF16 (20 Layers)
Long text descriptions, 500-1000 tokens in length, focusing on describing human features.
For use with Z-Image or Z-Image Turbo
Comparison Images showing QWEN base VS Human Corpus HERE
Comments (7)
Did you train this with Z-Image? As part of the diffusion pipeline? Or did you train this separately as a standalone Qwen LM?
It was trained standalone, as an LLM.
I really like what you did here.
There's (almost) no need to use the seed variance node anymore, as it really likes to improve the character composition.
Nudity is also great here; this one has a better comprehension of physical differences like volume or bust size.
Of course, sexual intercourse doesn't work here; Z-Image is not for that. If you wanna try that, stick with the original encoder.
Thank you for sharing. ♥♥♥♥
Thank you
Very good on human body details and textures.
But everyone is Asian now; this side effect is too strong, sadly.
It's still useful for inpainting, though.
👍
Write a more intelligent and longer prompt then! Explain the intent, scene, compulsory image framing, mood, and techniques; include reference character profiles; explain what to focus on and why, etc. Try not to micromanage the model with detail. You can also precede it with a "system prompt" addressing the text encoder LLM and shaping its attitudes and biases.
I would also suggest trying different models, as the level of image cognition varies significantly between them.
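The prompt structure suggested above could be assembled roughly like this. It is only a sketch: `build_prompt`, the section labels, and the example text are all hypothetical, and whether labelled sections or a leading "system prompt" actually help with this text encoder is an assumption to test, not something confirmed here.

```python
def build_prompt(sections: dict[str, str], system_prompt: str = "") -> str:
    """Join an optional system-style preamble and labelled sections into one prompt.

    Empty sections are skipped so the prompt only contains what was filled in.
    """
    parts = []
    if system_prompt.strip():
        parts.append(system_prompt.strip())
    for label, text in sections.items():
        if text.strip():
            parts.append(f"{label}: {text.strip()}")
    return "\n\n".join(parts)


# Hypothetical example content, following the list in the comment above.
example = build_prompt(
    {
        "Intent": "A documentary-style portrait that shows a lifetime of work at sea.",
        "Scene": "A small harbour at dawn, nets drying on the quay.",
        "Framing": "Waist-up, subject slightly off-centre, shallow depth of field.",
        "Mood": "Quiet and unhurried.",
        "Character profile": "A fisherman in his seventies with weathered skin and grey stubble.",
        "Focus": "The hands and face, because they carry the story.",
        "Techniques": "",  # left empty on purpose; it is skipped in the output
    },
    system_prompt="You are describing a photograph for an image model. Be faithful to the profile given.",
)

if __name__ == "__main__":
    print(example)
```

Keeping each section to intent and reasoning, not exhaustive detail, matches the advice not to micromanage the model.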


