KLEIN 9B
v1:
Showcase images were generated with the base model, but I've posted same-prompt images made with the distilled one, which are way better imo. None of the images were upscaled.
ZIB
v1:
First time training a Z-Image Base LoRA, so I'm still experimenting. Showcase image settings (a rough diffusers sketch follows the list):
Euler / normal / 20 steps / CFG 5
Hires steps: 20 / NMKD_SIAX @ 0.2-0.5 / x1.5-x2
768x1152
LoRA strength of 1
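For anyone scripting instead of using a UI, here's a minimal sketch of the first-pass settings above, assuming the model loads as a standard diffusers pipeline. The model/LoRA paths are placeholders, the sampler is left at the pipeline default rather than mapping ComfyUI's "Euler / normal" exactly, and the NMKD_SIAX hires pass is omitted:

```python
# Minimal sketch of the first-pass settings above. Paths are placeholders,
# and this assumes Z-Image Base loads as a standard diffusers pipeline.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/to/z-image-base",  # placeholder model id
    torch_dtype=torch.bfloat16,
).to("cuda")
pipe.load_lora_weights("path/to/ghibli_style_zib.safetensors")  # strength 1 by default

image = pipe(
    prompt="ghibli style, a girl on a broom flying over a seaside town",
    num_inference_steps=20,  # 20 steps
    guidance_scale=5.0,      # CFG 5
    width=768,
    height=1152,             # 768x1152
).images[0]
image.save("zib_showcase.png")
```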
QWEN
v3:
After training way more Qwen LoRAs, I've tried to apply everything I've learned with this one.
The trigger word is now simply 'ghibli style'.
It should be more flexible and closer to the style of the movie, but for generating characters from the movie, v2 is better.
v2:
I've experimented way more with Qwen training, and this version is way better than v1. The LoRA is a bit big (600 MB) because it was trained for around 18k steps. I've also added a trigger token for the style:
'KikiLaPetiteSorciere style'
To generate all the images, I used Qwen-Image (BF16) + the Lightning LoRA (4 steps) + my LoRA at a weight of 1.
DDIM with the Beta scheduler, 4 steps, CFG 1.
For some of them, I upscaled with the Flux LoRA version of this one, and this workflow.
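Here's a minimal diffusers sketch of that setup. The LoRA file names are placeholders, and ComfyUI's DDIM + Beta sampler isn't reproduced; the pipeline's default scheduler stands in for it:

```python
# Sketch: Qwen-Image (BF16) + a 4-step Lightning LoRA + the style LoRA at weight 1.
# LoRA paths are placeholders; the default scheduler stands in for "DDIM + Beta".
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "Qwen/Qwen-Image", torch_dtype=torch.bfloat16
).to("cuda")

pipe.load_lora_weights("path/to/qwen-image-lightning-4steps.safetensors", adapter_name="lightning")
pipe.load_lora_weights("path/to/ghibli_style_qwen.safetensors", adapter_name="style")
pipe.set_adapters(["lightning", "style"], adapter_weights=[1.0, 1.0])  # style weight 1

image = pipe(
    prompt="ghibli style, a cat napping in a bakery window",
    num_inference_steps=4,  # Lightning: 4 steps
    true_cfg_scale=1.0,     # CFG 1, i.e. classifier-free guidance effectively off
).images[0]
image.save("qwen_showcase.png")
```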
v1:
First Qwen-Image LoRA I've trained.
I'm still testing and learning.
FLUX
First attempt at a Flux LoRA (trained on Civitai).
The dataset consists of 28 screencaps (1920 × 1024) from the Ghibli movie Kiki's Delivery Service by Hayao Miyazaki.
I used Florence2 to caption the images, then manually corrected the captions to fix any accuracy errors and to add the trigger phrase:
'a KikiLaPetiteSorciere style image of'
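If you want to reproduce that captioning pass, here's a rough transformers sketch using the public Florence-2 checkpoint. The task prompt and generation settings are my assumptions, not necessarily what Civitai's trainer ran:

```python
# Draft-caption each screencap with Florence-2, prefix the trigger phrase,
# then hand-correct the result. Task prompt and generation settings are
# assumptions; microsoft/Florence-2-large is the public checkpoint.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

device = "cuda"
model_id = "microsoft/Florence-2-large"
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, trust_remote_code=True
).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

def draft_caption(path: str) -> str:
    image = Image.open(path).convert("RGB")
    task = "<DETAILED_CAPTION>"
    inputs = processor(text=task, images=image, return_tensors="pt").to(device, torch.float16)
    ids = model.generate(
        input_ids=inputs["input_ids"],
        pixel_values=inputs["pixel_values"],
        max_new_tokens=256,
        num_beams=3,
    )
    raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
    parsed = processor.post_process_generation(raw, task=task, image_size=image.size)
    # Prefix the trigger phrase; each caption still gets a manual accuracy pass.
    return "a KikiLaPetiteSorciere style image of " + parsed[task]
```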
SDXL
Trigger words: KikiLaPetiteSorciere style
For the character Kiki: kiki (majo no takkyuubin), red bow
Trained on 1,000 screencaps from the movie Kiki's Delivery Service by Hayao Miyazaki.
All images displayed were made with Dreamshaper XL Lightning (a diffusers sketch of these settings follows):
LoRA strength of 1
1216x832
CFG 2
6 steps
DPM++ SDE Karras
Clip skip 2
Only the Jedi one was inpainted (she had 3 legs...).
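Here's a minimal diffusers sketch of those settings. The Dreamshaper XL Lightning repo id is an assumption; point it at whichever checkpoint you actually use:

```python
# Sketch of the SDXL showcase settings. The checkpoint repo id is an
# assumption; the LoRA path is a placeholder.
import torch
from diffusers import DPMSolverMultistepScheduler, StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "Lykon/dreamshaper-xl-lightning",  # assumed repo id
    torch_dtype=torch.float16,
).to("cuda")

# DPM++ SDE Karras equivalent in diffusers
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, algorithm_type="sde-dpmsolver++", use_karras_sigmas=True
)
pipe.load_lora_weights("path/to/ghibli_style_sdxl.safetensors")  # strength 1

image = pipe(
    prompt="KikiLaPetiteSorciere style, a lighthouse on a cliff at sunset",
    num_inference_steps=6,   # 6 steps
    guidance_scale=2.0,      # CFG 2
    width=1216, height=832,  # 1216x832
    clip_skip=2,             # Clip skip 2
).images[0]
image.save("sdxl_showcase.png")
```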
Comments
V3 is wow! You said that this is the culmination of all the lessons you've learned in training style loras. I know this is a lot to ask, but could you share those lessons? I've been working on a style lora for a specific comic artist, and it hasn't been coming together.
How many images do you use? How many steps? What's the learning rate? What are your training data captions like? Are they detailed? What did you learn about data curation and preparation? What resolution did you train at? Are you using AI Toolkit? Musubi-Tuner? Any other lessons that I'm not even thinking of?
Sorry, again, I know it's a lot to ask, but I would love to learn what you've learned.
So first, my settings (a quick step-count check follows the list). I train online on tensor.art (so I think it's kohya_ss under the hood for Qwen, but I'm not 100% sure):
Model: Qwen-Image - Full BF16
Image Processing Parameters
Repeat: 20
Epoch: 4
Clip Skip: 2 (use 1 for realism)
Text Encoder learning rate: 0.00001
Unet/DiT learning rate: 0.0006
LR Scheduler: cosine
Warmup steps: something around 5% of the total steps
Optimizer: AdamW8bit
Network Dim: 16 / Network Alpha: 8
conv_dim: 4 / conv_alpha: 1
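To make that schedule concrete, here's the step math these settings imply for this version's 58-image dataset, assuming batch size 1 (the batch size isn't listed, so that part is a guess):

```python
# Step math implied by the settings above, assuming batch size 1.
num_images = 58  # this version's dataset (see below)
repeats = 20
epochs = 4
batch_size = 1   # assumed, not listed in the settings

total_steps = num_images * repeats * epochs // batch_size
warmup_steps = round(total_steps * 0.05)  # "around 5% of the total steps"

print(total_steps)   # 4640
print(warmup_steps)  # 232
```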
I know the learning rate seems really high, but I've trained like 10 LoRAs with these exact settings now, and they always come out really good.
And for the dataset, most of the time I try to have around 30 images, but for this one it was 58, and that seems to work really well too (I've trained with way bigger datasets with less success).
For this dataset, I captioned some images with Qwen3-VL-4B and others with Gemini-2.5-Flash (Gemini is usually better). I'm not sure yet whether simple or more detailed captions are better, but both work fine. Here are some examples and the prompt I use: https://imgur.com/a/eDfVBNm
I prefer not to use new trigger words anymore (the last version used "KikiLaPetiteSorciere style") because the model sometimes randomly adds the trigger words as text inside the image. So now I use a phrase that already exists and is close to the target style (in this case "ghibli style"). It may be even better to use no trigger words at all, idk; I haven't experimented enough on this point.
And for aspect ratios, I try to have diversity; for this dataset it was: https://i.imgur.com/1esDImR.png
Preparing the dataset is the most important part. I try not to have more than 2 or 3 images with the same character, to avoid getting the same face on every human you generate with it (and the same logic applies to any object). So diversity in images too: different places, animals, food, people, angles of view, weather, etc. The more diverse the images the LoRA was trained on, the more flexible it will be.
@Yofaraway Thank you so much for sharing this. And yeah, I wouldn't have thought to take the learning rate that high. Starting with Qwen, I've become a big believer in not using triggers for styles specifically. One major reason is that you often need to adjust the strength when mixing styles, and I think the text encoder is now so good, and models so exact, that triggers start really messing up the image by adding things.
There's one trainer on here who adds their name to their loras, like "tbear_sketch", and images start getting bears added when you lower the strength, because the model no longer has anything specific it's supposed to do with that trigger. It's a shame, because their lora is so good for mixing otherwise, but now I just can't use it.
Anyway, I really appreciate you laying this out. I'm definitely going to use this info. I'll have to shrink my dataset a lot to get down to what you've used, then translate it into Musubi-Tuner and AI-Toolkit settings, and we'll see how it works out. Thanks again, so much.