Kirazuri (Anima)
Kirazuri (Anima) 3.0 is a full fine-tune of the Anima Base v1.0 model by CircleStone Labs focused on several goals:
Learn new concepts/styles/characters past the base model dataset cutoff of September 2025
Enhance the model's aesthetic, guided by manually applied quality, aesthetic, and style tagging
Improve rendering and understanding of fine details through high-resolution training at 1024^2, 1280^2, and 1536^2 resolutions
Version 3.0 (Latest)
For in-depth details of version 3.0 training and tooling, see: Kirazuri (Anima) 3.0 Training Diary
Training Details Summary
Trainer: diffusion-pipe commit b0aa4f1e03169f3280c8518d37570a448420f8be
Training device: NVIDIA RTX PRO 6000 Blackwell Max-Q Workstation Edition
Total training time: ~10 days
Total samples seen (unbatched steps): ~2,550,000
Training resolutions:
512^2
768^2
1024^2
1280^2
1536^2
Stage 1
Samples seen (unbatched steps): ~2,000,000
Training time: ~125 hrs
Learning Rate: 6e-6
Learning Rate Scheduler: Cosine
LLM Adaptor Learning Rate: 8e-7
Precision: Mixed BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Timestep Sampling Strategy: Logit-Normal
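For readers unfamiliar with the "Kahan Summation" part of the optimizer setting: the general idea is to carry the rounding error of each addition forward so that very small updates are not lost when added to a much larger value. The sketch below only illustrates that idea with ordinary Python floats. It is not the trainer's optimizer code, and the function names are my own.

```python
def naive_sum(values):
    """Plain left-to-right float accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Compensated summation: carry the rounding error forward in `comp`."""
    total = 0.0
    comp = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = x - comp            # re-apply the error lost on earlier steps
        t = total + y           # low-order bits of y may be rounded away here
        comp = (t - total) - y  # recover what was just lost
        total = t
    return total


# Two small "updates" added to one large value.
values = [1e16, 1.0, 1.0]
naive = naive_sum(values)  # both 1.0 additions are rounded away
kahan = kahan_sum(values)  # the compensation term preserves them
```

With low-precision weights the same effect shows up at much smaller magnitudes, which is presumably why the optimizer pairs the two.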
Stage 2
Samples seen (unbatched steps): ~550,000
Training time: ~118 hrs
Learning Rate: 3e-6
Learning Rate Scheduler: Cosine
LLM Adaptor Learning Rate: 0
Flux Shift: Enabled
Multi-Scale Loss Weight: 0.5
Precision: Mixed BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Timestep Sampling Strategy: Logit-Normal
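As a rough illustration of the logit-normal timestep sampling listed for both stages: a normal sample is passed through a sigmoid, so most timesteps land in the middle of the range. The mean of 0, standard deviation of 1, and the fixed shift formula below are assumptions for the sketch. The trainer's exact parametrization, and how it applies the "Flux Shift" option, may differ.

```python
import math
import random


def sample_logit_normal(rng, mean=0.0, std=1.0):
    """Draw t in (0, 1) by passing a normal sample through a sigmoid."""
    u = rng.gauss(mean, std)
    return 1.0 / (1.0 + math.exp(-u))


def apply_shift(t, shift):
    """Warp t toward 1 when shift > 1; shift == 1 leaves t unchanged."""
    return shift * t / (1.0 + (shift - 1.0) * t)


rng = random.Random(0)  # seeded only so the sketch is repeatable
timesteps = [sample_logit_normal(rng) for _ in range(1000)]
shifted = [apply_shift(t, 3.0) for t in timesteps]  # 3.0 is an example value
```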
Additional Features
Tag Dropout: 30% with protected first 8 tags
Tag Shuffle: Applied to the remaining unprotected tags
Natural Language: Short and Long Caption variants
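The tag dropout and shuffle settings above can be sketched roughly as follows. This assumes dropout is applied independently per unprotected tag and that shuffling happens after dropout. The function name and the ordering of the two operations are assumptions, not taken from the trainer.

```python
import random


def augment_tags(tags, rng, protected=8, dropout=0.30):
    """Keep the first `protected` tags in order; drop and shuffle the rest."""
    head = list(tags[:protected])
    # Each unprotected tag is dropped independently with probability `dropout`.
    tail = [t for t in tags[protected:] if rng.random() >= dropout]
    rng.shuffle(tail)
    return head + tail


example_tags = [f"tag{i}" for i in range(20)]  # placeholder tags
augmented = augment_tags(example_tags, random.Random(0))
```

Protecting the leading tags keeps the quality, character, series, and artist groups stable while the general tags vary between epochs.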
Changes from Kirazuri (Anima) v2.0
Dataset adds 7,071 newly curated images, increasing the total size from 35,537 to 42,608 images
Dataset cutoff is now 2026/05/12
Trained at 5 total resolutions in two-stage training
Stage 1 - 512^2, 768^2, 1024^2
Stage 2 - 1024^2, 1280^2, 1536^2
Introduced a cosine learning rate scheduler for a smooth learning rate transition between training stages
Re-captioned the full dataset to add a second natural language caption variant, using an updated captioning script
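A minimal sketch of a cosine learning rate schedule, for reference. The end value of each stage is not stated in the training summary, so the example below assumes, purely hypothetically, that Stage 1 decays from its listed 6e-6 toward the 3e-6 that Stage 2 starts at. Warmup, if any, is ignored.

```python
import math


def cosine_lr(step, total_steps, lr_start, lr_end):
    """Cosine interpolation from lr_start (step 0) to lr_end (final step)."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return lr_end + 0.5 * (lr_start - lr_end) * (1.0 + math.cos(math.pi * progress))


# Hypothetical Stage 1 curve: 6e-6 decaying to 3e-6 over ~2,000,000 samples.
stage1_mid = cosine_lr(1_000_000, 2_000_000, lr_start=6e-6, lr_end=3e-6)
```

The curve is flat at both ends, which is what makes the hand-off between stages smooth compared with a step change.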
Installing and running
Workflow:
Refer to the Anima Preview base instructions. The model is natively supported in ComfyUI. The workflow image on the model page contains a workflow; you can open it in ComfyUI or drag and drop it onto the canvas to load the workflow.
Note: Most preview images on the model card additionally use the custom comfyui-prompt-control node for prompt scheduling syntax to mix concepts, e.g. [word1|word2]
This custom node is entirely optional, but it is required to exactly recreate the outputs in ComfyUI.
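To give a feel for what [word1|word2] does, here is a toy sketch that assumes A1111-style alternation, where the active option switches on every sampling step. It is not the comfyui-prompt-control implementation, and the node may interpret the syntax differently. The function name is my own.

```python
import re

# Matches a bracket group with at least one "|" separator, e.g. [word1|word2].
_ALTERNATION = re.compile(r"\[([^\[\]|]+(?:\|[^\[\]|]+)+)\]")


def prompt_at_step(prompt, step):
    """Resolve every [a|b|...] group to the option active at this step."""
    def pick(match):
        options = match.group(1).split("|")
        return options[step % len(options)]

    return _ALTERNATION.sub(pick, prompt)


# The effective prompt seen by the sampler on the first four steps.
per_step = [prompt_at_step("1girl, [word1|word2], smile", s) for s in range(4)]
```

Because the text conditioning alternates during sampling, the result blends both concepts rather than picking one.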
The model files go in their respective folders inside your model directory:
anima-kirazuri-v3.safetensors (this model) goes in ComfyUI/models/diffusion_models
qwen_3_06b_base.safetensors goes in ComfyUI/models/text_encoders
qwen_image_vae.safetensors goes in ComfyUI/models/vae (this is the Qwen-Image VAE, you might already have it)
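If you want to double-check the placement, a small script like the one below can report which of the three files are missing. It only assumes the file names and folders listed above: anima-kirazuri-v3.safetensors in models/diffusion_models, qwen_3_06b_base.safetensors in models/text_encoders, and qwen_image_vae.safetensors in models/vae. The ComfyUI root path is whatever your install uses, and the function name is my own.

```python
from pathlib import Path

# File name -> folder relative to the ComfyUI root, as listed on this page.
EXPECTED_FILES = {
    "anima-kirazuri-v3.safetensors": "models/diffusion_models",
    "qwen_3_06b_base.safetensors": "models/text_encoders",
    "qwen_image_vae.safetensors": "models/vae",
}


def missing_model_files(comfyui_root):
    """Return the relative paths of expected files that are not present."""
    root = Path(comfyui_root)
    return [
        (Path(folder) / name).as_posix()
        for name, folder in EXPECTED_FILES.items()
        if not (root / folder / name).is_file()
    ]
```

Calling missing_model_files("ComfyUI") from the parent directory of your install returns an empty list when everything is in place.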
Generation Settings
The model was trained at mixed resolutions for the majority of training, and finished with dedicated high-resolution training.
Previews are mostly generated at resolutions near 1280^2 (e.g. 1520x1040) or 1536^2 (e.g. 1248x1824).
30-50 steps, CFG 4-5.
The samplers recommended for the base model also work here. I like to use:
er_sde: the recommended default for 30-50 steps.
sa_solver_pece: can converge with good detail in 15-20 steps.
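The N^2 notation refers to total pixel area rather than a literal square image. A small helper like the sketch below can relate a width and height to its equivalent square side, or suggest dimensions for a given area and aspect ratio. Rounding to multiples of 16 is an assumption on my part (the example resolutions above happen to be multiples of 16), not a documented requirement.

```python
import math


def equivalent_side(width, height):
    """Side length of the square with the same pixel area."""
    return math.sqrt(width * height)


def _snap(value, multiple):
    """Round to the nearest multiple, never below one multiple."""
    return max(multiple, round(value / multiple) * multiple)


def dims_for_area(side, aspect_ratio, multiple=16):
    """Width and height with roughly side^2 pixels at width/height = aspect_ratio."""
    width = math.sqrt(side * side * aspect_ratio)
    height = width / aspect_ratio
    return _snap(width, multiple), _snap(height, multiple)


landscape_side = equivalent_side(1520, 1040)  # close to the 1280^2 bucket
portrait_side = equivalent_side(1248, 1824)   # close to the 1536^2 bucket
```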
Prompting
Like the base model, this model is trained on booru-style tags, natural language captions, and combinations of tags and captions.
Tag order
[quality/meta/safety tags] [character] [series] [artist] [1girl/1boy/1other etc] [general tags]
Mostly the same order as the base model; the only difference is that the [1girl/1boy/1other etc] group's position is towards the end in this model's dataset.
The [quality/meta/safety tags] [character] [series] [artist] tag groups are also not shuffled, so their order may have some influence on generations.
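A tiny helper can keep prompts in the tag order described above. This is only a convenience sketch: the function and argument names are hypothetical, and the character, series, and artist values are placeholders rather than real tags.

```python
def build_prompt(quality=(), characters=(), series=(), artists=(),
                 subjects=(), general=()):
    """Join tag groups in the order used by this model's dataset."""
    groups = [quality, characters, series, artists, subjects, general]
    return ", ".join(tag for group in groups for tag in group)


prompt = build_prompt(
    quality=["masterpiece", "best quality", "very aesthetic", "absurdres"],
    characters=["character name"],  # placeholder
    series=["series name"],         # placeholder
    artists=["artist name"],        # placeholder
    subjects=["1girl"],
    general=["painterly", "flat color"],
)
```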
Quality and Aesthetic tags
Human score based: masterpiece, best quality, very aesthetic, aesthetic
The very aesthetic and aesthetic tags are where this model diverges from the base: the intent is that they can be used to guide the model toward a different aesthetic, a kind of house model bias.
Meta tags
absurdres, official art, etc
Styles
painterly, chiaroscuro, ligne claire, flat color, no lineart, blending, etc
traditional media, oil painting \(medium\), watercolor \(medium\), etc
[Optional] ComfyUI-Autocomplete-Plus prompt input assistance
An optional file danbooru_tags_kirazuri_3.txt is included with the version 3.0 model details.
This file contains metadata that is derived from public sources for prompt assistance only, and is intended to be used with the ComfyUI-Autocomplete-Plus extension.
Rename the file to danbooru_tags_kirazuri_3.csv and place it in your ComfyUI/custom_nodes/comfyui-autocomplete-plus/data directory.
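The rename and move can also be scripted. The sketch below copies the file rather than renaming it, so the downloaded .txt stays where it is. It assumes the extension is already installed under custom_nodes/comfyui-autocomplete-plus, and the function name is my own.

```python
import shutil
from pathlib import Path


def install_tag_file(downloaded_txt, comfyui_root):
    """Copy the tag file into the extension's data folder with a .csv name."""
    target_dir = (Path(comfyui_root) / "custom_nodes"
                  / "comfyui-autocomplete-plus" / "data")
    if not target_dir.is_dir():
        # Refuse to guess: the extension does not appear to be installed.
        raise FileNotFoundError(f"extension data folder not found: {target_dir}")
    target = target_dir / "danbooru_tags_kirazuri_3.csv"
    shutil.copyfile(downloaded_txt, target)
    return target
```

For example, install_tag_file("danbooru_tags_kirazuri_3.txt", "ComfyUI") when run from the folder that contains both the download and your ComfyUI install.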
Known Limitations & Issues:
Some concept bleeding and instability are noticeable when using short prompts, especially tag-only prompts.
Longer tag strings and natural language prompts describing the image in detail should help with this.
This reflects how the model was trained with a combination of natural language and tags.
Recognitions
Thanks to CircleStone Labs for the Anima Preview base model.
Thanks to tdrussell of CircleStone Labs for the diffusion-pipe trainer.
Thanks to bluvoll for support using their fork of diffusion-pipe.
Thanks to narugo1992 and the deepghs team for open-sourcing various training sets, image processing tools, and models.
License
This model is released under the same license as the base model.
See the base model for details of the CircleStone Labs Non-Commercial License.
Built on NVIDIA Cosmos
Description
Version 1
This is an experimental full finetune of the Anima Preview version 1 base.
The training dataset totals 15,420 images, curated with manual human quality and aesthetic ratings, from 2025/07/03 to 2026/03/19. As a result, the model should have a fairly strong recency bias and be capable of generating many characters/concepts/styles prominent in that time period.
Training Details
Samples seen (unbatched steps): ~700,000
Training time: ~85 hrs
Learning Rate: 5e-6 (General Training) and 2e-6 (Aesthetic)
Text Encoder Learning Rate: 1e-6
Effective Batch size: 24 (General Training) (1 Batch Size) and 32 (Aesthetic) (16x2 Batch Size)
Precision: Mixed BF16
Optimizer: AdamW8bit with Kahan Summation
Weight Decay: 0.01
Timestep Sampling Strategy: Logit-Normal, Shift 3
Tag Dropout: 10%
Uncond Dropout: 10%
Tag Shuffle: True with keep first 8 tags
Additional Features used:
Protected Tags
Mixed Natural Language prompts at ratio:
tags 50%, nl 10%, tags+nl 20%, nl+tags 20%
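The Version 1 caption mix above amounts to picking one of four caption formats per sample with fixed probabilities. A rough sketch follows. How tags and natural language are joined in the combined formats is an assumption here, as are the function names.

```python
import random

# Ratios as listed for Version 1 training.
CAPTION_MIX = {"tags": 0.5, "nl": 0.1, "tags+nl": 0.2, "nl+tags": 0.2}


def pick_caption_format(rng):
    """Choose a caption format according to the listed ratios."""
    formats = list(CAPTION_MIX)
    weights = list(CAPTION_MIX.values())
    return rng.choices(formats, weights=weights, k=1)[0]


def build_caption(tags, natural_language, rng):
    """Assemble the training caption for one sample."""
    fmt = pick_caption_format(rng)
    tag_text = ", ".join(tags)
    if fmt == "tags":
        return tag_text
    if fmt == "nl":
        return natural_language
    if fmt == "tags+nl":
        return f"{tag_text}. {natural_language}"  # joiner is an assumption
    return f"{natural_language} {tag_text}"       # joiner is an assumption
```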
Comments (15)
I liked the noobai Kirazuri so I will def check out this anima version. Could you please upload it to your TensorArt profile?
I love you Motimalu!!!
is it created on preview 1 version or preview 2?
Trained from the preview 1 version
Is there a list of artists that work with this finetune?
Sorry, but there are legal reasons why I don't want to explicitly list that
In your example prompts you use @[artist style1|2|3]. What does that do exactly? Is that something like a dynamic prompt in Forge, or is it something only Comfy's architecture understands?
Hello, this is referred to as prompt scheduling. The feature is present since the early A1111 days and usable in most A1111 derivatives.
For example afaik currently Forge Neo supports it natively and can be used with Anima.
It is not actually natively supported in ComfyUI, my workflows using it require the custom comfyui-prompt-control node.
Model is great and beautiful.
I hope that we get a preview 2 version, as it is still better at prompt adherence with longer/more complicated prompts.
Thank you, a new version based off preview 2 is training now
@motimalu preview 3 just got released
@stygianwizard42 looks nice!
Preview 2 based version collapsed before converging on small details • ᴖ •
Going to make some adjustments for Preview 3 training and finetune it instead
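Just checking in: how is your Preview 3 training attempt going? Can we expect it to be released soon?
I'm already using your background LoRA. Now I'm looking forward to the update here.
Everything is going well, and it also includes concepts such as those from the background LoRA.
The final stage of training is in progress, so please stay tuned. (Machine translated)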

















