I. Introduction
AnimaYume is a text-to-image model fine-tuned from Anima, a high-quality anime-style image generation model developed by CircleStone Labs. Anima itself builds upon Cosmos 2, a model developed by NVIDIA’s research team.
II. Information
For version 0.1:
This model is a preview version fine-tuned from the Anima base model using a custom dataset. Training was conducted across multiple resolutions ranging from 768 to 1280 pixels, with a primary focus around 1024. The goal of this release is to improve stability and minimize unwanted artifacts when producing high-resolution images.
Note: All example images for this version were generated at 1024x1536 or 1536x1024.
For version 0.2:
This model is a continuation of AnimaYume v0.1. In this version, I improved the quality of my dataset and used several techniques to prevent oversaturation and low-quality outputs. Based on my testing, prompt coherence is better than in v0.1, and the model remains very stable when generating images at a resolution of 1536.
Note: I am still waiting for the final version of Anima and testing some methods to make my training process faster. I know the license might make the model less popular, but I only care about whether the model is good or not. I’m aware that many other models use more permissive licenses, but I’m too lazy to spend a bunch of money training a model from scratch.
For version 0.25:
This version was trained on Anima Preview 2. Due to several issues with the base model, such as overfitting, black/white borders, quality inconsistencies, and problems with artist tags, I decided to focus primarily on improving the model’s knowledge, reducing these issues, and making it as stable as possible.
Note: In this version, I did not attempt to improve the model’s style. I tried doing so, but it caused the model to forget some of its existing knowledge. The training process is similar to v0.2, but the dataset has been adjusted to better address the issues present in Anima Preview 2.
For version 0.3:
This version was trained using Anima Preview 2. It is an experiment with a new training method for the model. You can consider it another branch of AnimaYume v0.25, developed in parallel. However, this version uses new techniques and a larger dataset compared to v0.25.
Note: In this version, I experimented with a new training approach, so the model is slightly different from v0.25. Additionally, all example images were generated using prompts shared by users on CivitAI to evaluate whether this new method is effective.
For version 0.4:
This version was trained on Anima Preview 3 using a custom dataset. In this release, I improved prompt understanding and artist styles. Based on my testing, some artist styles match my expectations, although I haven’t tested everything in detail since I’m currently quite busy :<. Additionally, I fixed several issues from Anima Preview 3 that also appeared in Preview 2.
Note: I’ve only tested with simple test cases, not comprehensively, so if you encounter any issues, feel free to let me know. I also used a larger AI computing cluster to speed up the training process :D.
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance.
For version 0.5:
This version was trained on Anima Base v1.0 using my custom dataset (a mix of Danbooru data and a small e621 dataset). In this release, I added many new characters and improved the existing ones. I also enhanced support for various artist styles, allowing the model to generate results that are much closer to the original styles. In addition, the model now understands some concepts and knowledge from e621, although the support is still limited.
Notes: I’ve only tested the model with a few simple test cases so far, so if you encounter any issues, feel free to let me know. This release can be considered a demo version showcasing my new training method, which focuses on preserving existing knowledge while adding new knowledge at the same time. The release also came sooner because I was finally able to use all the resources I had available :D
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance using real user prompts.
III. File Information
This file contains only the diffusion model and does not include a VAE or text encoder. To use it properly, you will need to download those components from the link here
IV. Notes & Feedback
This is an experimental fine-tuned release, and I am waiting for the final release of Anima to tune it further :D
Your feedback, suggestions, and creative prompt ideas are always welcome; every contribution helps make this model even better!
V. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to Circlestone Labs and Nvidia for the fantastic base model architecture.
If you'd like to support my work, you can do so through Ko-fi!
Comments (61)
Hi, version 0.1 is the initial experimental release. For this version, I trained the model on a subset of my dataset to enhance its ability to generate higher-resolution images and better understand natural language prompts. The training process was relatively short, as the goal was to evaluate the model’s capabilities. Because Anima is still a preview model, the results may feel somewhat unusual or weird, and the quality is still developing, but the model shows strong potential.
Please feel free to share any feedback; I look forward to refining it further and moving toward a fully fine-tuned final version.
WONDERFUL!
Please share the system you trained on and how long it took! And did you use a pipeline?
Bruh, how do I use it? Forge? Do I need something like text encoders or a special VAE?
@compgamer1337267 Yes, Forge UI supports this model. You can find the repo that supports it here: https://github.com/Haoming02/sd-webui-forge-classic
@wktra I used the sd3 branch of sd-scripts with some modifications. I trained it for around 5 days.
Can you inpaint with Anima, though?
@Seii1 Yes, you can use inpainting with Anima. The setup in ComfyUI is the same as for other models.
@duongve13112002 I mean... do I need something special? I downloaded Forge and the Anima model, but it says "You do not have Qwen3 state dict!"
@compgamer1337267 Have you updated Forge yet?
@duongve13112002 It's amazing. What kind of system did you train it on? And what is the dataset size?
Does the model still retain the ability to use natural language for prompting?
Could you share more technical info about this training run: dataset size, number of steps, loss graph, etc?
Also, what are your further plans regarding Anima: waiting for the full model, or continuing this training?
@GelukuMLG Yes, the main goal of this fine-tune is improving the model's natural-language ability.
@wktra @Korewaai I am using a dataset of 500k images, and all of them were labeled with both tags and natural language. I trained for 30k steps.
@duongve13112002 Been playing with it for the past few hours and, holy... it blows Illustrious/Noob away by a lot.
@GelukuMLG elaborate?
@Korewaai It can easily do things that SDXL/Illustrious can't. For example, it handles "weapon over shoulder" easily, whereas SDXL needs a LoRA for it. The same goes for "coat over arms", or heck, even holding or casting a spell: it can do all of these without a LoRA.
Does Forge support it?
Forge Neo does~
PEAK
Could you tell me how to start the Anima training script on your GitHub? I can't find a setup.bat similar to kohya_ss.
Hi, please wait a bit. I am working with Kohya to adjust something, and the full script for this model will be released tomorrow, don't worry.
@duongve13112002 thanks!
@duongve13112002 By the way, for Anima, how many prompt tokens can the training script accept?
Each image is annotated with both Danbooru-style tags and natural language. Can the script drop the Danbooru tags without dropping the natural-language captions (to avoid damaging the semantics)?
I have also tried diffusion-pipe with bs=11 at 1536x1536 and AdamW.
With this setting, it uses less than 17 GB of VRAM at about 2 seconds per step, which is roughly 5.5 images per second. That's very different from SDXL and Lumina. Is this normal?
maximum 512 tokens
@ibara0608 If you want to handle tags and natural language separately, you will need to modify the training script. The prompt token limit is 512; if a prompt exceeds this length, it will be truncated.
I haven't used diffusion-pipe, so I am not sure whether that is normal.
@reakaakasky @duongve13112002 Thanks, I get it!
This one might be the best Anima model yet, and I've been trying every new checkpoint that came out.
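A minimal sketch of the kind of script change described above, assuming each image has a list of Danbooru-style tags plus one natural-language description. The helper names `build_caption` and `truncate_tokens`, the 0.1 default dropout rate, and the ". " separator are hypothetical and not taken from sd-scripts. The truncation helper assumes the leading tokens are kept, which may differ from what the actual script does.

```python
import random

MAX_TOKENS = 512  # prompt limit mentioned above; longer prompts are truncated


def build_caption(tags, natural_text, tag_dropout=0.1, rng=None):
    """Hypothetical helper: randomly drop individual Danbooru-style tags
    while always keeping the natural-language description intact."""
    if rng is None:
        rng = random.Random()
    # Each tag is kept independently with probability (1 - tag_dropout).
    kept_tags = [t for t in tags if rng.random() >= tag_dropout]
    parts = []
    if kept_tags:
        parts.append(", ".join(kept_tags))
    parts.append(natural_text)  # never dropped, so its semantics survive
    return ". ".join(parts)


def truncate_tokens(token_ids, max_tokens=MAX_TOKENS):
    """Keep only the first max_tokens token ids (assumed truncation side)."""
    return token_ids[:max_tokens]
```

With `tag_dropout=0.0` every tag is kept, and with `tag_dropout=1.0` only the natural-language text remains, so the sentence-level description is never removed by the dropout.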
Finally a good Anima finetune, thank you! The only problem I see so far is wide aspect ratio images breaking sometimes, but I hope it will be fixed in later versions.
duongve at it again! Congrats on the awesome release! -ly
Thanks for sharing your model, it's great! 🤩
Anima definitely has a lot of potential. Hopefully, they'll enable LoRA training in the future! 🤞
LoRA training is already possible. The author of this fine-tune implemented support for it in sd-scripts.
@munchkin Send a link for it, please.
@KeMiliUs It was merged in kohya-ss/sd-scripts
Yeah, you're right, but what I meant was enabling training on Civitai hehe. If the creators of Anima reached a licensing agreement with Civitai, this model would undoubtedly become extremely popular. 📈
@KeMiliUs The machina fork of EasyScripts also has support if you want a GUI: 67372a/LoRA_Easy_Training_Scripts (refresh branch).
Can confirm this model is significantly more stable.
With the regular preview, 7 out of 8 images would get bad fingers, whereas with this model it's maybe 4, at least with the one artist I tried, using the RDBT LoRA at 20 steps.
Now imagine how crazy it will be once we get a full model and then a finetune of that.
What specs are needed to run it?
Runs fine on 6 GB of VRAM and 32 GB of RAM. It seems to take less RAM than SDXL, but it's half the speed.
@GelukuMLG Throw SageAttention and torch.compile at it and it runs pretty fast.
Works great even if it is only based on a test model, and it generates images pretty much as fast as Illustrious (with face-fix). Now I am looking forward to the release of the full anima version.
This is just a worse version of the base model with a gross WAI filter.
This is pretty bad; styles in general look worse.
Hey man, just wanted to say thank you for this great checkpoint for this great model. It's much more stable than the base preview model, especially regarding anatomy. Thank you for putting your time and effort into making this; I know the licence isn't the best, but it's very much appreciated. Well done. Can't wait for the full version of Anima.
I don't agree with people who say it breaks styles. It handles them pretty well while making the results more stable. The cause may be that it works poorly with quality-modifier LoRAs, but aside from that I see no significant drawbacks.
Tried out most of the finetunes for Anima uploaded so far. Personally, I think this is the best one. It sacrifices the least of what Anima can do in exchange for some expansion in concepts and styles. 👍
Great checkpoint. From the ones I tried this one performs the closest to stock Anima but with improved quality.
Great model! Trains way better and way easier than the base anima model. Just feels like a straight up upgrade.
I really like this model, it works great with artists tags and LoRAs! I think this is the best Anima checkpoint at the moment
Really amazing finetune!
Guys, is anybody upscaling images? O_O
I've experimented many times, but it's been a failure.
The videocard tells me not to do it again.
I also could not get upscaling to work the standard way. Use Ultimate SD Upscale
@degurshaft Hi, yes, it works, but I got the best result using Illustrious in hires fix. XD
Thanks for the answer)
@degurshaft but the quality of the stock images is amazing, so upscaling is not necessary)
@mifink94 I would not say so. 1MP images still turn out a bit raw, probably due to the training at 512.
Upscale on an RTX 5060 Ti 16GB: 2x AnimeSharpV3, downscale x0.75, 10 steps, denoise 0.3.
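As a rough worked example of the recipe above, assuming the 0.75x downscale is applied to the 2x-upscaled image before the 10-step, denoise 0.3 refinement pass, the net effect is a 1.5x enlargement of the base image. The helper name below is hypothetical; it only computes the resulting resolution and does not perform any upscaling.

```python
def upscale_then_downscale(width, height, upscale=2.0, downscale=0.75):
    """Hypothetical helper: final size after a model upscale followed by a resize.

    The net scale factor is upscale * downscale (2.0 * 0.75 = 1.5 by default).
    """
    factor = upscale * downscale
    return round(width * factor), round(height * factor)


print(upscale_then_downscale(1024, 1536))
```

For a 1024x1536 base image this gives 1536x2304 going into the refinement pass, which is smaller than the 2048x3072 intermediate and therefore cheaper to refine.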
@NRVR downscale?O_O
Anima + Yume = PEAK
Can I ask how to use it? Can 12 GB of VRAM run it?
Get ComfyUI, search for Anima in the workflow templates, and replace the Anima [preview] diffusion model with this one. The VAE and text encoder are the same as in the default workflow.
Yes, 12 GB of VRAM can run it easily. The rest? Do what the other comment says.