I. Introduction
AnimaYume is a text-to-image model fine-tuned from Anima, a high-quality anime-style image generation model developed by CircleStone Labs. It builds upon Cosmos 2, a model developed by NVIDIA’s research team.
II. Information
For version 0.1:
This model is a preview version fine-tuned from the Anima base model using a custom dataset. Training was conducted across multiple resolutions ranging from 768 to 1280 pixels, with a primary focus around 1024. The goal of this release is to improve stability and minimize unwanted artifacts when producing high-resolution images.
Note: All example images for this version were generated at a resolution of 1024x1536 or 1536x1024.
For version 0.2:
This model is a continuation of AnimaYume v0.1. In this version, I improved the quality of my dataset and used several techniques to prevent oversaturation and low-quality outputs. In my testing, prompt coherence was better than in v0.1, and the model remained very stable when generating images at a resolution of 1536.
Note: I am still waiting for the final version of Anima and testing some methods to make my training process faster. I know the license might make the model less popular, but I only care about whether the model is good or not. I’m aware that many others use better licenses, but I’m too lazy to spend a bunch of money training a model from scratch.
For version 0.25:
This version was trained on Anima Preview 2. Due to several issues with the base model, such as overfitting, black/white borders, quality inconsistencies, and problems with artist tags, I decided to focus primarily on improving the model’s knowledge, reducing these issues, and making it as stable as possible.
Note: In this version, I did not attempt to improve the model’s style. I tried doing so, but it caused the model to forget some of its existing knowledge. The training process is similar to v0.2, but the dataset has been adjusted to better address the issues present in Anima Preview 2.
For version 0.3:
This version was trained using Anima Preview 2. It is an experiment with a new training method for the model. You can consider it another branch of AnimaYume 0.25, developed in parallel. However, this version uses new techniques and a larger dataset compared to v0.25.
Note: In this version, I experimented with a new training approach, so the model is slightly different from v0.25. Additionally, all example images were generated using prompts shared by users on CivitAI to evaluate this new method.
For version 0.4:
This version was trained on Anima Preview 3 using a custom dataset. In this release, I improved prompt understanding and artist style. Based on my testing, some artist styles match my expectations, although I haven’t tested everything in detail since I’m currently quite busy :<. Additionally, I fixed several issues from Anima Preview 3 that also appeared in Preview 2.
Note: I’ve only tested with simple test cases, not comprehensively, so if you encounter any issues, feel free to let me know. I also used a larger AI computing cluster to speed up the training process :D.
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance.
For version 0.5:
This version was trained on Anima Base v1.0 using my custom dataset (a mix of a small e621 dataset and Danbooru). In this release, I added many new characters and improved the existing ones. I also enhanced support for various artist styles, allowing the model to generate results that are much closer to the original styles. In addition, the model now understands some concepts and knowledge from e621, although the support is still limited.
Notes: I’ve only tested the model with a few simple test cases so far, so if you encounter any issues, feel free to let me know. This release can be considered a demo version showcasing my new training method, which focuses on preserving existing knowledge while adding new knowledge at the same time. The release also came sooner because I was finally able to use all the resources I had available :D
All example images were generated using prompts shared by users on CivitAI, as I wanted to evaluate the model’s performance using real user prompts.
III. File Information
This file contains only the diffusion model and does not include a VAE or text encoder. To use it properly, you will need to download those components from the link here
IV. Notes & Feedback
This is an experimental fine-tuned release, and I am waiting for the final version release to tune it :D
Your feedback, suggestions, and creative prompt ideas are always welcome. Every contribution helps make this model even better!
V. Acknowledgments
Big thanks to narugo1992 for the dataset contributions.
Credit to CircleStone Labs and NVIDIA for the fantastic base model architecture.
If you'd like to support my work, you can do so through Ko-fi!
Comments (67)
The timing couldn't be better! I just managed to download everything I need to use anima, so I will be using this instead of the base anima model, thank you!
Honestly some of your models are the best I've ever seen. I remember first trying your fine-tune Checkpoint: NetaYume Lumina.
I love anima and I think it's going to be the new Illustrious. I was wondering if you would ever collaborate with other great creators like WAI0731, Crody, reakaakasky to make the ultimate checkpoint with all the datasets from all the top base model creators. I know that's a massive undertaking but it would be soo cool.
Hi, I’m just an individual who fine-tunes models. I do it because I enjoy it and want to create a better version that everyone can use. I’m very busy and don’t have much time, so collaborating with others isn’t a good idea right now. However, if I have the chance in the future, I will collaborate with them.
is this better at higher reso ? also nice finetuned, i dont really give a dam about license anyway
Hi. Yes, because v0.2 is based on v0.1. Plus, all the example images were generated at 1024x1536 or 1536x1024.
nice, but i noticed a slight decrease in prompt adherence, though not by much. great job on this one
@Seii1 I am not sure, but from my testing phase I see v0.2 has better prompt coherence than 0.1. Would you mind sharing the prompt you used?
@duongve13112002 maybe the way i prompt is different, or maybe i just need to try more on this version, thx for answering
i usually do like this
masterpiece, best quality, good quality,beautiful anime-style.
2girls,Character details :
1st Character :
2nd Character :
Pose and Scene:
........
Background details :
...
@Seii1 Oh thanks, i will test this later
@Seii1 hey so this is from the official Anima Hugging Face page:
Natural language prompting tips
If using pure natural language, more descriptive is better. Aim for at least 2 sentences. Extremely short prompts can give unexpected results (this will be better in the final version).
You can mix tags and natural language in arbitrary order.
You can put quality / artist tags at the beginning of a natural language prompt.
"masterpiece, best quality, @big chungus. An anime girl with medium-length blonde hair is..."
Name a character, then describe their basic appearance.
"Digital artwork of Fern from Sousou no Frieren, with long purple hair and purple eyes, wearing a black coat over a white dress with puffy sleeves..."
This is extra important when prompting for multiple characters. If you just list off character names with no description of appearance, the model can get confused.
I personally use Danbooru tags for certain poses, anatomy and complex stuff, as it's usually more stable than plain natural language. Also, if it's still struggling, try weighted tags like (skinny:1.4)...
If all else fails, try this template that I used for z image turbo but I adapted it to anima
[Style & Aesthetic]
(Your quality tags)
[Composition & Camera]
(What camera angle you want the photo, full body shot, Dutch angle...)
[Subjects & Anatomy]
(How many subjects do you want? Make sure that you use names like Jess, a 21-year-old blonde woman, or John, a 48-year-old African man... instead of just 1girl and 1boy, as I found this works better sometimes, but your mileage may vary.)
[Action]
(What your subject/subjects are doing...)
[Environment & Atmosphere]
(The more description of your environment and background the better. If you don't describe it, the model tends to give you the same background, because it is very good at following the prompt. The variety isn't the best at the moment.)
[Lighting & Contrast]
(Be careful with this, because if you go too heavy on the lighting it can make the characters look a bit washed out. I tend to just say it's a sunny day or something simple like that.)
I've also experimented with BREAK like in SDXL and Illustrious, and it's worked surprisingly well, especially when my prompt has been very long.
These may or may not work for you but these are some of the things I've been trying. I hope it helps.
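For anyone who wants to fill in the bracketed template above from a script, here is a minimal Python sketch. The section names are copied from the template; the function name `build_prompt`, the `keep_headers` switch, and the example text are made up for illustration, and whether Anima responds better with or without the bracketed headers in the final prompt is an assumption you would need to test yourself.

```python
# Section names copied from the template above; everything else is illustrative.
SECTION_ORDER = [
    "Style & Aesthetic",
    "Composition & Camera",
    "Subjects & Anatomy",
    "Action",
    "Environment & Atmosphere",
    "Lighting & Contrast",
]


def build_prompt(sections: dict[str, str], keep_headers: bool = True) -> str:
    """Join the non-empty sections in template order, one block per section."""
    unknown = set(sections) - set(SECTION_ORDER)
    if unknown:
        raise ValueError(f"unknown sections: {sorted(unknown)}")
    parts = []
    for name in SECTION_ORDER:
        text = sections.get(name, "").strip()
        if not text:
            continue  # skip sections the user left empty
        parts.append(f"[{name}]\n{text}" if keep_headers else text)
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt({
        "Style & Aesthetic": "masterpiece, best quality",
        "Subjects & Anatomy": "Jess, a 21-year-old blonde woman",
        "Action": "reading a book by the window",
    }))
```

Sections can be passed in any order; the helper always emits them in the order of the template and drops the ones left blank.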
@sallypooni12671 yes, that's why i'm using character 1 and 2: i describe their detailed appearance there, btw thx
Hi Boss, "license might make the model less popular". Could you explain it? I dont understand anything, but it's very interesting((
The license is a modified version of the flux 1 license which prevents hosting the model commercially unless you pay for a license. It's not a problem for average users but it could turn off larger finetuners who want to train the model for their own services.
@HDiffusion Oh, thank you a lot, Its clear)
99.99% of users will not be affected by the Anima license.
However, for big trainers/teams with the resources to fine-tune the entire danbooru dataset (like noobai, rouwei etc.), they might choose to train a new base model instead of on top of Anima. Because they may be sponsored and the Anima license is not friendly to them.
On the positive side, a better model than Anima might emerge.
On the negative side, both models might remain undertrained for a long time.
0.2 is worse than 0.1 in terms of consistency. And it's more censored now.
But it is giving more detailed and refined output in general
I mean I haven't tried it but how is it more censored? As I thought Anima is pretty uncensored 😯
@sallypooni12671 Yes, unfortunately 0.2 makes the output deformed and disfigured (like SD 3 did with "girl laying on grass" back in the day) on the same NSFW prompts where Yume 0.1 makes them shine compared to anima-preview.
@zigmund have you tried the RDBT - Anima LORA as it helps with stability, quality and prompt adherence. It's definitely worth a try. Not sure if the creator of this checkpoint has added it in, but if not it's still worth using. Just be careful not to use it in any model mergers as the author specifically says not to do that.
@zigmund Hi, I checked and didn’t find anything related to your problem. My dataset contains both SFW and NSFW content. Would you mind sharing the prompt you used?
@AnimaXx Yup, RDBT – Anima LoRA is a good LoRA for stability, and it adapts very easily to my model.
@duongve13112002
Dude, first of all - I really appreciate your work. Yume 0.1 is the best for me. I don't know what you've done, but Anima actually blooms in comparison to the basic anima-preview.
And I think I understood the problem: it's prompts written in plain text rather than with tags. I really like the possibilities qwen provides. Emphasizing with (1girl:1.666) doesn't work anymore. If you want a strong composition, either use tags correctly or describe it well. My assumption is that when you train your finetune, you ignore the plain-text part. You focus on tags, and the 0.2 model gets stronger in that respect. It also has better body features and better faces. At the same time, 0.2 struggles with the artistic component, plain-text descriptions, text rendering, and recognition of sensual phrases.
I noticed that in the Animaika model as well. When I got that idea, I tested it deliberately, and it seems to me that straying away from descriptive language towards tags only leads to degradation. Starting from version 1.0, Animaika gradually worsens across versions up to 2.2 if you don't use tags beads.
So my proposal to you (if you would listen to a mere g00ner on the internet) is to reconsider your approach to training. Investigate how it was done initially by the anima-preview creator. He not only used tags but also descriptive text, and there was a DeviantArt database which I don't know how was tokenized. And please retrain Yume from the beginning with that in mind. If it's not possible, then I will be OK with it. Personally I'll stick to the initial versions of every finetune that shows up (because they are all much better than the base version), until someone does the training comprehensively.
Here is my prompt.
You can test it in any sampler/scheduler combination. But most visible difference between 0.1 and 0.2 is er_sde + simple + 45 steps + 4.0 cfg
<i>
sensitive, @valentina tavolilla,
An Elegant fine art portrait watercolor from the French Belle Epoque era.
Two characters: 40 years old woman and 30 years old young man.
Man is sitting on the bed underneath gorgeous woman. Man is tightly hugging woman rubbing over her body and her velvet clothing. The woman is backwards sitting on man's laps and she looks embarrassed and irritated by the situation. She unwantedly pulling up the skirt of her red dress in this intimate moment. Woman is hurshly wispering: "Is this how you treat you maids?"
</i>
@zigmund Thank you for your advice. My dataset contains many types of captioning, for example tags, natural language descriptions, and even chain-of-thought style captions. So I don't think that is the real problem. Moreover, my training strategy is similar to the one the author of Anima described.
I suspect the issue might be related to the way I control high contrast in the images too strictly. I'm not completely sure, but I personally dislike very high contrast because it can sometimes cause problems.
I also encounter random censoring. Even adding "censored", "mosaic", "mosaic censor", "censor bar" to the negative prompt does not help.
But so far, as long as I put "uncensored" in the positive prompt, it works, even without the negative prompt above.
@deadlydoom0708 Ok thank you, i will find out the root cause of this
@duongve13112002 the root cause is a large tag weight. It has long been known that raising a weight above 1.4 causes distortions and instability.
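To check a prompt for heavy weights before blaming the model, something like this small Python sketch can help. It only scans for the `(tag:weight)` emphasis syntax that shows up in this thread; the 1.4 cutoff is just the figure claimed in the comment above, not a documented limit, and the function name `find_heavy_tags` is hypothetical. Whether a given UI actually applies these weights with Anima is a separate question.

```python
import re

# Matches the "(tag:1.4)" emphasis syntax used in this thread.
WEIGHTED_TAG = re.compile(r"\(([^():]+):([0-9]*\.?[0-9]+)\)")


def find_heavy_tags(prompt: str, limit: float = 1.4) -> list[tuple[str, float]]:
    """Return (tag, weight) pairs whose weight is strictly above `limit`."""
    heavy = []
    for tag, weight in WEIGHTED_TAG.findall(prompt):
        value = float(weight)
        if value > limit:
            heavy.append((tag.strip(), value))
    return heavy


if __name__ == "__main__":
    for tag, weight in find_heavy_tags("(1girl:1.666), (skinny:1.4), solo"):
        print(f"{tag} has weight {weight}, consider lowering it")
```

Tags without an explicit weight are ignored, and a weight exactly at the limit is not flagged.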
You sir, are a genius.
Awesome model, really cleans up images a lot in my short testing with it.
one small note about the description. it says "I observed that the prompt coherence is better than v0.2", i believe it should say "I observed that the prompt coherence is better than v0.1" here?
Oh my mistake thanks
Great model, maybe a very slight regression in prompt adherence. But overall, the quality looks way better and it has way better taste in terms of composition etc.
Unfortunately, 0.2 makes more mistakes with hands (5 fingers, for example, happen more often) in portrait resolution. It also seems to generate slightly less "mature" characters (can be noticeable on faces) compared to 0.1.
Hi, would you mind sharing the prompt you used? This will help me a lot in finding the real problem.
@duongve13112002 masterpiece, best quality, highres, newest, 1girl, kusanagi motoko, ghost in the shell, @aizheajsee, muscular female, aged up, solo, upper body portrait, black hair, white eyes, nude, medium breasts, linea alba, hip bones, pussy, clitoral hood, long labia, x anus, thumb up, wedding ring, bedroom, spread legs, aroused, morning, sunlight, sidelighting, window.
negative was:
low quality, worst quality, normal quality, score_1, score_2, score_3, score_4, score_5, score_6, score_7, jpeg artifacts, text, watermark, signature, banner, bad anatomy, bad hands, deformed hands, extra fingers, mutated hands, missing fingers, malformed limbs, fused fingers, too many fingers, loli, simple background, chromatic aberration, makeup, poorly drawn, disfigured, deformed, mutation, censored, bar censor, mosaic censoring, young.
I've tested the same parameters on v0.1, and it was fine. Wide aspect images are good in v0.2, on the other hand, and seems like author styles are better (closer to the original) than in v0.1.
@dobomex761604 I find using natural language produces better results than just strictly booru.
@chudzilla8920 Yeah, but it's a bit weird to get 5 fingers because of that.
Have you tried lowering or raising the CFG or steps? I've noticed that even the base preview sometimes gives me weird artifacts by default, even if my negative prompt says not to add any extra hands or legs. It still manages to do it. It's better at natural language, but I think the base Preview 2 also has a few more artefacts than Preview 1. That might be why. I haven't tried this fine-tune yet, but hopefully it can address it.
@AnimaXx Yes, I've tested multiple steps and schedulers, as always (Anima is still a bit quirky). In any case, a new version is out, looking forward to testing it.
Hi everyone, if you experience any issues with the model, feel free to reply to this comment. Please describe the problem as clearly and in as much detail as possible. If possible, also share the prompt you used. I will use your prompts and descriptions to help identify the real issues with the model.
To clarify, I am not prohibiting NSFW content or anything like that; my dataset already contains such content. Also, my dataset does not rely only on tag-based captions. I use a mix of tags, natural language descriptions (in multiple formats), and some chain-of-thought style captions.
If you feel the model behaves poorly or gives bad results, just let me know. I’m still learning how to control and improve this model, especially since the LLM adapter can sometimes make the model unstable. 😄
Can you please give an example of "chain-of-thought style captions"? I only remember CoT from the old LLM days, how was it applied here?
so-far I've had no issues with the model personally, I find it does well by mixing booru and long natural language.
@dobomex761604 Here is an example for that
"1. Setting
The scene takes place in a quiet classroom during the late afternoon or early evening.
A warm amber sunset shines through a large window on the left side of the room.
The sunlight creates long rectangular beams of golden light across the polished floor.
2. Classroom Environment
The window spans almost the entire left wall, letting in a large amount of warm light.
On the far wall hangs a dark, worn blackboard.
Beneath the blackboard are several scattered boxes, suggesting storage or recent unpacking.
3. The Character
A young schoolgirl stands near the blackboard.
She wears a traditional Japanese school uniform:
Black jacket
White shirt underneath
Knee-length plaid skirt (navy, red, and white)
Thigh-high black socks with a white stripe near the top
Standard school shoes
Her long dark hair falls loosely to almost waist length.
She looks back toward the viewer with a calm and gentle smile.
4. Object in Her Hands
She holds a rolled-up paper, likely a diploma or certificate, clasped in front of her.
5. Objects in the Room
Near the window is a wooden desk piled with books and papers.
The stacks suggest intense studying or exam preparation.
A wooden chair sits partially tucked under the desk, also illuminated by sunlight.
6. Mood and Theme
The overall atmosphere is peaceful, nostalgic, and reflective.
The warm sunset lighting and quiet classroom imply after-school stillness.
The diploma hints at a transition or ending, possibly graduation or farewell."
@duongve13112002 I've never captioned this way I'll give it a try.
I've been using AnimaYume v0.1 for my AnimaPreview1 trained Loras and it has been performing exceptionally well. However, with v0.2 I noticed there is a certain degradation of detail in some scenarios.
For example AnimaYume V0.1: https://civitai.com/images/123476344 this image looks great, all details are good (especially around eyes/face).
Same exact prompt/generation settings/workflow but V0.2 model: https://i.imgur.com/dslOfji.png
Notice the distorted eyes.
This is something I've noticed across most images with V0.2 -> eyes don't look great in wider shots/more complex angles (upper body shots/portraits look ok still).
Full prompt is in the first linked image.
Can you help me write a prompt for my LoRA characters so that their original clothes don't get mixed? https://civitai.com/models/2446635/kuroinu-girls-or-anima
Oh, Anima Preview 2 was released. Should I train the next version on it? :D It seems to have better stuff. Give me some of your ideas, thanks.
Please train more! I like AnimaYume
Yes please! I've actually been testing Preview 2 out and it has a bit better prompt adherence and knowledge. Personally I really like it. It also follows natural language a bit better as well. If you have time to fine-tune it, that would be much appreciated, like your great AnimaYume checkpoints. Thank you
Would be nice. The Yume dataset seems to be quite nice.
p2 gives me a strange vibe even after finetuning.
p2: https://civitai.com/posts/27209083
p1: https://civitai.com/images/122202628
Hard to describe, kind of similar to this guy's comment https://civitai.com/models/2458426/anima-official?modelVersionId=2764263&dialog=commentThread&commentId=1136194
I wonder what's the "regulation dataset". Feels there is always a filter on top of everything.
@reakaakasky I haven't tried that art style but I have to say that I don't know what they've done but it can handle more complex prompts as I think they said a natural language understanding is a bit better, however it is a small update so I'm hoping future versions will iron out problems and be better trained for the final release. This is with your LORA but without it like the first preview base version It's pretty unstable and inconsistent, especially regarding anatomy. That's why your LORA brings night and day difference with quality, stability...
@reakaakasky Oh, interesting point. From my viewpoint, the quality seems a bit weird and the knowledge isn't better, but it is good at NL. Moreover, I guess the author might have issues with the dataset.
Please train more! Preview 2 has better prompt adherence but I'm missing your finetune!
@reakaakasky Why are you comparing p2 raw with your low cfg "based on p1" model? Besides just relying on natlang is not the best strategy.
@deitychaser p2 images are not from raw p2, I've trained rdbt v0.18, the distillation isn't finished, so there are high-freq artifacts, just ignore those artifacts.
p2 can't generalize. p1 does not have such issue.
I will stick to p1.
@reakaakasky could it be that the error is that he didn't train the text encoder this time?
@deitychaser idk, for me it looks like the p2 "overfitted" somehow.
@reakaakasky that guy comment on the official cringe me out lol typical AI user, i guess its something to do with their dataset, maybe their trying something different here and there, thats why still called preview,
@duongve13112002 this is from the official Hugging Face page: "This is a base model with no aesthetic tuning. It is designed to be wild and creative, with the maximum possible breadth of knowledge. It is not optimized to produce aesthetic or consistent images." So I think that's why it's pretty wild and inconsistent sometimes.
It also said this regarding preview 2 "A significant part of the training is redone with different hyperparameters and techniques, designed to help make the model more robust to finetuning."
Agreed with the others, would love to see a version built on v2!
From my experience, training a LoRA on AnimaYume performed much better than training on stock Anima. Can't wait to train on the updated AnimaYume!
@MisticRain69 @reakaakasky @AnimaXx I am currently tuning the model. In my experiments, the model seems unstable during training, unlike what the author said (I don't know why, or maybe my setup is wrong). Also, the base model seems to have forgotten some things. This process may take longer because I have a lot of things to do. I hope it will work well :D
@duongve13112002 For preview 1 he advised to disable training the adapter to prevent that from happening so strongly. Idk if that's still true. The guy who makes the ai style dump loras seem to have way improved results with preview 2 training (less forgetting and style mixing works better). Idk if he trained the adapter or not.
@duongve13112002 This might help, as it's from the official Anima Hugging Face page:
Finetuning Tips
Any LoRA you train on a preview version should be considered a "throwaway" LoRA. There's no guarantee it will work well on the final version.
Don't train the LLM adapter. My own training script, diffusion-pipe, lets you set llm_adapter_lr=0 to completely disable training it, and the example config has this as a default.
Other trainers like sd-scripts have similar options that should be used.
The LLM adapter processes the text embeddings before they get to the diffusion model, and therefore has an outsized influence on the generated images. The adapter itself contains a surprising amount of knowledge and is easy to degrade by training it.
Use a low learning rate. For a rank 32 LoRA, start with 2e-5 and adjust up or down from there.
As a base model, there is no aggressive aesthetic tuning or RLHF you need to overcome when finetuning.
The model has an extremely large and diverse amount of visual concepts baked in already. A light touch is all you need.
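As a quick way to sanity-check a training setup against the tips quoted above, here is a minimal Python sketch. Only `llm_adapter_lr`, the rank of 32, and the 2e-5 starting learning rate come from the quoted text; the dictionary keys `rank` and `learning_rate` and the function `review_config` are hypothetical and do not mirror the config format of diffusion-pipe or any other trainer.

```python
# Starting learning rate quoted above for a rank 32 LoRA.
SUGGESTED_START_LR = 2e-5


def review_config(cfg: dict) -> list[str]:
    """Return human-readable notes where `cfg` departs from the quoted tips.

    The keys "rank" and "learning_rate" are hypothetical names for this sketch.
    """
    notes = []
    adapter_lr = cfg.get("llm_adapter_lr")
    if adapter_lr is None:
        notes.append("llm_adapter_lr is not set; set it to 0 to freeze the LLM adapter")
    elif adapter_lr != 0:
        notes.append("llm_adapter_lr is non-zero; the LLM adapter will be trained")
    if cfg.get("rank") == 32 and cfg.get("learning_rate") != SUGGESTED_START_LR:
        notes.append("learning_rate differs from the suggested 2e-5 starting point")
    return notes


if __name__ == "__main__":
    for note in review_config({"llm_adapter_lr": 1e-5, "rank": 32, "learning_rate": 1e-4}):
        print(note)
```

The learning rate note is informational only, since the tips say to adjust up or down from the starting value.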
I skipped blocks 0-5.
In Lumina-2 and Z-image, those initial blocks are saturated, so the gradients are meaningless. Cosmos doesn't seem to suffer from this, but whatever.
@reakaakasky I finished tuning a small improvement for Preview 2. I agree that in this version the model seems overly unstable. My tuned version keeps almost all of the knowledge base but enhances language understanding (I am still testing :v)
hmmm. I think p2 is over stable.....
@duongve13112002 I really love your previous AnimaYume model, it's absolutely amazing! Can't wait to see your magic on Preview 2. Please take your time though, I'm really looking forward to your new work! :D



















