    Arknights: Endfield | Operators Collection - v1.0 [anima-preview]
    NSFW

    Arknights: Endfield | Operators Collection

    Intended to generate non-commercial fan works of Operators from the Arknights: Endfield video game.

    ℹ️ LoRAs work best when applied to the base model they were trained on. Please read the About This Version notes on the appropriate base model for workflow and training information.

    Trained on a large mixed NL-and-tags dataset with a 30% tag dropout rate, at mixed [1024, 1280] resolutions. Previews are mostly generated at 1024x1536.

    Operators (game version 1.0):

    
    female endministrator \(arknights\)
    male endministrator \(arknights\)
    perlica \(arknights\)
    chen qianyu \(arknights\)
    
    akekuri \(arknights\)
    alesh \(arknights\)
    antal \(arknights\)
    arclight \(arknights\)
    ardelia \(arknights\)
    avywenna \(arknights\)
    catcher \(arknights\)
    da pan \(arknights\)
    estella \(arknights\)
    fluorite \(arknights\)
    gilberta \(arknights\)
    laevatain \(arknights\)
    last rite \(arknights\)
    lifeng \(arknights\)
    snowshine \(arknights\)
    wulfgard \(arknights\)
    xaihi \(arknights\)
    yvonne \(arknights\)
    

    Operators (game version 1.1):

    
    mi fu \(arknights\)
    rossi \(arknights\)
    tangtang \(arknights\)
    zhuang fangyi \(arknights\)
    

    Antagonists:

    
    ardashir \(arknights\)
    nefarith \(arknights\)
    

    Works best in combination with NL: name the character, then describe their basic appearance.

    
    A vibrant and dynamic illustration of Yvonne from Arknights: Endfield, featuring her with long pink hair styled in twintails, pointy ears, and small horns, along with a playful tail...
    

    To be fixed:

    Pogranichnik's labelling contained a mistake.

    Ember did not seem to learn her outfit/features.


    Comments (12)

    TIK7778899 · Mar 6, 2026 · 1 reaction

    MAKE ANIMA 4 EVERRRRRRRRRR

    ILLUSTRIOUS 🪦R.I.P.

    degurshaft · Mar 9, 2026 · 2 reactions

    Hi! This is an amazing LoRA. It would be great if you could help me out with a few points, as out of all the LoRAs posted so far, yours seems the most interesting to me.

    I tried training a character LoRA using a standalone GUI with my Illustrious experience and the following config: AdamW8bit, cosine, LR 1e-4, 1 repeat, batch 4, 40 epochs, 10 warmup, 16 dim/16 alpha, frozen TE, and 20 images. The captions were limited to tags only, with an abstract activation token that absorbed all physical descriptions (like 1girl, red hair, etc.). The training took about 300 steps (how many steps are actually relevant for 20 images? 🤔). While the character was trained, I wouldn't call the model high quality: the look is quite raw, and strangely, the female character's features project onto men unless I use NL (maybe an issue with Anima itself, I'm not sure). This raised some questions while I was looking at your attached config.
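
    As a rough cross-check on step counts: kohya-style trainers typically count optimizer steps per epoch as images × repeats / batch size. A minimal sketch, assuming no gradient accumulation (by this accounting, 20 images at batch 4 over 40 epochs is 200 steps, so the ~300 observed may reflect accumulation or a different counting convention):

    import math

    def total_steps(num_images: int, repeats: int, batch_size: int, epochs: int) -> int:
        # Rough optimizer-step estimate; assumes no gradient accumulation.
        steps_per_epoch = math.ceil(num_images * repeats / batch_size)
        return steps_per_epoch * epochs

    print(total_steps(20, 1, 4, 40))  # -> 200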

    How exactly do you handle image captioning? I understand that for diffusion-pipe, you need to provide two versions of the caption (NL and tags). I noticed in your config that you don't shuffle the first 5 tags or the first NL sentence. Does this mean you start with the activation tag + character appearance, or is this just to preserve the activation token and the first NL sentence containing the character name and source (e.g., "A vibrant and dynamic illustration of XXX from XXX")? I just want to know what the structure of your tag and NL captions looks like, and whether I should remove physical descriptions as was relevant for SDXL. Also, I'm curious about the caption selection mechanism: if mixed_weights isn't set, does it simply pick between tags and NL randomly for each iteration?

    Do you train the TE, and how necessary is it actually?

    How relevant is using tag dropout?

    So far, I'm not satisfied at all with the results I'm getting from my test runs. I'd be very grateful if you could answer these questions or point me in the right direction.

    motimalu (Author) · Mar 10, 2026 · 1 reaction

    Hello, I'll try to answer some of your questions here.

    > How exactly do you handle image captioning?

    Booru tags are used to create the image.txt caption files, with the first tags being character name, copyright, @artistname.

    image_nl.txt files are captioned with a VLM using these tags as grounding.

    You can reference the VLM captioning script I used here; it should give a good idea of my overall captioning strategy:

    https://github.com/motimalu/diffusion-workflows/blob/main/nl-captioning/label-large.py
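
    The overall shape is roughly the sketch below (an illustration only, not the linked script; caption_with_vlm is a hypothetical stub you would wire to your own VLM client):

    from pathlib import Path

    def caption_with_vlm(image_path: Path, grounding_tags: str) -> str:
        # Hypothetical stub: send the image plus its booru tags to a VLM
        # and return a natural-language caption grounded in those tags.
        raise NotImplementedError("wire up your VLM client here")

    for tag_file in Path("dataset").glob("*.txt"):
        if tag_file.stem.endswith("_nl"):
            continue  # skip NL captions we already generated
        image = tag_file.with_suffix(".png")  # assumes .png images
        tags = tag_file.read_text().strip()   # "character, copyright, @artist, ..."
        nl_caption = caption_with_vlm(image, tags)
        tag_file.with_name(tag_file.stem + "_nl.txt").write_text(nl_caption)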

    > Does this mean you start with the activation tag + character appearance, or is this just to preserve the activation token and the first NL sentence containing the character name and source (e.g., "A vibrant and dynamic illustration of XXX from XXX")?

    Basically this, yes: since with mixed_weights I'm training on NL only 10% of the time, I want to ensure that the first NL sentence contains the activation tags.

    I have turned off nl_shuffle_sentences, so nl_keep_first_sentence doesn't really have an effect, though.
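
    For reference, a sketch of what those two options would do together, assuming sentence-boundary splitting (an illustration, not diffusion-pipe's actual code):

    import random

    def shuffle_nl(caption: str, keep_first_sentence: bool = True) -> str:
        # Shuffle NL sentences; optionally pin the first one, which
        # carries the character name / activation tags.
        sentences = [s.strip() for s in caption.split(".") if s.strip()]
        head = sentences[:1] if keep_first_sentence else []
        tail = sentences[len(head):]
        random.shuffle(tail)
        return ". ".join(head + tail) + "."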

    > Should I remove physical descriptions, as was relevant for SDXL?

    A dropout rate for non-activation tags would probably be better for preserving the model's flexibility, but removing physical descriptions works too.
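
    A minimal sketch of that kind of dropout, with illustrative names; the first few tags are treated as activation tags and always survive:

    import random

    def drop_tags(tags: list[str], n_activation: int = 3, p_drop: float = 0.3) -> list[str]:
        # Randomly drop non-activation tags each iteration; the leading
        # character/copyright/@artist tags are always kept.
        kept = tags[:n_activation]
        kept += [t for t in tags[n_activation:] if random.random() >= p_drop]
        return kept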

    > Also, I'm curious about the caption selection mechanism: if mixed_weights isn't set, does it simply pick between tags and NL randomly for each iteration?

    The default is used when mixed_weights is unspecified:

    DEFAULT_MIXED_WEIGHTS = {'tags': 50, 'nl': 10, 'tags_nl': 20, 'nl_tags': 20}
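
    Interpreted as sampling weights, that behaves roughly like the sketch below (an illustration, not the trainer's actual code; the concatenation format for the mixed modes is a guess):

    import random

    DEFAULT_MIXED_WEIGHTS = {'tags': 50, 'nl': 10, 'tags_nl': 20, 'nl_tags': 20}

    def pick_caption(tags: str, nl: str, weights=DEFAULT_MIXED_WEIGHTS) -> str:
        # Choose one caption variant per iteration, weighted as configured.
        mode = random.choices(list(weights), weights=list(weights.values()))[0]
        return {
            'tags': tags,
            'nl': nl,
            'tags_nl': f"{tags}, {nl}",  # tags first, then NL (format assumed)
            'nl_tags': f"{nl}, {tags}",  # NL first, then tags (format assumed)
        }[mode]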

    Lastly, a lot of the model's knowledge is stored in the Qwen TE, so while you risk reducing flexibility, you will probably get better results enabling train_llm_adapter when teaching the model something novel, like characters or concepts that did not exist when it was trained.

    degurshaft · Mar 10, 2026

    @motimalu Thanks for the answers!
    I usually used a custom Comfy workflow with WD14/Florence for tagging, but your script looks very interesting! I also think it might be interesting to test an abliterated version of Qwen so it's a bit more "unleashed" when describing explicit/sensitive-rating images.

    I'm still puzzled by the technical side of training. In your recent Anima models, I've seen a value of 60 epochs and an LR of 5e-5 in many places. At the same time, you clearly chose earlier epochs where the loss was lower, and even with ~500 or more steps at a batch size of 6, your models look great without a hint of overtraining. Meanwhile, I observed significant overtraining at the same LR by step 300~400, even with a smaller batch (4). At first, I thought the problem might be the scheduler, but according to the graphs, cosine was doing a good job of reducing the LR... I understand you likely had larger datasets compared to my modest 20 images, but these results still leave me baffled.

    motimalu (Author) · Mar 10, 2026

    > I also think it might be interesting to test an abliterated version of Qwen

    I'm not much of a fan of abliteration; generalization capability is what I'd value most in a VLM for captioning images.

    > I understand you likely had larger datasets compared to my modest 20 images, but these results still leave me baffled

    Yes, the larger the dataset, the longer I find I can train without overfitting; the tag dropout and mixed NL probably contribute to that as well.

    I guess this LoRA used the largest dataset I've trained for Anima so far. It was fine enough around 4,600 steps, but I let it keep going until 18,400, hoping to see if the 1280-resolution training would converge on very small details like the characters' hairpins.

    It didn't quite converge on those small details, but it did show continual improvement.

    degurshaft · Mar 10, 2026

    @motimalu In that case, what about small datasets? For example, the Sakura Miko ones, which clearly weren't trained on a very large volume, considering her loungewear outfit only has 40 images on Danbooru :D I'm very interested in your experience with this, as I mostly deal with training on small datasets that don't exceed 30 images even in the best case.

    motimalu (Author) · Mar 11, 2026

    @degurshaft I guess the smaller outfit LoRAs I trained would have similar flexibility issues to what you describe: they will apply some of the trained outfit's characteristics to any character in the scene. I mostly trained those to test that my dropout/NL mix config would produce outfits accurately when scaled to larger datasets like this one.

    Training more concepts or characters or whatever together, I've found, produces more robust models with better generalization. So intuitively it might come down to the bitter lesson: general methods that leverage more data and more compute will produce better outcomes.

    You can theoretically include multiple-resolution training, throw more compute at captioning, and make sure you're maximizing possible gains by describing images in maximum detail for small datasets, but it will probably still overfit to some extent if, in the end, you're only training on a single subject.

    Reducing the LoRA rank, using a lower LR, and not training the TE are all things typically done to reduce the effect of training on the model's overall weights and prevent overfitting, so I'd try those for small datasets if you're still unhappy with the results. But I would personally go the other way and scale to larger multi-concept datasets when hoping for better generalization.
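
    Putting the thread's numbers side by side, a hypothetical starting point (the small-dataset values are illustrative halvings, not a tested recipe):

    # Settings discussed in this thread vs. one possible small-dataset variant.
    large_multi_concept  = {"dim": 16, "alpha": 16, "lr": 5e-5,   "train_te": True}
    small_single_subject = {"dim": 8,  "alpha": 8,  "lr": 2.5e-5, "train_te": False}  # hypothetical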

    degurshaft · Mar 11, 2026

    @motimalu Yeah, either way, more data = better results :(

    I basically went the same route of lowering the LR, rank, etc., and in principle the models turn out... acceptable. It's probably foolish of me to expect a perfect result from datasets of this size, especially on a preview model :)

    Regarding the scaling method, I think that will definitely be interesting when there's an opportunity to group datasets by theme or category. Looking back, this was useful on both Pony and Illustrious, but seeing your Arknights character LoRA, it seems there's even more sense in it now :D

    I have a couple of questions left, and then I won't bother you anymore. Looking at your script, I wanted to know how effective the @artist method is when captioning images. Am I correct in assuming that specifying the artist who created a specific image improves flexibility, and in that case, is it worth tagging an artist whose data isn't in the model?

    This leads to a more specific question. I'm planning to try training a LoRA for the channel_(caststation) style, which is clearly present in the Anima dataset, but the style itself is far from the original, which creates a dissonance. Should I avoid including the tag @channel_(caststation) and use a unique activation token instead, treating it as a completely new style unrelated to what's already in the model? Or should I use @channel_(caststation) as the activation token, so that during training it recognizes all images as being from that specific artist and subsequently merges with the model's existing weights during generation? Is there really a need for Anima to use strange activation tokens, like we did for ilxl, so that the tag doesn't exist in CLIP?

    motimalu (Author) · Mar 11, 2026

    Hey @degurshaft no problem,

    > I wanted to know how effective the @artist method is when captioning images. Am I correct in assuming that specifying the artist who created a specific image improves flexibility, and in that case, is it worth tagging an artist whose data isn't in the model?

    Yes, I think it's definitely worth doing this to improve flexibility, particularly if you're training on images from multiple artists or from an artist unknown to the model. The Anima base model also specifies artists with the @ prefix and will pick up a tag's style association very easily in training.

    > Is there really a need for Anima to use strange activation tokens

    Not really. If you're training the TE, it is an LLM, so it can probably make sense of a lot more complexity in how you'd like to approach tagging. You'd be forcing it to re-learn what it has already generalized about how artists are tagged, though.

    To have the model learn the difference between an old and a new style in the same dataset, tags like "year 20XX" or "oldest"/"newest" might help; I haven't tried this, though.

    Otherwise just training on the newest data would probably work well enough to shift the model towards it.

    degurshaft · Mar 11, 2026

    @motimalu I don't really care about separating styles by date. So does that mean the best solution for training an artist's style, specifically channel, whom I mentioned above, would be to use the @channel_(caststation) tag that already exists in the model as the activation token, so that it shifts the model toward the data we're using to train the LoRA?

    And by the way, do we need to escape the brackets using \ in captions?

    motimalu (Author) · Mar 11, 2026

    @degurshaft Yes, you should probably use the tag with the syntax "@channel \(caststation\)" to shift the model's understanding of the style, since that's how it would have learned it.

    This part I was a bit confused about for Anima, since parenthesis weighting on tags doesn't work with its TE, so there shouldn't have been a need to escape the parentheses in these tags. But it seems the training data took that approach, so that's what it learned; for continuity, I guess. ┐(´ー`)┌
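
    If you're assembling captions programmatically, the escaping itself is a one-liner (hypothetical helper):

    def escape_tag(tag: str) -> str:
        # Escape parentheses the way booru tags appear in training captions.
        return tag.replace("(", r"\(").replace(")", r"\)")

    print(escape_tag("@channel (caststation)"))  # -> @channel \(caststation\)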

    degurshaft · Mar 11, 2026 · 1 reaction

    @motimalu Thank you for such detailed answers, it was a pleasure talking to you :3

    LORA
    Anima

    Details

    Downloads: 386
    Platform: CivitAI
    Platform Status: Available
    Created: 2/27/2026
    Updated: 5/1/2026
    Deleted: -

    Files

    anima-arknights-endfield-nlmix2-e80.safetensors