Anima - JSON+English - Brent 1k (PREVIEW)

NSFW

What is this?

A LoRA for prompting Anima with JSON. The model does not require JSON, but JSON gives you added control, and the model also handles many plain English prompts that were quite weak or non-existent before.

The trigger is NOT the exact token "JSON"; it is literal JSON written out in string form in the prompt.

Prompt Directly

Use JSON > ENGLISH > BOORU.

You will get the best yield in this order. You can swap booru for English if you get hallucinations.

The model was trained with both English and booru JSON, so the processing should be okay.

90k Brent E1+E2 1.0

Temporary version, will be replaced with the full 1.0 train. All epochs available on huggingface.

https://huggingface.co/AbstractPhil/anima-90k

This is only the VLM half, which ran for only about 1 epoch. The plan is 2 epochs VLM and 1 epoch animetimm. That should be enough. The final version will be uploaded tonight.

Have fun.

Epoch 2 Release

This version is stronger and more capable while still retaining the majority of the original model. It is more robust than v1 and better at plain English.

Epoch 3 Time Stage

Epoch 3 is roughly 375,000 samples and will apply the full subject bucketing system to the animetimm half only. This has shown the most robust results with this model, while still learning the plain English associations needed to make more use of Qwen than before.

This will take roughly 74 hours, so by next weekend I'll have everything worked out for a full comfyui release.

10k Brent V0.5

{
  "subjects": [
    {
      "name": "subject name here",
      "attributes": ["attributes", "go", "here however you want to divide them"],
      "actions": ["actions go here", "in english or broken sequences"]
    }
  ],
  "setting": "supports settings"
}

Down here reinforce the system with plain english like this, explain the system and situation.

1girl, here, do, the, booru, tags, like how, you, would,

The JSON probably doesn't need to be perfect; you can likely jank it, and the model will not care whether it is valid.

Add up to 8 subjects, bounding boxes not supported yet, semantic offset is partially working, and associative offset is partially functional.
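The layout above (JSON first, then plain English, then booru tags) can be assembled programmatically. This is a minimal sketch, not part of the release: `build_prompt` and its parameters are hypothetical names, the blank-line separator between the three parts is an assumption, and the example subject is invented for illustration. The only rules taken from the description are the key names and the limit of 8 subjects.

```python
import json

MAX_SUBJECTS = 8  # the description says up to 8 subjects are supported


def build_prompt(subjects, setting, english="", booru_tags=()):
    """Return JSON, then plain English, then booru tags (hypothetical helper)."""
    if len(subjects) > MAX_SUBJECTS:
        raise ValueError(f"at most {MAX_SUBJECTS} subjects, got {len(subjects)}")
    payload = {"subjects": list(subjects), "setting": setting}
    parts = [json.dumps(payload, indent=2)]
    if english:
        parts.append(english.strip())
    if booru_tags:
        parts.append(", ".join(booru_tags))
    # Assumption: the three parts are separated by blank lines.
    return "\n\n".join(parts)


prompt = build_prompt(
    subjects=[{
        "name": "girl",
        "attributes": ["long blue hair", "school uniform"],
        "actions": ["waving at the viewer"],
    }],
    setting="a classroom at sunset",
    english="A girl with long blue hair waves at the viewer in a classroom at sunset.",
    booru_tags=["1girl", "blue_hair", "long_hair", "school_uniform", "waving"],
)
print(prompt)
```

Because attributes are said to hallucinate without booru reinforcement, it is probably worth repeating the key attributes in the tag list, as the example does.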

Attributes hallucinate without reinforcement with the booru tags, for now.

Will bias QWEN more heavily the higher the strength is for this version.

Strengths

Handles low step or high step models fairly well. Reduce strength for low steps and you'll still get some use of the json.

Weaknesses

Attributes hallucinate. Actions hallucinate. Names are pretty good.

1k Brent (Preview)

Similar format as the V0.5.

Booru tags are MORE critical. Different biases.

Weaknesses

Strong, but will bias a different array of images. More rigid and smaller array.

Text has problems, increase strength to the negative if you have large problems.

Brent 10k V0.5 Release

Fully revamped trainer; a forked diffusion-pipe with a considerably faster parquet processing pipeline.

https://github.com/AbstractEyes/diffusion-pipe/tree/feat/parquet-hf-dataset-backend

This replaces the anima trainer used for the 1k preview.

https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

10,000 images instead of 1000.

I ran too many epochs, but the balanced train will allow the model to operate at lower strength. The next run will have considerably more images, a higher diversity of images, a better character controller, and considerably more complex JSON prompts.

Subject Bucketing upgrade

The bucketing system is very fast and uses a shared grab-bag for buckets, which reduces prep time and still produces more images than the model can ingest on 4 GPUs. The parquet pipeline processes images considerably faster and still handles aspect-ratio (AR) bucketing at high speed, thanks to the random grab-bag processing of the parquet system.
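For readers unfamiliar with AR bucketing, here is a rough sketch of the general idea: group images by nearest aspect ratio, then draw batches at random from a bucket. This is not the trainer's implementation. The bucket ratios and the function names (`nearest_bucket`, `build_buckets`, `grab_batch`) are assumptions made up for illustration, and the subject-aware part of the bucketing is not modelled here.

```python
import random
from collections import defaultdict

# Hypothetical bucket aspect ratios (width / height); not taken from the trainer.
BUCKET_RATIOS = [0.5, 0.75, 1.0, 1.33, 2.0]


def nearest_bucket(width, height):
    """Return the bucket ratio closest to the image's aspect ratio."""
    ratio = width / height
    return min(BUCKET_RATIOS, key=lambda r: abs(r - ratio))


def build_buckets(images):
    """Group (image_id, width, height) tuples by nearest aspect-ratio bucket."""
    buckets = defaultdict(list)
    for image_id, width, height in images:
        buckets[nearest_bucket(width, height)].append(image_id)
    return buckets


def grab_batch(buckets, batch_size, rng):
    """Pick a random non-empty bucket and sample up to batch_size ids from it."""
    ratio = rng.choice([r for r, ids in buckets.items() if ids])
    ids = buckets[ratio]
    return ratio, rng.sample(ids, min(batch_size, len(ids)))


buckets = build_buckets([
    ("a", 512, 512), ("b", 512, 1024), ("c", 1024, 512), ("d", 600, 600),
])
ratio, batch = grab_batch(buckets, batch_size=2, rng=random.Random(0))
print(ratio, batch)
```

Every image in a batch shares one bucket, so the batch can be resized to a single resolution without heavy cropping.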

Improved Cache

The original caching system is much improved: it has been converted to parquet processing, which easily kept the 4 A40 GPUs at 100% utilization.

More Data

A much larger train of 10,000 dual-prompted images. Repeats are based on both buckets and their subject selectiveness frequency.

Suggested Use

I suggest reduced strength, which will still bring out the LoRA's effect without introducing the QWEN biases as strongly.

I've included trigger prompt assistance for using the built in subject format.

Brent 1k (PREVIEW) Release

https://github.com/AbstractEyes/anima-trainer

Trained with the same trainer as Anima was trained with originally - diffusion-pipe, snapped together with a new dataset organization system so I could run it in either Runpod or notebooks.

https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

This is 1k images randomly sampled and subject-bucketed from the 80k image dataset "qwen_90k" that will be trained next.

https://huggingface.co/AbstractPhil/Qwen3.5-0.8B-json-captioner

Each image was captioned twice: once using the VLM's ViT in a system that outputs JSON, and once by a variant of the AnimeTIMM ViT, whose captions were then processed into JSON as well.

12 epochs on the VLM JSON captions, then the same images back in for 8 more epochs with the AnimeTIMM JSON. These are the results of subject-bucketing with JSON.

More specifically

https://huggingface.co/blog/AbstractPhil/subject-bucketing

This is a subject-bucket trained JSON finetune.

The specific targets are meant to provide better accuracy and more fidelity to finetunes experimentally while simultaneously training a proof-of-concept paradigm related to subject-bucketing.

TLDR Subject Bucketing

Dataset balancing. Normally you end up with a series of problems from finetunes: breakpoints, kinks, issues, distortions, faults, and so on.

This is meant as an experiment to solve those exact problems. By finetuning a model with JSON, you provide a form of differentiated perspective to the AI. By grouping subjects to a more complex paradigm as stated in the article - the differentiation becomes robust.

A little longer, still short.

Each token separator is another format of language that QWEN already understands and recognizes. The more you combine in sequence, the more QWEN will understand this process - providing more utilizable structure to the diffusion system.

The more robust and orderly the encodings provided to the diffusion system, combining differentiated lesser-used tokens with more common-use tokens, the more useful the training outcomes.

Why?

The smaller-scale non-bucketed variants were successful, so it's time to train the real thing. The tool itself, and the tool yields.

Now the first 1k image train for the direct tool has been successful. The results are yielding and powerful. This merits a full uptick in training.

Comments (14)

goh_ · Jun 25, 2026

CivitAI

is this like bringing the json prompt capabilities from ideogram v4 to anima?

AbstractPhila

Author

Jun 25, 2026 · 1 reaction

I haven't played with Ideogram V4 but I've been planning this one for a couple months. My dataset consists of over 700k fully prepared dual-prompt images with my shared QWEN 3.5 0.8b model as the catalyst for the entire system.

SDXL took to it like a bag of rocks, however Anima took it fairly clean.

VKilko · Jun 25, 2026

CivitAI

What exactly does Lora do? Can I just use it to generate prompts in JSON format? What exactly does that look like?

AbstractPhila

Author

Jun 25, 2026

It accepts plain English prompting as well as JSON prompting.

VKilko · Jun 25, 2026

@AbstractPhila But if this LoRA is not for enhancing understanding of the JSON prompt structure, what is the idea behind it? What is it for?

AbstractPhila

Author

Jun 25, 2026

@VKilko The model becomes more selective with larger margins between the LLM inputs. The LLM itself isn't particularly very smart, so more sparse captions have trouble. This both strengthens small chains of tokens by giving them scaffolding with JSON, as well as trains subject symbolism from the LLM into the diffusion mechanism. Thus allowing the model to align to specifics in a different way, in this case JSON was the catalyst and plain English was the mechanism.

AbstractPhila

Author

Jun 25, 2026

@VKilko https://huggingface.co/datasets/AbstractPhil/anima-90k-cache/tree/main/vlm This will give a good idea of what's in there.

Here is one with a viewer, same images.

https://huggingface.co/datasets/AbstractPhil/sdxl-qwen-phase0

N0n4m3 · Jun 26, 2026

CivitAI

@AbstractPhila What is structure / format of JSON?

I did some testing and ... I can't see any difference with | without this LORA using Anima base.

Modern models, surprisingly, do understand JSON, some more others less, i.e. using Anima gives 60/40 positive results but Krea2 jumps to 90/10.

I used Ideogram JSON description from KJ and am surprised that this does work so well for Krea2, not ideal, but this is all "Ai" shtick these days ("good enough so we all should use it"), much better than in Anima.

The most problematic part is the bbox coordinates, which Anima seems to ignore about 50/50 of the time.

AbstractPhila

Author

Jun 26, 2026

I haven't trained bounding box coordinates yet, you need to use difference offsets for now. "to the left of", "the upper right corner of the image", etc.
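Until bounding boxes are trained, a box can be converted into an offset phrase of the kind quoted above before it goes into the prompt. The helper below is a hypothetical sketch: `offset_phrase`, the split of the image into thirds, and every phrase other than the corner wording are assumptions, and the model has not been confirmed to respond to all of them.

```python
def offset_phrase(cx, cy):
    """Map a normalized box center (0..1, origin at top-left) to a position phrase.

    Hypothetical helper: the image is split into thirds on each axis.
    """
    if not (0.0 <= cx <= 1.0 and 0.0 <= cy <= 1.0):
        raise ValueError("cx and cy must be in [0, 1]")
    col = "left" if cx < 1 / 3 else "right" if cx > 2 / 3 else "center"
    row = "upper" if cy < 1 / 3 else "lower" if cy > 2 / 3 else "middle"
    if col == "center" and row == "middle":
        return "in the center of the image"
    if row == "middle":
        return f"on the {col} side of the image"
    if col == "center":
        return f"in the {row} part of the image"
    return f"in the {row} {col} corner of the image"


# A box given as (x0, y0, x1, y1) in normalized coordinates.
x0, y0, x1, y1 = 0.8, 0.0, 1.0, 0.2
phrase = offset_phrase((x0 + x1) / 2, (y0 + y1) / 2)
print(phrase)
```

The resulting phrase can then be placed in a subject's attributes or in the plain English part of the prompt.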

AbstractPhila

Author

Jun 26, 2026

The next structure I create will be substantially more powerful. I'm scaling up to full VIT classification capacity; text identification, rotation, offset, depth, scale, bounding boxes, and considerably more identified capacities all packed into JSON.

In that sense I'm going to find the strongest VLM that can run on the rtx 6000 pro's 95 gigs of vram, and with that the version 2 will be considerably more powerful.

Version 1 is currently cooking, and the subject semantics association preview shows that it will in fact yield - but my eyes are now open to something much much more powerful.

Brewce · Jun 26, 2026

CivitAI

As the sample images don't show any JSON in their prompts, could you give us an example?

AbstractPhila

Author

Jun 26, 2026 · 1 reaction

[ { "subject": "type", "attributes": ["attributes", "go here for the thing"], "setting": "location and settings" }, { ... more subjects } ]

It's a bit barebones for now, but it'll get the model started for the next batch.

AbstractPhila

Author

Jun 26, 2026

There's an actual qwen model you can use to translate your plain english prompt directly to the json format that this model learned.

https://huggingface.co/AbstractPhil/anima-prelim-1k-r64/tree/main/comfy-qwen-json

The qwen node works in comfyui but I haven't packaged it up into its own repo yet. It requires transformers >5.4

I suggest appending the plain english + booru tags after the json formatted data, which provides the necessary solidity to the prompt.

VKilko · Jun 26, 2026

CivitAI

What do you think about XML as an input structure, like NewbieAi has?
Example prompt:
<character_1>

<n>$character_1$</n>

<appearance>chibi, red_eyes, blue_hair, long_hair, hair_between_eyes, head_tilt, tareme, closed_mouth</appearance>

<clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, blue_skirt, miniskirt, pleated_skirt, blue_hat, mini_hat, thighhighs, grey_thighhighs, black_shoes, mary_janes</clothing>

<expression>happy, smile</expression>

<action>standing, holding, holding_briefcase</action>

<position>center_left</position>

</character_1>

<character_2>

<n>$character_2$</n>

<appearance>chibi, red_eyes, pink_hair, long_hair, very_long_hair, multi-tied_hair, open_mouth</appearance>

<clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, red_skirt, miniskirt, pleated_skirt, hair_bow, multiple_hair_bows, white_bow, ribbon_trim, ribbon-trimmed_bow, white_thighhighs, black_shoes, mary_janes, bow_legwear, bare_arms</clothing>

<expression>happy, smile</expression>

<action>standing, holding, holding_briefcase, waving</action>

<position>center_right</position>

</character_2>

<general_tags>

<count>2girls, multiple_girls</count>

<background>white_background, simple_background</background>

<atmosphere>cheerful</atmosphere>

<quality>high_resolution, detailed</quality>

<objects>briefcase</objects>

<other>alternate_costume</other>

</general_tags>

LORA

Anima

by AbstractPhila

symbolic representation

Details

Downloads

Platform

CivitAI

Platform Status

Available

Created

6/25/2026

Updated

7/5/2026

Deleted

Files

qwen_anima_e20.safetensors

Size:

264.36 MB

SHA256:

a11ffe0f210d8c877e378ded3e21269a31cfdab841ca5d97f4dca48b742daac0
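A downloaded copy can be checked against the SHA256 listed above. This is a generic sketch using Python's standard `hashlib`; the function names and the local file path are assumptions.

```python
import hashlib

# SHA256 listed on this page for qwen_anima_e20.safetensors.
EXPECTED_SHA256 = "a11ffe0f210d8c877e378ded3e21269a31cfdab841ca5d97f4dca48b742daac0"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file at path matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage, assuming the file was saved next to this script:
# print(verify("qwen_anima_e20.safetensors"))
```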

Mirrors

CivitAI (1 mirrors)

qwen_anima_e20.safetensors

Available On (1 platform)

Same model published on other platforms. May have additional downloads or version variants.

SeaArt

Anima - JSON - Brent 1k (PREVIEW)
