CivArchive

    What is this?

    A tool for using JSON with Anima. The model does not require JSON, but JSON prompts give it added control, and it is also capable at many plain English prompting tasks that were quite weak or non-existent before.

    The trigger is NOT the exact token "JSON"; it is literal JSON written in string form in the prompt.

    Prompt Directly

    Use JSON > ENGLISH > BOORU.

    You will get the best yield in this order. You can swap booru for English if you get hallucinations.

    The model was trained with both English and booru JSON, so the processing should be okay.

    90k Brent E1+E2 1.0

    Temporary version; it will be replaced with the full 1.0 train. All epochs are available on Hugging Face.

    https://huggingface.co/AbstractPhil/anima-90k

    This is only the VLM half, which ran for only about 1 epoch. The plan is 2 epochs of VLM and 1 epoch of animetimm, which should be enough. The final version will be uploaded tonight.

    Have fun.

    Epoch 2 Release

    This version is stronger and more capable while still retaining the majority of the original model. It is more robust than v1 and better at plain English.

    Epoch 3 Time Stage

    Epoch 3 is roughly 375,000 samples, which will be the full subject bucketing system imposed only on the animetimm system. This has shown the most robust capacity with this model, while still learning the plain English associations necessary to use more Qwen than before.

    This will take roughly 74 hours, so by next weekend I'll have everything worked out for a full comfyui release.

    10k Brent V0.5

    {
      "subjects": [
        {
          "name": "subject's name here",
          "attributes": ["attributes", "go", "here however you want to divide them"],
          "actions": ["actions go here", "in english or broken sequences"]
        }
      ],
      "setting": "supports settings"
    }
    
    Down here reinforce the system with plain english like this, explain the system and situation.
    
    1girl, here, do, the, booru, tags, like how, you, would,

    The JSON probably doesn't need to be perfect; you can likely jank it and the model will not care whether it is valid.

    Add up to 8 subjects. Bounding boxes are not supported yet; semantic offset and associative offset are each only partially working.

    For now, attributes hallucinate unless they are reinforced with booru tags.

    This version biases QWEN more heavily the higher the strength is.
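    The layout described above (JSON block first, then plain English, then booru tags) can be assembled with a small helper. This is only a sketch: `build_prompt`, `MAX_SUBJECTS`, and the sample subject are hypothetical names and values chosen for illustration, not part of the released tooling. The only things taken from the card are the field names, the ordering, and the 8-subject limit.

```python
import json

MAX_SUBJECTS = 8  # the card states that up to 8 subjects are supported


def build_prompt(subjects, setting, english, booru_tags):
    """Assemble a prompt as: JSON block, plain English, then booru tags."""
    if len(subjects) > MAX_SUBJECTS:
        raise ValueError(f"at most {MAX_SUBJECTS} subjects are supported")
    # Field names follow the template in the card: subjects + setting.
    payload = {"subjects": subjects, "setting": setting}
    json_block = json.dumps(payload, indent=2)
    tag_line = ", ".join(booru_tags)
    # Blank lines separate the three parts, as in the template.
    return "\n\n".join([json_block, english, tag_line])


# Hypothetical example values, for illustration only.
prompt = build_prompt(
    subjects=[
        {
            "name": "knight",
            "attributes": ["silver armor", "red cape"],
            "actions": ["holding a sword"],
        }
    ],
    setting="castle courtyard at dusk",
    english="A knight in silver armor stands in a castle courtyard at dusk.",
    booru_tags=["1girl", "armor", "cape", "sword", "outdoors"],
)
print(prompt)
```

    Using `json.dumps` guarantees valid JSON, which the card says is not strictly required, but it removes one variable when comparing results.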

    Strengths

    Handles low step or high step models fairly well. Reduce strength for low steps and you'll still get some use of the json.

    Weaknesses

    Attributes hallucinate. Actions hallucinate. Names are pretty good.

    1k Brent (Preview)

    Similar format to V0.5.

    Booru tags are MORE critical. Different biases.

    Weaknesses

    Strong, but will bias a different array of images. More rigid and smaller array.

    Text has problems; increase the negative strength if the problems are large.

    Brent 10k V0.5 Release

    Fully revamped trainer: a forked diffusion-pipe with a considerably faster parquet processing pipeline, used instead of the anima trainer.

    https://github.com/AbstractEyes/diffusion-pipe/tree/feat/parquet-hf-dataset-backend

    https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

    10,000 images instead of 1000.

    I ran too many epochs; however, the balanced train will allow the model to operate at lower strength. The next run will use considerably more images with a higher diversity, a better character controller, a higher complexity yield for JSON capacity, and much more complex JSON prompts.

    Subject Bucketing upgrade

    The bucketing system runs at roaring fast speeds and uses a shared grab-bag capacity for buckets, which both reduces prep time and still produces more images than the model can ingest on 4 GPUs. The parquet processing pipeline processes images considerably faster and still handles aspect-ratio (AR) bucketing at lightning speed, all because of the random grab-bag processing capacity of the parquet system.

    Improved Cache

    The original caching system is much improved now: it was converted to parquet processing, which easily kept the 4 A40 GPUs at 100% utilization.

    More Data

    A much larger train of 10,000 dual-prompted images. Repeats are based on both buckets and their subject selectiveness frequency.
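    The card does not give the actual repeat formula. As a rough illustration of frequency-based balancing in general, the sketch below assigns more repeats to rarer subject buckets. The function name `repeats_per_bucket`, the inverse-frequency rule, and the `max_repeats` cap are all assumptions made for this example and are not the trainer's real logic.

```python
from collections import Counter


def repeats_per_bucket(subject_labels, max_repeats=4):
    """Give rarer subject buckets more repeats, capped at max_repeats.

    Assumption: repeats scale with (largest bucket size / this bucket size).
    The real trainer may weight buckets differently.
    """
    counts = Counter(subject_labels)
    largest = max(counts.values())
    return {
        subject: min(max_repeats, max(1, round(largest / n)))
        for subject, n in counts.items()
    }


# Hypothetical subject labels, one per image.
labels = ["cat"] * 8 + ["dog"] * 4 + ["fox"] * 1
print(repeats_per_bucket(labels))  # {'cat': 1, 'dog': 2, 'fox': 4}
```

    The cap keeps a very rare subject from being repeated so often that the model memorizes its few images.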

    Suggested Use

    I suggest reduced strength, which will still promote the LoRA's strength without introducing the QWEN biases as strongly.

    I've included trigger prompt assistance for using the built-in subject format.

    Brent 1k (PREVIEW) Release

    https://github.com/AbstractEyes/anima-trainer

    Trained with the same trainer Anima was originally trained with, diffusion-pipe, snapped together with a new dataset organization system so I could run it in either Runpod or notebooks.

    https://huggingface.co/datasets/AbstractPhil/diffusion-pretrain-set-ft1

    This is 1k images randomly sampled and subject-bucketed from the 80k image dataset "qwen_90k" that will be trained next.

    https://huggingface.co/AbstractPhil/Qwen3.5-0.8B-json-captioner

    Each image was captioned using the VLM's ViT for a JSON-output system, and a variant of the AnimeTIMM ViT also captioned each image, with its output processed into JSON as well.

    12 epochs on the VLM JSON captions, then the same images back in for 8 more epochs with AnimeTIMM JSON. These are the results from subject-bucketing with JSON.

    More specifically

    https://huggingface.co/blog/AbstractPhil/subject-bucketing

    This is a subject-bucket trained JSON finetune.

    The specific targets are meant to provide better accuracy and more fidelity to finetunes experimentally while simultaneously training a proof-of-concept paradigm related to subject-bucketing.

    TLDR Subject Bucketing

    Dataset balancing. Normally you end up with a series of problems from finetunes: breakpoints, kinks, issues, distortions, faults, and so on.

    This is meant as an experiment to solve those exact problems. By finetuning a model with JSON, you provide a form of differentiated perspective to the AI. By grouping subjects into a more complex paradigm, as stated in the article, the differentiation becomes robust.

    A little longer, still short.

    Each token separator is another format of language that QWEN already understands and recognizes. The more you combine in sequence, the more QWEN will understand this process, providing more utilizable structure to the diffusion system.

    When the diffusion system is given robust and orderly encodings that combine differentiated lesser-used tokens with more common-use tokens, the training produces more powerful and useful outcomes.

    Why?

    The smaller-scale non-bucketed variants were successful, so it's time to train the real thing. The tool itself, and the tool yields.

    Now the first 1k image train for the direct tool has been successful. The results are yielding and powerful. This merits a full uptick in training.

    Comments (14)

    goh_
    Jun 25, 2026
    CivitAI

    is this like bringing the json prompt capabilities from ideogram v4 to anima?

    AbstractPhila
    Author
    Jun 25, 2026· 1 reaction

    I haven't played with Ideogram V4 but I've been planning this one for a couple months. My dataset consists of over 700k fully prepared dual-prompt images with my shared QWEN 3.5 0.8b model as the catalyst for the entire system.

    SDXL took to it like a bag of rocks, however Anima took it fairly clean.

    VKilko
    Jun 25, 2026
    CivitAI

    What exactly does the LoRA do? Can I just use it to generate prompts in JSON format? What exactly does that look like?

    AbstractPhila
    Author
    Jun 25, 2026

    It accepts plain English prompting as well as JSON prompting.

    VKilko
    Jun 25, 2026

    @AbstractPhila But if this LoRA is not for enhancing the understanding of JSON prompt structure, what is the idea behind it? What is it for?

    AbstractPhila
    Author
    Jun 25, 2026

    @VKilko The model becomes more selective with larger margins between the LLM inputs. The LLM itself isn't particularly smart, so more sparse captions have trouble. This both strengthens small chains of tokens by giving them scaffolding with JSON, and trains subject symbolism from the LLM into the diffusion mechanism. That allows the model to align to specifics in a different way; in this case JSON was the catalyst and plain English was the mechanism.

    AbstractPhila
    Author
    Jun 25, 2026

    @VKilko https://huggingface.co/datasets/AbstractPhil/anima-90k-cache/tree/main/vlm This will give a good idea of what's in there.

    Here is one with a viewer, same images.

    https://huggingface.co/datasets/AbstractPhil/sdxl-qwen-phase0

    N0n4m3
    Jun 26, 2026
    CivitAI

    @AbstractPhila What is structure / format of JSON?

    I did some testing and ... I can't see any difference with | without this LORA using Anima base.

    Modern models, surprisingly, do understand JSON, some more others less, i.e. using Anima gives 60/40 positive results but Krea2 jumps to 90/10.

    I used Ideogram JSON description from KJ and am surprised that this does work so well for Krea2, not ideal, but this is all "Ai" shtick these days ("good enough so we all should use it"), much better than in Anima.

    The most problematic part is the bbox coordinates, which Anima seems to ignore roughly 50/50.

    AbstractPhila
    Author
    Jun 26, 2026

    I haven't trained bounding box coordinates yet, you need to use difference offsets for now. "to the left of", "the upper right corner of the image", etc.

    AbstractPhila
    Author
    Jun 26, 2026

    The next structure I create will be substantially more powerful. I'm scaling up to full VIT classification capacity; text identification, rotation, offset, depth, scale, bounding boxes, and considerably more identified capacities all packed into JSON.

    In that sense I'm going to find the strongest VLM that can run on the RTX 6000 Pro's 95 GB of VRAM, and with that, version 2 will be considerably more powerful.

    Version 1 is currently cooking, and the subject semantics association preview shows that it will in fact yield - but my eyes are now open to something much much more powerful.

    Brewce
    Jun 26, 2026
    CivitAI

    As the sample images don't show any JSON in their prompts, could you give us an example?

    AbstractPhila
    Author
    Jun 26, 2026· 1 reaction
    [ { "subject": "type", "attributes": ["attributes", "go here for the thing"], "setting": "location and settings" }, { ... more subjects } ]

    It's a bit barebones for now, but it'll get the model started for the next batch.

    AbstractPhila
    Author
    Jun 26, 2026

    There's an actual Qwen model you can use to translate your plain English prompt directly to the JSON format that this model learned.

    https://huggingface.co/AbstractPhil/anima-prelim-1k-r64/tree/main/comfy-qwen-json

    The Qwen node works in ComfyUI, but I haven't packaged it up into its own repo yet. It requires transformers >5.4

    I suggest appending the plain English + booru tags after the JSON-formatted data, which provides the necessary solidity to the prompt.

    VKilko
    Jun 26, 2026
    CivitAI

    What do you think about XML as an input structure, like NewbieAi has?
    Example prompt:
    <character_1>
        <n>$character_1$</n>
        <gender>1girl</gender>
        <appearance>chibi, red_eyes, blue_hair, long_hair, hair_between_eyes, head_tilt, tareme, closed_mouth</appearance>
        <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, blue_skirt, miniskirt, pleated_skirt, blue_hat, mini_hat, thighhighs, grey_thighhighs, black_shoes, mary_janes</clothing>
        <expression>happy, smile</expression>
        <action>standing, holding, holding_briefcase</action>
        <position>center_left</position>
    </character_1>
    <character_2>
        <n>$character_2$</n>
        <gender>1girl</gender>
        <appearance>chibi, red_eyes, pink_hair, long_hair, very_long_hair, multi-tied_hair, open_mouth</appearance>
        <clothing>school_uniform, serafuku, white_sailor_collar, white_shirt, short_sleeves, red_neckerchief, bow, red_skirt, miniskirt, pleated_skirt, hair_bow, multiple_hair_bows, white_bow, ribbon_trim, ribbon-trimmed_bow, white_thighhighs, black_shoes, mary_janes, bow_legwear, bare_arms</clothing>
        <expression>happy, smile</expression>
        <action>standing, holding, holding_briefcase, waving</action>
        <position>center_right</position>
    </character_2>
    <general_tags>
        <count>2girls, multiple_girls</count>
        <style>anime_style, digital_art</style>
        <background>white_background, simple_background</background>
        <atmosphere>cheerful</atmosphere>
        <quality>high_resolution, detailed</quality>
        <objects>briefcase</objects>
        <other>alternate_costume</other>
    </general_tags>

    LORA
    Anima

    Details

    Downloads: 70
    Platform: CivitAI
    Platform Status: Available
    Created: 6/25/2026
    Updated: 7/5/2026
    Deleted: -

    Files

    qwen_anima_e20.safetensors

    Mirrors

    CivitAI (1 mirrors)
