Aesthetic Quality Modifiers - Masterpiece
The training data is a subset of all my manually rated datasets with quality/aesthetic modifiers, including only the masterpiece-tagged images.
ℹ️ LoRAs work best when applied to the base models on which they were trained. Please read the About This Version section for the appropriate base model, trigger usage, and workflow/training information.
Version 5.0 [anima-preview-3] (Latest)
(Temporarily included here, as the "About This Version" section is having issues.)
Trained on Anima Preview-3-base
Assume that any LoRA trained on the preview version won't work well on the final version.
Recommended prompt structure:
Positive prompt (quality tags at the start of the prompt):
masterpiece, best quality, very aesthetic, {{tags}}, {{natural language}}
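For example, with hypothetical content tags and a hypothetical description filling the placeholders:

masterpiece, best quality, very aesthetic, 1girl, flower field, sunset, a girl standing in a field of flowers at dusk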
Updated dataset of 386 images: all masterpiece-tagged images from the Kirazuri (Anima) model version 2 dataset.
Trained at 1024 x 1024, 1280 x 1280, and 1536 x 1536 resolutions.
Previews are mostly generated at 1536 x 1024 or 1024 x 1536.
Training config:
diffusion-pipe commit b0aa4f1e03169f3280c8518d37570a448420f8be
# dataset-anima.toml
resolutions = [1024, 1280, 1536]
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9
# Totals
# 386 images
# 15504 samples/epoch
# 153 images
# 48 samples/image - 7344 samples/epoch
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1536x1536'
repeats = 16
resolutions = [1024, 1280, 1536]
# 44 images
# 48 samples/image - 2112 samples/epoch
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1280x1280'
repeats = 24
resolutions = [1024, 1280]
# 189 images
# 32 samples/image - 6048 samples/epoch
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1024x1024'
repeats = 32
resolutions = [1024]
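As a sanity check on the samples/epoch comments above, each directory appears to contribute images × repeats × number of resolutions samples per epoch (my reading of how diffusion-pipe counts multi-resolution samples; the figures themselves come from the comments in the config):

```python
# Verify the samples/epoch comments in dataset-anima.toml.
dirs = [
    (153, 16, 3),  # 1536x1536 dir: 153 images, repeats=16, 3 resolutions
    (44, 24, 2),   # 1280x1280 dir: 44 images, repeats=24, 2 resolutions
    (189, 32, 1),  # 1024x1024 dir: 189 images, repeats=32, 1 resolution
]
per_dir = [imgs * reps * n_res for imgs, reps, n_res in dirs]
print(per_dir)       # [7344, 2112, 6048]
print(sum(per_dir))  # 15504 samples/epoch, matching the "# Totals" comment
```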
# anima-lora.toml
output_dir = '/mnt/d/anima/training_output/masterpieces-v5'
dataset = 'dataset-anima.toml'
# training settings
epochs = 5
# Per-resolution batch sizes as [resolution, batch_size] pairs
micro_batch_size_per_gpu = [[1024, 32], [1280, 24], [1536, 16]]
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1
warmup_steps = 100
lr_scheduler = 'cosine'
# misc settings
save_every_n_epochs = 1
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
map_num_proc = 8
steps_per_print = 1
compile = true
[model]
type = 'anima'
transformer_path = '/mnt/c/workspace/models/diffusion_models/anima-preview3-base.safetensors'
vae_path = '/mnt/c/workspace/models/vae/qwen_image_vae.safetensors'
llm_path = '/mnt/c/workspace/models/text_encoders/qwen_3_06b_base.safetensors'
dtype = 'bfloat16'
llm_adapter_lr = 1e-6
flux_shift = true
multiscale_loss_weight = 0.5
sigmoid_scale = 1.3
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
type = 'adamw_optimi'
lr = 4e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8
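For intuition, the learning-rate schedule implied by warmup_steps = 100, lr = 4e-5, and lr_scheduler = 'cosine' looks roughly like the sketch below (a minimal approximation assuming linear warmup followed by cosine decay to zero; diffusion-pipe's exact schedule may differ):

```python
import math

# Approximate shape of the configured LR schedule: linear warmup for the
# first 100 steps, then cosine decay from the peak LR (assumed to decay
# toward zero by the end of training; the trainer's exact curve may differ).
def lr_at(step, total_steps, base_lr=4e-5, warmup_steps=100):
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(50, 2000))    # mid-warmup: 2e-05
print(lr_at(100, 2000))   # peak: 4e-05
print(lr_at(2000, 2000))  # end of training: ~0.0
```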
Description

[WAN 1.3B] LoRA (experimental)
Trained with diffusion-pipe on Wan2.1-T2V-1.3B with the same (image-only) dataset as v2.0-alpha [WAN 14b]
Currently curating a video dataset
Video previews generated with the ComfyUI_examples/wan/#text-to-video workflow
Load the LoRA with the LoraLoaderModelOnly node
I recommend following a prompting guide for movement to avoid still images/jitter: https://www.comfyonline.app/blog/wan2-1-prompt-guide
Comments
When did people decide to not even explain anymore what their LoRA does? How is that any different from just using quality tags the usual way?
This is a personal aesthetic LoRA finetune; it enhances the base model's understanding of artistic styles and aesthetic qualities.
It uses manual curation and tagging to bias the general quality tags to that effect.
For a reference of what artistic styles and aesthetic qualities those are, see the other things I've trained on the site, which have more concrete goals - like replicating a certain style.
@motimalu ah ok thx for the explanation
What exactly does wan 14b mean? i2v_14b_480p, i2v_14b_720p, and t2v_14b?
Wan2.1-T2V-14B, as noted in the "About this version"; I will update the version tag for clarity.
Also, yes, it can be used with all the 14b models you listed, including i2v_14b_480p and i2v_14b_720p, but it was trained specifically on t2v_14b.
Aha. I see. Thank you. 👍👍👍
Do you find the background unrealistic (e.g., a door appearing in an open space) or notice stains on the walls? I recommend using V2.3 with a weight of 0.7, which can effectively resolve these issues without affecting the main subject's pose.
In use, epic results. Thanks for sharing with us!
Can you add these to Tensor too? I need them for img2img workflows.
Would be nice for the new Wan 2.2 models.
Will try training on Wan 2.2-5B; 2.2-14B seems to have some compatibility with 2.1-14B LoRAs: https://civitai.com/posts/20217884
Could you please explain what this LoRA does in Image/video generation? Does it primarily improve video quality (e.g., resolution, sharpness) or the smoothness and realism of the animation itself? If it focuses on quality, what specific aspects does it enhance? For instance, does it target visual clarity, color grading, or adjustments to saturation, brightness, and contrast?
This LoRA is trained on a subset of all my datasets, which are curated for different purposes - individual styles, concepts, characters, etc. - ultimately to be used in full finetuning, as with this checkpoint.
The goal of this LoRA is to test and validate the manual quality tagging applied during those datasets' curation - in this case, the "masterpiece" tag.
I could give some metrics I tried to follow while manually curating tens of thousands of images, but overall the result of that methodology is a trained personal bias - "masterpiece" is quite subjective, after all.
Regarding the video generation models (Wan), I have currently only curated a dataset large enough to train this with images - so the effect on animation is less movement and a more illustration-focused style, which works better when using the model for text-to-image generation instead.
(I do not generally train on or curate any realism-focused datasets, with only some exceptions like photo backgrounds.)
@motimalu That's super helpful, thanks for clarifying!😊
For dummies like me, I think of LoRAs as being like a flavour of ice cream, or a genre of music; if you ask a model to generate "fast music", you might get a rousing classical piece, thrash metal, drum and bass... LoRAs are specifically asking for "fast music - death metal".
Sort of.
That's my non-technical / clueless take on it, anyway. LoRAs give extra context to a model, like details or characters, and can be stacked and varied in strength.