Aesthetic Quality Modifiers - Masterpiece
The training data is a subset of all my manually rated quality/aesthetic-modifier datasets, including only the masterpiece-tagged images.
ℹ️ LoRAs work best when applied to the base models on which they were trained. Please read the About This Version section for the appropriate base model, trigger usage, and workflow/training information.
Version 5.0 [anima-preview-3] (Latest)
(Temporarily included here, as the "About This Version" section is having issues.)
Trained on Anima Preview-3-base
Assume that any LoRA trained on the preview version won't work well on the final version.
Recommended prompt structure:
Positive prompt (quality tags at the start of the prompt):
masterpiece, best quality, very aesthetic, {{tags}}, {{natural language}}
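For example, a filled-in prompt might look like this (the subject tags and sentence are my own illustrative placeholders):
masterpiece, best quality, very aesthetic, 1girl, long hair, cherry blossoms, looking at viewer, A girl stands beneath blooming cherry trees as petals drift in the wind.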
Updated dataset of 386 images: all masterpiece-tagged images from the Kirazuri (Anima) model version 2 dataset.
Trained at 1024 x 1024, 1280 x 1280, and 1536 x 1536 resolutions.
Previews are mostly generated at 1536 x 1024 or 1024 x 1536.
Training config:
diffusion-pipe commit b0aa4f1e03169f3280c8518d37570a448420f8be
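For reference, diffusion-pipe is launched through DeepSpeed; a single-GPU run with these config files would look roughly like:
deepspeed --num_gpus=1 train.py --deepspeed --config anima-lora.toml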
# dataset-anima.toml
resolutions = [1024, 1280, 1536]
enable_ar_bucket = true
min_ar = 0.5
max_ar = 2.0
num_ar_buckets = 9
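# with AR bucketing on, each image is assigned to the nearest of 9 aspect-ratio
# buckets between 0.5 and 2.0, rather than being cropped square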
# Totals
# 386 images
# 15504 samples/epoch
# 153 images
# 48 samples/image - 7344 samples/epoch
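# (samples/image = repeats x number of resolutions: 16 x 3 = 48; 153 x 48 = 7344)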
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1536x1536'
repeats = 16
resolutions = [1024, 1280, 1536]
# 44 images
# 48 samples/image - 2112 samples/epoch
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1280x1280'
repeats = 24
resolutions = [1024, 1280]
# 189 images
# 32 samples/image - 6048 samples/epoch
[[directory]]
path = '/mnt/d/training_data/0_masterpieces_kirazuri/1024x1024'
repeats = 32
resolutions = [1024]
# anima-lora.toml
output_dir = '/mnt/d/anima/training_output/masterpieces-v5'
dataset = 'dataset-anima.toml'
# training settings
epochs = 5
# Per-resolution batch sizes
micro_batch_size_per_gpu = [[1024, 32], [1280, 24], [1536, 16]]
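# batch size steps down as resolution goes up, keeping per-step VRAM use roughly level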
pipeline_stages = 1
gradient_accumulation_steps = 1
gradient_clipping = 1
warmup_steps = 100
lr_scheduler = 'cosine'
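# LR warms up over the first 100 steps, then decays on a cosine curve for the rest of the run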
# misc settings
save_every_n_epochs = 1
activation_checkpointing = true
partition_method = 'parameters'
save_dtype = 'bfloat16'
caching_batch_size = 1
map_num_proc = 8
steps_per_print = 1
compile = true
[model]
type = 'anima'
transformer_path = '/mnt/c/workspace/models/diffusion_models/anima-preview3-base.safetensors'
vae_path = '/mnt/c/workspace/models/vae/qwen_image_vae.safetensors'
llm_path = '/mnt/c/workspace/models/text_encoders/qwen_3_06b_base.safetensors'
dtype = 'bfloat16'
llm_adapter_lr = 1e-6  # separate, much lower LR for the text-encoder (LLM) adapter
flux_shift = true  # resolution-dependent timestep shift, as in Flux
multiscale_loss_weight = 0.5
sigmoid_scale = 1.3
[adapter]
type = 'lora'
rank = 32
dtype = 'bfloat16'
[optimizer]
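# 'adamw_optimi' below is the AdamW implementation from the optimi library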
type = 'adamw_optimi'
lr = 4e-5
betas = [0.9, 0.99]
weight_decay = 0.01
eps = 1e-8
Description
[WAN 14B] LoRA (experimental)
Trained with diffusion-pipe on Wan2.1-T2V-14B with the same (image-only) dataset as v2.3 [noobai v-pred]
Currently curating a video dataset
Video previews generated with ComfyUI_examples/wan/#text-to-video
Loading the LoRA with the LoraLoaderModelOnly node and using the fp8 14B model: wan2.1_t2v_14B_fp8_e4m3fn.safetensors
Higher-quality previews use the full fp16 14B model: wan2.1_t2v_14B_fp16.safetensors
I recommend following this prompting guide for movement to avoid still images/jitter: https://www.comfyonline.app/blog/wan2-1-prompt-guide
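For example, a motion-focused phrase appended to the prompt (my own illustrative wording, following that guide) might be: the camera slowly pans in as she turns her head, hair swaying in the wind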
Image previews generated with a modified ComfyUI_examples/wan/#text-to-video workflow:
Setting the frame length to 1
Adding upscaling
Better results with text-to-image than text-to-video for this version (due to training on images only)
FAQ
Is it fine to use the v-pred version LoRA with an eps checkpoint?
Yes, it's fine, but I'm not sure I'd recommend it: you may notice a saturation and contrast increase similar to when using v-pred checkpoints.
1.3B model for v2v, please.
Released a 1.3B version here: https://civitai.com/models/929497?modelVersionId=1697670