1 repeat, 100 epochs, saving a checkpoint every 20 epochs, for 5 saved checkpoints in total. cosine with restarts, 5 cycles in total, each cycle synchronized with a save epoch. 1e-4 unet learning rate, text encoder learning rate 0 (text encoder not trained).
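
To make the "restarts synchronized with saves" point concrete, here is a minimal sketch of a cosine-with-hard-restarts curve at per-epoch granularity. It uses the common formula 0.5 * (1 + cos(pi * position_in_cycle)); that the trainer used exactly this formula, with no warmup and a floor of 0, is an assumption, and real schedulers step per batch rather than per epoch.

```python
import math

BASE_LR = 1e-4       # unet learning rate from the notes
TOTAL_EPOCHS = 100   # from the notes
NUM_CYCLES = 5       # from the notes
SAVE_EVERY = 20      # from the notes

# With these numbers each cosine cycle is exactly as long as one save interval.
CYCLE_LEN = TOTAL_EPOCHS // NUM_CYCLES


def lr_at_epoch(epoch: int) -> float:
    """Learning rate after `epoch` completed epochs (assumed formula, no warmup)."""
    frac = (epoch % CYCLE_LEN) / CYCLE_LEN  # position inside the current cycle, 0..1
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * frac))


# Epochs at which a checkpoint is written.
save_epochs = list(range(SAVE_EVERY, TOTAL_EPOCHS + 1, SAVE_EVERY))
# Epochs at which the learning rate jumps back up to BASE_LR.
restart_epochs = [e for e in range(1, TOTAL_EPOCHS) if e % CYCLE_LEN == 0]

for e in (0, 10, 19, 20):
    print(f"epoch {e:3d}: lr = {lr_at_epoch(e):.2e}")
```

Under these assumptions every checkpoint is written at the low-learning-rate end of a cycle, just before the next restart.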
network and convolution dimension and alpha are all 256.
trained on the Illustrious 2.0 base model.
random crop enabled, artwork kept at original resolution, tagged with WD14 SwinV2 tagger v3. tagging threshold ~0.1 to 0.25.
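
For readers unfamiliar with what the tagging threshold does, a small sketch: a tagger assigns each candidate tag a confidence score, and only tags at or above the threshold go into the caption. The function name `filter_tags` and the scores below are hypothetical, made up for illustration, and not real tagger output.

```python
def filter_tags(scores: dict[str, float], threshold: float) -> list[str]:
    """Keep tags whose confidence is at least `threshold`, highest score first."""
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [tag for tag, _ in kept]


# Hypothetical scores for one image (illustration only).
example_scores = {
    "1girl": 0.99,
    "blue eyes": 0.62,
    "hair ribbon": 0.21,
    "gloves": 0.12,
    "hat": 0.04,
}

loose = filter_tags(example_scores, 0.10)   # low end of the range in the notes
strict = filter_tags(example_scores, 0.25)  # high end of the range in the notes
print(loose)
print(strict)
```

A lower threshold keeps more, less certain tags; a higher one keeps fewer, more certain tags.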
the key ingredient is an extra set of manually cropped images: each artwork is cropped to eye -> face -> portrait -> upper body, in that order of priority, depending on how much resolution is available (how large the image is).
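
The cropping itself was done by hand, but the priority rule can be sketched in code under one possible reading: take the tightest region in the priority list whose crop would still be large enough. The `MIN_SIDE` cutoff, the `pick_crop` helper and the region sizes are all hypothetical; the notes give no numeric threshold.

```python
from typing import Optional

# Priority order from the notes, tightest crop first.
PRIORITY = ["eye", "face", "portrait", "upper body"]

# Hypothetical minimum side length (pixels) for a crop to be worth keeping.
MIN_SIDE = 1024


def pick_crop(region_sizes: dict[str, tuple[int, int]],
              min_side: int = MIN_SIDE) -> Optional[str]:
    """Return the highest-priority region whose (width, height) meets min_side."""
    for name in PRIORITY:
        size = region_sizes.get(name)
        if size is not None and min(size) >= min_side:
            return name
    return None  # image too small for any of the listed crops


# Hypothetical region sizes measured on one large artwork.
sizes = {"eye": (400, 300), "face": (1200, 1300), "portrait": (2000, 2600)}
print(pick_crop(sizes))
```

With these made-up sizes the eye region is too small, so the face crop is chosen; a larger source image would allow the eye crop instead.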