Balanced CLIP (1M)
Training CLIP-G took more than 15 kWh of energy; CLIP-L took far less, under 1 kWh.
The full negative-reinforcement loss (cosine dissimilarity) is available on my Hugging Face. It was paired with a positive-reinforcement loss (contrastive loss) computed against the full frozen vision model in latent space.
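A minimal NumPy sketch of how such a pairing might look. This is an illustrative assumption, not the released training code: the function name, temperature, and equal weighting of the two terms are all hypothetical.

```python
import numpy as np

def balanced_clip_loss(text_emb, frozen_image_emb, temperature=0.07):
    """Hypothetical sketch: pair a contrastive (InfoNCE) positive term with a
    cosine-dissimilarity negative term against frozen vision-tower embeddings."""
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    v = frozen_image_emb / np.linalg.norm(frozen_image_emb, axis=-1, keepdims=True)
    n = t.shape[0]

    # Positive reinforcement: symmetric cross-entropy over matched pairs.
    logits = t @ v.T / temperature

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    contrastive = (xent(logits) + xent(logits.T)) / 2

    # Negative reinforcement: push the cosine similarity of mismatched
    # (off-diagonal) pairs down by minimizing its mean.
    cos = logits * temperature  # undo temperature to get raw cosines
    dissimilarity = cos[~np.eye(n, dtype=bool)].mean()

    return contrastive + dissimilarity
```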
PONY CLIP-L was trained for a further 10 epochs using ASGD to fine-tune the loss.
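For reference, the idea behind ASGD (averaged SGD, as in `torch.optim.ASGD`) can be sketched on a toy problem: run plain SGD but return the running average of the iterates, which smooths the final steps of a fine-tune. The function and toy objective below are illustrative assumptions, not the actual training setup.

```python
import numpy as np

def asgd_minimize(grad_fn, w0, lr=0.1, steps=100):
    """Toy averaged SGD: step with plain gradients, return the mean iterate."""
    w = w0.copy()
    avg = w.copy()
    for k in range(1, steps + 1):
        w -= lr * grad_fn(w)
        avg += (w - avg) / k  # running (Polyak-Ruppert) average of iterates
    return avg

# Toy quadratic: minimize ||w - 3||^2; the averaged iterate approaches 3.
sol = asgd_minimize(lambda w: 2 * (w - 3.0), np.zeros(2))
```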
Comments (9)
Has anyone tested any of these with text within images? I've seen a number of my Illustrious generations producing viable text lately.
Base SDXL has fair text rendering. With Pony and Illustrious I'm not sure whether it was the CLIP or the attention training that caused the inability. This set of CLIPs is not for Illustrious, though I am working on a CLIP-L for it.
How does this differ from and stack up against your previous CLIP models?
It is a larger training run than all prior ones; it should generalize far better but may not be as NSFW-task-oriented.
Interesting, how's compatibility with IL?
It is compatible with Flux and SDXL but not Illustrious. The CLIP-L is in training and will likely be up in a few days.
@Felldude This is truly unfortunate news. In the realm of anime models, very few can avoid Illustrious.
@1q2w3e4rQAZ The Illustrious version is usable in Pony and the reverse is true, but the base starting model affects the character triggers.
@Felldude Oh, I see. Now I understand.
