Balanced CLIP (1M)
Training CLIP-G took more than 15 kWh of energy; CLIP-L took far less, under 1 kWh.
The full negative-reinforcement loss (cosine dissimilarity) is available on my Hugging Face. It was paired with a positive-reinforcement loss (contrastive loss) computed against the full frozen vision model in latent space.
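A minimal NumPy sketch of how such a pairing might look. This is an illustrative assumption, not the released training code: the function name, temperature, and equal weighting of the two terms are all hypothetical.

```python
import numpy as np

def balanced_clip_loss(text_emb, frozen_image_emb, temperature=0.07):
    """Hypothetical sketch: pair a contrastive (InfoNCE) positive term with a
    cosine-dissimilarity negative term against frozen vision-tower embeddings."""
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    v = frozen_image_emb / np.linalg.norm(frozen_image_emb, axis=-1, keepdims=True)
    n = t.shape[0]

    # Positive reinforcement: symmetric cross-entropy over matched pairs.
    logits = t @ v.T / temperature

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(n), np.arange(n)].mean()

    contrastive = (xent(logits) + xent(logits.T)) / 2

    # Negative reinforcement: push the cosine similarity of mismatched
    # (off-diagonal) pairs down by minimizing its mean.
    cos = logits * temperature  # undo temperature to get raw cosines
    dissimilarity = cos[~np.eye(n, dtype=bool)].mean()

    return contrastive + dissimilarity
```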
PONY CLIP-L was trained for a further 10 epochs using ASGD to fine-tune the loss.
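For reference, the idea behind ASGD (averaged SGD, as in `torch.optim.ASGD`) can be sketched on a toy problem: run plain SGD but return the running average of the iterates, which smooths the final steps of a fine-tune. The function and toy objective below are illustrative assumptions, not the actual training setup.

```python
import numpy as np

def asgd_minimize(grad_fn, w0, lr=0.1, steps=100):
    """Toy averaged SGD: step with plain gradients, return the mean iterate."""
    w = w0.copy()
    avg = w.copy()
    for k in range(1, steps + 1):
        w -= lr * grad_fn(w)
        avg += (w - avg) / k  # running (Polyak-Ruppert) average of iterates
    return avg

# Toy quadratic: minimize ||w - 3||^2; the averaged iterate approaches 3.
sol = asgd_minimize(lambda w: 2 * (w - 3.0), np.zeros(2))
```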
Comments (9)
Has anyone tested any of these with text within images? I've seen a number of my Illustrious generations producing viable text lately.
Base SDXL has fair text rendering. With Pony and Illustrious I'm not sure whether it was the CLIP or the attention training that caused the inability. This set of CLIPs is not for Illustrious, though I am working on a CLIP-L for it.
How does this differ from and stack up against your previous CLIP models?
It is a larger training run than all prior ones; it should generalize far better but may not be as NSFW-task-oriented.
Interesting, how's compatibility with IL?
It is compatible with Flux and SDXL but not Illustrious. The CLIP-L is in training and will likely be up in a few days.
@Felldude This is truly unfortunate news. In the realm of anime models, very few can avoid Illustrious.
@1q2w3e4rQAZ The Illustrious version is usable in Pony and the reverse is true, but the base starting model affects the character triggers.
@Felldude Oh, I see. Now I understand.
