Joy CLIP (FLUX, PONY, Video Models) - CLIP-L

NSFW

Joy CLIP

Read the Guide

Note: The base CLIP-L does not show the stark improvement that the PONY CLIP models do. It may improve NSFW in some cases, but not in the 90%+ of cases seen with PONY.

NSFW Image Comparison
Using the CLIP in FP32 is recommended.
Workflow and Launch Tool to Replace CLIP
ComfyUI: --fp32-text-enc (or use the workflow and launch tool linked above)
Forge/Auto1111: --clip-in-fp32 (or use the workflow and launch tool linked above)
Checkpoints with FP32 JoyCLIP built in are hosted on HuggingFace. They have not been altered except for the CLIP, and the metadata includes attribution and license. These include:

CyberRealistic, PonyRealism, RealismbyStableYogi
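
As a rough illustration of what "not altered except for the clip" means at the file level, the sketch below swaps only the text-encoder entries of a checkpoint's weight dictionary and leaves everything else untouched. It is not the author's tool: the function name, the toy weight names, and the `clip_l.` prefix are all hypothetical, and a real checkpoint would hold tensors loaded from a file rather than strings.

```python
def replace_text_encoder(checkpoint, new_clip, prefix):
    """Return a copy of `checkpoint` whose entries under `prefix` come from `new_clip`.

    `checkpoint` maps full weight names to values; `new_clip` maps the text
    encoder's own weight names (without the prefix) to values.
    """
    old_keys = {k for k in checkpoint if k.startswith(prefix)}
    new_keys = {prefix + k for k in new_clip}
    if old_keys != new_keys:
        # Refuse to swap if the two encoders do not share the same layout.
        raise KeyError(f"weight names differ: {sorted(old_keys ^ new_keys)}")
    merged = dict(checkpoint)  # shallow copy; the input is left unmodified
    for name, value in new_clip.items():
        merged[prefix + name] = value
    return merged


# Toy data standing in for real tensors (all names here are hypothetical).
checkpoint = {
    "unet.block0.weight": "unet-w0",
    "clip_l.layer0.weight": "old-clip-w0",
    "clip_l.layer0.bias": "old-clip-b0",
}
joy_clip = {"layer0.weight": "joy-w0", "layer0.bias": "joy-b0"}

patched = replace_text_encoder(checkpoint, joy_clip, prefix="clip_l.")
```

The layout check is a deliberate choice: a CLIP with different weight names is rejected rather than partially merged.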

Joy CLIP is the culmination of hundreds of hours of training, using 50 kWh.

I do not consider a CLIP training run successful unless, out of 100 images, the new CLIP (Joy) goes from seed to failure no more than 5 times.

A failure is a deformity, a doubled limb, or something else majorly wrong that the old CLIP does not show.

In those same 100 images, the new CLIP (Joy) should show major improvement on 10-20 images and minor improvement on 20-50.
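
The acceptance rule above can be written down as a small check. This is only a sketch under stated assumptions: the label names are made up for illustration, each of the 100 same-seed comparisons is assumed to get exactly one label, and the "10-20" and "20-50" ranges are read as minimums (the lower ends), which the text does not spell out.

```python
from collections import Counter

# Hypothetical labels for one same-seed comparison of old CLIP vs. Joy CLIP.
VALID_LABELS = {"failure", "major", "minor", "same"}


def passes_joy_criteria(labels):
    """Check 100 per-seed verdicts against the thresholds stated above."""
    if len(labels) != 100:
        raise ValueError("the thresholds are stated per 100 images")
    unknown = set(labels) - VALID_LABELS
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    counts = Counter(labels)
    return (
        counts["failure"] <= 5      # at most 5 seed-to-failure cases
        and counts["major"] >= 10   # assumed lower bound of "10-20"
        and counts["minor"] >= 20   # assumed lower bound of "20-50"
    )


good_run = ["failure"] * 4 + ["major"] * 12 + ["minor"] * 30 + ["same"] * 54
bad_run = ["failure"] * 6 + ["major"] * 12 + ["minor"] * 30 + ["same"] * 52
```

Here `good_run` meets all three thresholds, while `bad_run` is rejected on the failure count alone.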

In most cases (90%+), Joy CLIP improves prompt accuracy when accuracy is affected. Rarely (2% or less), standard CLIP outperforms Joy CLIP in hand accuracy or some other visual metric.

I achieved these results on PONY; FLUX and the video models remain untested, as this requires thousands of generations to average.

License: MIT License

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Comments (21)

roe2 · Jul 10, 2025 · 3 reactions

CivitAI

Would you call this a successor to Pony 100k CLIP?

Preliminary results are promising btw, great work!

Felldude

Author

Jul 10, 2025· 3 reactions

Thanks

100k was the alignment back with the vision model; its dataset was 10x larger and broader. This is a finetune of that finetune, with a smaller dataset that could be hand-checked.

I do consider it to be an improvement on 100k.

Hikarias · Jul 10, 2025

CivitAI

Is there a way to replace the CLIP model in an already existing model?

Felldude

Author

Jul 10, 2025

https://civitai.com/articles/15600/your-clip-matters-seed-to-seed-comparision-and-workflow

vhp · Aug 20, 2025

Dual Clip Loader works

banaj66727 · Jul 10, 2025 · 3 reactions

CivitAI

Put this in and noticed an immediate improvement. Fantastic work, thank you.

Felldude

Author

Jul 10, 2025

Thanks

Dumcluck51 · Jul 10, 2025

CivitAI

I looked at the guide to replace clip and it made no sense to me so perhaps someone can clarify. Does this just replace Clip-L in the comfyui load clip node? Or does it require merging into a checkpoint?

Felldude

Author

Jul 10, 2025· 1 reaction

If using Comfy, you're not required to save a new checkpoint. You can just use the appropriate CLIP loader for your model: single for SD 1.5, double for FLUX and PONY, and triple for SD 3.5.

Dumcluck51 · Jul 10, 2025 · 1 reaction

@Felldude Thanks. I'm clearly a novice so I'm not sure what you mean by single/double/triple? Does it work with other models such as SDXL and WAN2.1?

Felldude

Author

Jul 10, 2025· 2 reactions

@dcham2310 SD 1.5 uses CLIP-L only; SDXL uses CLIP-G and CLIP-L; PONY uses G and L; FLUX uses T5 and L; etc. PONY CLIP only works with PONY; name clip can work with SDXL but is made for PONY; CLIP-L can work with FLUX, video models, etc.
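
The single/double/triple answer in this thread follows directly from how many text encoders each model family uses. A minimal sketch of that lookup, assuming the encoder lists given above plus SD 3.5's three encoders (CLIP-L, CLIP-G, T5), which the thread implies with "triple" but does not list; the dictionary and function names are hypothetical:

```python
# Text encoders per model family, as described in the thread.
TEXT_ENCODERS = {
    "SD 1.5": ["CLIP-L"],
    "SDXL": ["CLIP-G", "CLIP-L"],
    "PONY": ["CLIP-G", "CLIP-L"],
    "FLUX": ["T5", "CLIP-L"],
    "SD 3.5": ["CLIP-L", "CLIP-G", "T5"],  # not listed in the thread; added here
}

LOADER_BY_COUNT = {1: "single", 2: "double", 3: "triple"}


def loader_for(model_family):
    """Return which kind of CLIP loader matches the model's encoder count."""
    encoders = TEXT_ENCODERS[model_family]
    return LOADER_BY_COUNT[len(encoders)]


def uses_clip_l(model_family):
    """True if a CLIP-L replacement has a slot to go into for this family."""
    return "CLIP-L" in TEXT_ENCODERS[model_family]
```

So `loader_for("PONY")` gives `"double"`: one slot for CLIP-G and one for CLIP-L, the latter being where a replacement CLIP-L would go.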

Dumcluck51 · Jul 10, 2025

@Felldude Thanks for that - I'll give it a try. It seems to be something that's become essential.

Felldude

Author

Jul 10, 2025· 1 reaction

@dcham2310 I have tested the base CLIP-L the least, it showed some promise in FLUX, but is untested in Video models, as my primary focus was on PONY

Dumcluck51 · Jul 10, 2025

@Felldude Sorry, again I got lost. Clearly I don't understand the relationship between Unet and Checkpoint and Diffusion_Model. I tried the workflow you linked and it seemed to complete ok (once I had the model folder paths right) but nothing seems to have happened. I expected a saved model file in either my unet folder or diffusion_models folder but nothing appeared. I don't want to waste your time because of my inexperience but maybe there are others who, like me, need to have things spelled out.

Felldude

Author

Jul 10, 2025

@dcham2310 What is the adage about those who do and those who teach? Well, I am a horrible teacher, but I might be able to rewrite the article with more workflows per model.

Dumcluck51 · Jul 10, 2025

@Felldude Indeed - I found that throughout my career. The real smart people were not great at passing on their smarts. But then I am slow at picking things up - as I said, I need it to be spelled out. So I've just noticed your Pony Final Cut checkpoint which seems to include JOY. Am I missing anything by just using that?

Felldude

Author

Jul 10, 2025· 1 reaction

@dcham2310 No, you are not. I will be releasing all versions of FinalCut with JOY; the current version is the highest quality outside of full FP32, which is used by very few people.

EricRollei21 · Jul 12, 2025 · 3 reactions

CivitAI

Fascinating stuff, and good work!
How hard is it to train the clips and how many different versions are there? Looked at your article from 2 days ago and honestly the clip is making more difference than the ckpts.

Felldude

Author

Jul 12, 2025

CLIP-L is fairly easy to train resource-wise: most people can train at a batch size of 32 in less than an hour for 100k image/text pairs. The issue is more with overfitting and catastrophic loss. The CLIPs for PONY already had such loss, so restoration was easier than trying to improve on CLIP-L. While the PONY CLIPs had major loss, they did have hundreds or thousands of new character tokens trained in, and care had to be taken not to lose those.

CLIP-G is beyond most users to train at 32 batch or above.

Open CLIP trained the models at 79K batch and 32K batch which is unobtainable by anyone without a power plant.
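
To put the batch sizes in this reply side by side, here is some back-of-the-envelope arithmetic. It assumes "79K" and "32K" mean 79,000 and 32,000, and it relies on the general property of CLIP-style contrastive training that every other pair in a batch acts as a negative example; the function names are illustrative only.

```python
import math


def steps_per_epoch(num_pairs, batch_size):
    """Optimizer steps needed to see every image/text pair once."""
    return math.ceil(num_pairs / batch_size)


def negatives_per_sample(batch_size):
    """In-batch negatives each pair is contrasted against."""
    return batch_size - 1


hobby_steps = steps_per_epoch(100_000, 32)       # 3125 steps for 100k pairs
hobby_negatives = negatives_per_sample(32)       # 31 negatives per pair
large_negatives = negatives_per_sample(79_000)   # 78999 negatives per pair
```

The gap in negatives per pair (31 versus tens of thousands) is one way to see why a batch-32 finetune is a different regime from the original large-batch training.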

schsch · Jul 17, 2025

CivitAI

Please, just two things I wanted to clarify.
1 - It doesn't work in Illustrious models, right? Just for SDXL and Pony? I even gave it a try; it gave a 'solid colored background with blue or green risks', like a 'painting mess'.
2 - Can I use fp16? That's because fp32 CLIPs can be as large as 8 GB. I have bigonly_bigaspv2ClipG at only 1.28 GB (much more feasible).

Felldude

Author

Jul 17, 2025· 1 reaction

Correct, not Illustrious, as that CLIP model has embeddings. I have not tested fp16; bf16 may be needed for the wider address.
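
For the size side of the fp16 question, file size scales with bytes per parameter: fp32 stores 4 bytes per weight, while fp16 and bf16 both store 2 (bf16 keeps fp32's exponent range and gives up mantissa precision instead). The sketch below is plain arithmetic; the parameter counts are approximate figures for the CLIP-L and CLIP-G text encoders supplied here as assumptions, not numbers from this page, and real files add some metadata overhead.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

# Approximate text-encoder parameter counts (assumptions, not from this page).
APPROX_PARAMS = {"CLIP-L": 123_000_000, "CLIP-G": 695_000_000}


def weights_size_gb(num_params, dtype):
    """Raw weight size in GB (10**9 bytes), ignoring file metadata."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for name, params in APPROX_PARAMS.items():
    for dtype in ("fp32", "fp16", "bf16"):
        print(f"{name} {dtype}: {weights_size_gb(params, dtype):.2f} GB")
```

Whether the halved precision keeps the image quality reported for FP32 is a separate question that this arithmetic does not answer.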

Checkpoint

Flux.1 D

by Felldude
