CivArchive
    Joy CLIP (FLUX, PONY, Video Models) - CLIP-L
    NSFW

    Joy CLIP

    Read the Guide

    Note: The base CLIP-L does not show the stark improvement that the PONY CLIP models do. It may improve NSFW results in some cases, but not in 90+% of cases as with PONY.

    • NSFW Image Comparison

    • Using the CLIP in FP32 is recommended

    • Workflow and launch Tool to Replace CLIP

    • ComfyUI: --fp32-text-enc (or use the workflow and launch tool linked above)

    • Forge/Auto1111: --clip-in-fp32 (or use the workflow and launch tool linked above)

    • Checkpoints with FP32 JoyCLIP built in are hosted on HuggingFace. They have not been altered except for the CLIP, and the metadata includes attribution and license. These include:

    CyberRealistic, PonyRealism, RealismbyStableYogi
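    One way to confirm that a downloaded checkpoint really carries an FP32 text encoder, and to read the attribution and license metadata, is to inspect the file header without loading any weights. The sketch below assumes the checkpoint is a .safetensors file (the page does not state the format) and relies only on the published safetensors layout: an 8-byte little-endian header length followed by a JSON header. The key substring used to pick out text-encoder tensors is an assumption and will differ between checkpoints.

```python
import json
import struct
from collections import Counter


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file without loading any weights."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize_dtypes(header, key_substring):
    """Count tensor dtypes (e.g. 'F32', 'F16') for keys containing key_substring."""
    return Counter(
        entry["dtype"]
        for name, entry in header.items()
        # "__metadata__" holds free-form strings, not a tensor entry.
        if name != "__metadata__" and key_substring in name
    )


# Example use (file name and key substring are hypothetical):
#   header = read_safetensors_header("checkpoint.safetensors")
#   print(header.get("__metadata__", {}))             # attribution / license, if present
#   print(summarize_dtypes(header, "text_model"))     # expect only F32 for an FP32 CLIP
```

    If the counter shows anything other than F32 for the text-encoder keys, the CLIP in that file is not stored in full precision.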


    Joy CLIP is the culmination of hundreds of hours of training, using 50 kWh of energy.

    I do not consider a CLIP training run successful unless, out of 100 images, the new CLIP (Joy) fails on the same seed no more than 5 times.

    A failure is a deformity, a duplicated limb, or some other major defect that the old CLIP does not produce.

    In those same 100 images, the new CLIP (Joy) should show a major improvement on 10-20 images and a minor improvement on 20-50.
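    The acceptance rule above can be written down as a small check. This is a sketch, not the author's tooling: the class and function names are hypothetical, the counts in the example calls are made up, and the "10-20" and "20-50" ranges are read here as lower bounds (at least 10% major, at least 20% minor), which is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClipComparison:
    total: int        # number of same-seed image pairs compared
    regressions: int  # new CLIP shows a major defect that the old CLIP does not
    major: int        # pairs where the new CLIP is a major improvement
    minor: int        # pairs where the new CLIP is a minor improvement


def passes_criteria(c, max_regression_rate=0.05, min_major_rate=0.10, min_minor_rate=0.20):
    """Check the stated thresholds, expressed as rates so other sample sizes work too."""
    if c.total <= 0:
        raise ValueError("total must be positive")
    return (
        c.regressions / c.total <= max_regression_rate
        and c.major / c.total >= min_major_rate
        and c.minor / c.total >= min_minor_rate
    )


# Illustrative counts only, not measured results.
print(passes_criteria(ClipComparison(total=100, regressions=4, major=12, minor=30)))  # True
print(passes_criteria(ClipComparison(total=100, regressions=6, major=15, minor=30)))  # False
```

    The second call fails only because of the regression count, which mirrors the rule that more than 5 new failures in 100 images disqualifies a run regardless of the improvements.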

    In most cases (90%+), Joy CLIP improves prompt accuracy when accuracy is affected. Rarely (2% of cases or fewer), the standard CLIP outperforms Joy CLIP in hand accuracy or some other visual metric.

    I achieved these results on PONY; FLUX and the video models remain untested, as this requires thousands of generations to average.

    License: MIT License

    Copyright (c) 2021 OpenAI

    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:

    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.

    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.

    Comments (21)

    roe2
    Jul 10, 2025· 3 reactions
    CivitAI

    Would you call this a successor to Pony 100k CLIP?

    Preliminary results are promising btw, great work!

    Felldude
    Author
    Jul 10, 2025· 3 reactions

    Thanks

    100k was the alignment back with the vision model; the dataset was 10x larger and broader. This is a finetune of that finetune with a smaller dataset that could be hand-checked.

    I do consider it to be an improvement on 100k.

    Hikarias
    Jul 10, 2025
    CivitAI

    Is there a way to replace the CLIP model in an already existing model?

    banaj66727
    Jul 10, 2025· 2 reactions
    CivitAI

    Put this in and noticed an immediate improvement. Fantastic work, thank you.

    Felldude
    Author
    Jul 10, 2025

    Thanks

    Dumcluck51
    Jul 10, 2025
    CivitAI

    I looked at the guide to replace clip and it made no sense to me so perhaps someone can clarify. Does this just replace Clip-L in the comfyui load clip node? Or does it require merging into a checkpoint?

    Felldude
    Author
    Jul 10, 2025· 1 reaction

    If using Comfy, you're not required to save a new checkpoint. You can just use the appropriate CLIP loader for your model: single for SD 1.5, double for FLUX and PONY, and triple for SD 3.5.

    Dumcluck51
    Jul 10, 2025· 1 reaction

    @Felldude Thanks. I'm clearly a novice so I'm not sure what you mean by single/double/triple? Does it work with other models such as SDXL and WAN2.1?

    Felldude
    Author
    Jul 10, 2025· 2 reactions

    @dcham2310 SD 1.5 uses CLIP-L only, SDXL uses CLIP-G and CLIP-L, PONY uses G&L, Flux uses T5 and L, etc - PONY CLIP only works with PONY, name clip can work with SDXL but made for pony, CLIP-L can work with FLUX, video models etc
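    The encoder pairings in this thread can be summarized as a small lookup, which also answers the single/double/triple question. This is only a sketch: the table and function names are hypothetical, the first four rows restate what the author lists, and the SD 3.5 row (CLIP-G, CLIP-L, and T5) is filled in from general knowledge rather than from the thread.

```python
# Text encoders per model family. The first four rows restate the thread;
# the SD 3.5 row is an addition from general knowledge, not from the thread.
TEXT_ENCODERS = {
    "SD 1.5": ("CLIP-L",),
    "SDXL": ("CLIP-G", "CLIP-L"),
    "PONY": ("CLIP-G", "CLIP-L"),
    "FLUX": ("T5", "CLIP-L"),
    "SD 3.5": ("CLIP-G", "CLIP-L", "T5"),
}

# Loader size as described in the thread: one loader slot per text encoder.
LOADER_KIND = {1: "single", 2: "double", 3: "triple"}


def loader_kind(family):
    """Return 'single', 'double', or 'triple' for a model family in the table."""
    return LOADER_KIND[len(TEXT_ENCODERS[family])]


for family, encoders in TEXT_ENCODERS.items():
    print(f"{family}: {loader_kind(family)} loader ({', '.join(encoders)})")
```

    A CLIP-L slot existing in a family does not by itself mean a given CLIP-L finetune suits it; as noted above, the PONY-specific CLIP was made for PONY.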

    Dumcluck51
    Jul 10, 2025

    @Felldude Thanks for that - I'll give it a try. It seems to be something that's become essential.

    Felldude
    Author
    Jul 10, 2025· 1 reaction

    @dcham2310 I have tested the base CLIP-L the least, it showed some promise in FLUX, but is untested in Video models, as my primary focus was on PONY

    Dumcluck51
    Jul 10, 2025

    @Felldude Sorry, again I got lost. Clearly I don't understand the relationship between Unet and Checkpoint and Diffusion_Model. I tried the workflow you linked and it seemed to complete ok (once I had the model folder paths right) but nothing seems to have happened. I expected a saved model file in either my unet folder or diffusion_models folder but nothing appeared. I don't want to waste your time because of my inexperience but maybe there are others who, like me, need to have things spelled out.

    Felldude
    Author
    Jul 10, 2025

    @dcham2310 What is the adage about those who do and those who teach - Well I am a horrible teacher but I might be able to rewrite the article with more workflows per model

    Dumcluck51
    Jul 10, 2025

    @Felldude Indeed - I found that throughout my career. The real smart people were not great at passing on their smarts. But then I am slow at picking things up - as I said, I need it to be spelled out. So I've just noticed your Pony Final Cut checkpoint, which seems to include JOY. Am I missing anything by just using that?

    Felldude
    Author
    Jul 10, 2025· 1 reaction

    @dcham2310 No, you are not. I will be releasing all versions of FinalCut with JOY. The current version is the highest quality outside of full FP32, which is used by very few people.

    EricRollei21
    Jul 12, 2025· 3 reactions
    CivitAI

    Fascinating stuff, and good work!
    How hard is it to train the clips and how many different versions are there? Looked at your article from 2 days ago and honestly the clip is making more difference than the ckpts.

    Felldude
    Author
    Jul 12, 2025

    CLIP-L is fairly easy to train resource-wise: most people can train at a batch size of 32 in less than an hour for 100k image/text pairs. The issue is more with overfitting and catastrophic loss. The CLIPs for PONY already had such loss, so restoration was easier than trying to improve on CLIP-L. While the PONY CLIPs had major loss, they did have hundreds or thousands of new character tokens trained in, and care had to be taken not to lose those.

    CLIP-G is beyond most users to train at 32 batch or above.

    Open CLIP trained the models at 79K batch and 32K batch which is unobtainable by anyone without a power plant.

    schsch
    Jul 17, 2025
    CivitAI

    Please, just two things I wanted to clarify.
    1 - It doesn't work with Illustrious models, right? Just SDXL and Pony? I even gave it a try, and it gave a 'solid colored background with blue or green streaks', like a 'painting mess'.
    2 - Can I use fp16? fp32 CLIPs can be as large as 8 GB. I have bigonly_bigaspv2ClipG at only 1.28 GB (much more feasible).

    Felldude
    Author
    Jul 17, 2025· 1 reaction

    Correct, not Illustrious, as that CLIP model has embeddings. I have not tested fp16; bf16 may be needed for the wider range.
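    The bf16 remark can be illustrated with a few lines of standard-library Python, reading it as a point about numeric range (an assumption about what the author meant). fp16 tops out at 65504, while bf16 keeps the 8-bit exponent of fp32 and gives up mantissa bits instead. The helper names are hypothetical, bf16 is emulated here by truncating an fp32 value to its top 16 bits, and whether any value in this CLIP actually exceeds the fp16 range is untested.

```python
import struct


def to_fp16(x):
    """Round-trip x through IEEE half precision; return +/-inf when it overflows."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range (max 65504).
        return float("inf") if x > 0 else float("-inf")


def to_bf16(x):
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


value = 100000.0            # illustrative value above the fp16 maximum
print(to_fp16(value))       # inf: out of range in fp16
print(to_bf16(value))       # 99840.0: in range, but coarser than fp32
```

    The trade-off shown is the usual one: fp16 is more precise for small values but overflows early, while bf16 covers the fp32 range at reduced precision. Both halve the file size relative to fp32.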

    Checkpoint
    Flux.1 D

    Details

    Downloads: 857
    Platform: CivitAI
    Platform Status: Available
    Created: 7/9/2025
    Updated: 6/12/2026
    Deleted: -