ULtraReal (12GB) ft. Simv4 CLIP-G
Full checkpoint: do not load an additional CLIP or VAE.
This model uses Simulacrum CLIP at a custom weight.
This model is slightly less realistic than the SuperModel Edition but is more flexible with Anime and Manga.
This model is excellent for Anime-to-realistic image-to-image.
ULtraReal 2 (SuperModel Edition)
This model uses FP32 timestep training
Realistic faces and characters across hundreds of PONY trainings
Full FP32 precision (UNET can be downcast with no issues)
ULtraReal 8GB PONY
This model was trained using FP32 precision with a focus on realism
The hybrid version is 8GB but will run as fast as base PONY
FP32 CLIP is as fast as FP16 and superior to it in 99% of cases (CLIP is handled on the CPU)
This model is intended for use with up-scaling (See images for workflow)
This model uses FP32 CLIP, so the following launch flags should be used. They will not slow down your it/s unless you have very low system RAM:
ComfyUI: --fp32-text-enc
Forge/Auto1111: --clip-in-fp32
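As a rough sketch, assuming a standard local install, the flags above go on the launcher's command line; file names and paths vary by setup:

```shell
# ComfyUI: keep the text encoder in FP32 (launch flag)
python main.py --fp32-text-enc

# Forge / Auto1111 on Windows: add the flag to COMMANDLINE_ARGS in webui-user.bat
set COMMANDLINE_ARGS=--clip-in-fp32
```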
Version 1.0 is outdated and should not be used for most cases.
All images should be repeatable when loading the source image.
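Repeatability works because A1111-style UIs store the generation parameters in a PNG "parameters" text chunk, which loaders read back from the source image. As a minimal stdlib-only sketch of how those chunks are read, with a made-up demo image and prompt (not taken from this model's gallery):

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte big-endian length, 4-byte type, data, CRC over type+data.
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))

def extract_text_chunks(png: bytes) -> dict:
    # Walk the chunk list and collect tEXt entries as {keyword: text}.
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    out, pos = {}, 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = data.partition(b"\x00")
            out[key.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"IEND":
            break
    return out

# Demo on a tiny synthetic PNG carrying a "parameters" chunk,
# the field A1111-style UIs use for the prompt and settings.
demo = (b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"tEXt", b"parameters\x00a realistic portrait, 30 steps")
        + png_chunk(b"IEND", b""))
print(extract_text_chunks(demo))  # {'parameters': 'a realistic portrait, 30 steps'}
```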
Note: I normally try to credit image remixes. If you see your prompt, comment below and I will link your image.
Description
Improved Faces - Reduced UNET to BF16
FAQ
Comments
Semi-unrelated question, but I should ask: could part of my problems have been having CLIP-G and CLIP-L in the VAE/TE slots for the tests I ran, which caused it to slow way down?
I think the full model has the CLIPs baked in. Don't load extras unless you are on another checkpoint.
Yeah, that is correct; you just use the default loader. If you have a 24GB video card you should have at least 32GB of RAM. But let's say for some reason you have less system RAM than VRAM: that could cause an issue with using the CPU. In this very rare case it could cause a slowdown, and you should force GPU.
@Felldude Possible. I run a 3090. This model I'd run as is, but I was using both the drop-in CLIP-G and CLIP-L on the other model I was testing, when I should have been using just one?
@A_Friendly_Spider Try --gpu-only --highvram | With a 3090 you should not be slowing down at all. I have had 3 different XL models loaded with 2 different CLIP-G and CLIP-L models and it has not affected my it/s or time to render after the initial loading. You might have a bigger issue.
I have an 8GB card, no slowdown at all. Did you write --clip-in-fp32 in webui.bat?
@Felldude I suspect that to be the issue if it's loading to CPU/RAM and not VRAM for just the CLIP. I suspect it is just a Forge/settings issue, but I will report back.
@luismanuelpavan328 yes, I suspect it is more the other issue
A note for any Forge users: the current command is --always-gpu and cannot be used in conjunction with --always-high-vram.
@Felldude So the actual generation speed is fine at baseline, but if I do it all in GPU it says it is using up nearly 12GB of VRAM and going into some kind of slow mode. The real slowdown is not in the steps themselves but at the end of an image generation, where it does some kind of big memory dump. Mind, this is in Forge, and it also takes up huge amounts of CPU in the process. It straight up tells me it is 10x slower if it goes all GPU, and it isn't clear why. It is also saying it can only use 12 of the 16GB of VRAM for some reason.
@A_Friendly_Spider A 3090 should be able to do an image every few seconds. You likely have a much larger issue, such as needing to update the base CUDA package to 12.5 or 12.6, or a corrupted VENV that needs to be cleared and reinstalled fresh.
@Felldude I'll have to look into the former, but I've not even heard of the latter. Any suggestions on where/how to troubleshoot that?
@A_Friendly_Spider https://developer.nvidia.com/cuda-downloads?target_os=Windows
@A_Friendly_Spider I would remove all previous versions of CUDA, as they are backwards compatible (unless you're coding in NVCC), install 12.6, restart, delete Forge including the VENV, and do a fresh install.
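As a hedged sketch of the checks and cleanup being suggested here (the install path is an assumption and depends on where Forge lives on your machine):

```shell
# Check what CUDA version the driver supports and which toolkit (if any) is installed
nvidia-smi
nvcc --version        # only present if a CUDA toolkit is on PATH

# Remove the old virtual environment so Forge rebuilds it fresh on next launch
# (Windows: rmdir /s /q stable-diffusion-webui-forge\venv)
rm -rf stable-diffusion-webui-forge/venv
```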
@Felldude So I installed the latest CUDA but this is what it is doing:
Moving model(s) has taken 2.91 seconds
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 3183.15 MB ... Done.
[Unload] Trying to free 2986.90 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3182.27 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 3182.27 MB, Model Require: 0.00 MB, Previously Loaded: 4897.05 MB, Inference Require: 1024.00 MB, Remaining: 2158.27 MB, All loaded to GPU.
Moving model(s) has taken 5.62 seconds
100%|##################################################################################| 45/45 [00:35<00:00, 1.27it/s]
[Unload] Trying to free 9689.43 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3049.85 MB ... Unload model JointTextEncoder Current free memory is 6188.77 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 11172.14 MB, Model Require: 319.11 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 9829.03 MB, All loaded to GPU.
Moving model(s) has taken 60.09 seconds
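The [Unload] lines in this log follow a simple pattern: evict resident models one by one until the requested amount of VRAM is free. A rough sketch of that bookkeeping, using the sizes from the log above (unload_plan is a made-up name for illustration, not Forge's actual function):

```python
def unload_plan(target_free_mb: float, current_free_mb: float, resident: list):
    """Evict resident (name, size_mb) models until the free-memory target is met."""
    evicted = []
    for name, size_mb in resident:
        if current_free_mb >= target_free_mb:
            break  # enough room already; stop evicting
        evicted.append(name)
        current_free_mb += size_mb  # freeing a model returns its VRAM
    return evicted, current_free_mb

# Numbers from the log: need 9689.43 MB free, 3049.85 MB currently free,
# with the text encoder and UNet resident on the GPU.
evicted, free = unload_plan(9689.43, 3049.85,
                            [("JointTextEncoder", 3138.92), ("KModel", 4897.05)])
print(evicted, round(free, 2))  # both models evicted, matching the log
```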
@A_Friendly_Spider I only use Forge for FLUX. Comfy holds the models in RAM/VRAM, but Forge can cause unloading even when the model should stay resident.
@Felldude I was afraid that might be the case. Welp, guess I just gotta figure out how to get my setup into the prompt matrix format instead of just pregenerating a txt file to run prompts off of.
@A_Friendly_Spider Your it/s still look very low for a 3090 unless that is native 2K or a batch size of 8; you should be at 4-8 it/s.
@Felldude 1212x912, Euler A, Align your Steps 32, with nothing else fancy going on.
@A_Friendly_Spider I get 2 it/s with my 8GB 2070, so what is slow? I think it's normal. If I load a new checkpoint, a batch of 5 at 30 steps takes 2.5 minutes; with the same checkpoint, between 1.5 and 2 minutes.
And yes, Forge always unloads; maybe with 24GB you can force always-GPU, I don't know.
@azlanshivank171 Forge should not be force-offloading without --always-offload-from-vram, but I suspect in this case the 3090 is paired with too little system RAM. Very few people are running the 64GB, or at minimum 48GB, of system RAM needed to run a 24GB video card smoothly.
@Felldude Basically the conclusion I have arrived at, with the addition that the card version I have has less VRAM
@Felldude I didn't understand that clearly: "Forge should not be force-offloading without --always-offload-from-vram."
I'm not using any commands, just dark mode and CLIP 32.
@A_Friendly_Spider what?
@azlanshivank171 Yes, the model I was given is not a 16gb VRAM card, but a 12gb.
@A_Friendly_Spider You said, you had a 3090
@A_Friendly_Spider You bought it from the Chinese market? A 12GB 3090, that's great :)
@A_Friendly_Spider It could be a 3060 12GB :) If that's the case, your speed is totally normal.
Why do you use hi-res in your images? I would like to know what the RAW model can do. ;)
The training data section has the 1.0 model's raw output compared to base Pony
I posted the images from the prompts from Lizardon1025