ULtraReal (12GB) ft. Simv4 CLIP-G
Full checkpoint: do not load an additional CLIP or VAE.
This model uses Simulacrum CLIP at a custom weight.
This model is slightly less realistic than the SuperModel Edition but is more flexible with Anime and Manga.
This model is excellent for Anime-to-realistic image-to-image.
ULtraReal 2 (SuperModel Edition)
This model uses FP32 timestep training
Realistic faces and characters across hundreds of PONY trainings
Full FP32 precision (UNET can be downcast with no issues)
ULtraReal 8GB PONY
This model was trained using FP32 precision with a focus on realism
The hybrid version is 8GB but will run as fast as base PONY
FP32 CLIP is as fast as FP16 and superior to it in 99% of cases (CLIP is handled on the CPU)
This model is intended for use with up-scaling (See images for workflow)
This model uses FP32 CLIP, so the following launch flags should be used. They will not slow down your it/s unless you have very low system RAM:
ComfyUI: --fp32-text-enc
Forge/Auto1111: --clip-in-fp32
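As a rough sketch, assuming a standard local install, the flags above go on the launcher's command line; file names and paths vary by setup:

```shell
# ComfyUI: keep the text encoder in FP32 (launch flag)
python main.py --fp32-text-enc

# Forge / Auto1111 on Windows: add the flag to COMMANDLINE_ARGS in webui-user.bat
set COMMANDLINE_ARGS=--clip-in-fp32
```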
Version 1.0 is outdated and should not be used for most cases.
All images should be repeatable when loading the source image.
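Repeatability works because A1111-style UIs store the generation parameters in a PNG "parameters" text chunk, which loaders read back from the source image. As a minimal stdlib-only sketch of how those chunks are read, with a made-up demo image and prompt (not taken from this model's gallery):

```python
import struct
import zlib

def png_chunk(ctype: bytes, data: bytes) -> bytes:
    # A PNG chunk is: 4-byte big-endian length, 4-byte type, data, CRC over type+data.
    return struct.pack(">I", len(data)) + ctype + data + struct.pack(">I", zlib.crc32(ctype + data))

def extract_text_chunks(png: bytes) -> dict:
    # Walk the chunk list and collect tEXt entries as {keyword: text}.
    assert png[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG file"
    out, pos = {}, 8
    while pos < len(png):
        (length,) = struct.unpack(">I", png[pos:pos + 4])
        ctype = png[pos + 4:pos + 8]
        data = png[pos + 8:pos + 8 + length]
        if ctype == b"tEXt":
            key, _, text = data.partition(b"\x00")
            out[key.decode("latin-1")] = text.decode("latin-1")
        pos += 12 + length  # length + type + data + CRC
        if ctype == b"IEND":
            break
    return out

# Demo on a tiny synthetic PNG carrying a "parameters" chunk,
# the field A1111-style UIs use for the prompt and settings.
demo = (b"\x89PNG\r\n\x1a\n"
        + png_chunk(b"tEXt", b"parameters\x00a realistic portrait, 30 steps")
        + png_chunk(b"IEND", b""))
print(extract_text_chunks(demo))  # {'parameters': 'a realistic portrait, 30 steps'}
```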
Note: I normally try to credit image remixes. If you see your prompt, comment below and I will link your image.
Description
Improved Faces - Reduced UNET to BF16
FAQ
Comments
Semi-unrelated question, but I should ask: could part of my problems have been having CLIP-G and CLIP-L in the VAE/TE slots for the tests I ran, which caused it to slow way down?
I think the full model has the CLIPs baked in. Don't load extras unless you are on another checkpoint.
Yeah, that is correct; you just use the default loader. If you have a 24GB video card you should have at least 32GB of RAM. But let's say for some reason you have less system RAM than VRAM: that could cause an issue with using the CPU. In this very rare case it could cause a slowdown, and you should force GPU.
@Felldude Possible. I run a 3090. This model I'd run as is, but I was using both the drop-in CLIP-G and CLIP-L on the other model I was testing, when I should have been using just one?
@A_Friendly_Spider Try --gpu-only --highvram | With a 3090 you should not be slowing down at all. I have had 3 different XL models loaded with 2 different CLIP-G and CLIP-L models and it has not affected my it/s or time to render after the initial loading. You might have a bigger issue.
I have an 8GB card, no slowdown at all. Did you write --clip-in-fp32 in webui.bat?
@Felldude I suspect that to be the issue if it's loading to CPU/RAM and not VRAM for just the CLIP. I suspect it is just a Forge/settings issue, but I will report back.
@luismanuelpavan328 yes, I suspect it is more the other issue
A note for any Forge users: the current command is --always-gpu and cannot be used in conjunction with --always-high-vram.
@Felldude So the actual generation speed is fine at baseline, but if I do it all in GPU it says it is using up nearly 12GB of VRAM and going into some kind of slow mode. The real slowdown is not in the steps themselves but at the end of an image generation, where it does some kind of big memory dump. Mind, this is in Forge, and it also takes up huge amounts of CPU in the process. It straight up tells me it is 10x slower if it goes all GPU, and it isn't clear why. It is also saying it can only use 12 of the 16GB of VRAM for some reason.
@A_Friendly_Spider A 3090 should be able to do an image every few seconds. You likely have a much larger issue, such as needing to update the base CUDA package to 12.5 or 12.6, or a corrupted VENV that needs to be cleared and reinstalled fresh.
@Felldude I'll have to look into the former, but I've not even heard of the latter. Any suggestions on where/how to troubleshoot that?
@A_Friendly_Spider https://developer.nvidia.com/cuda-downloads?target_os=Windows
@A_Friendly_Spider I would remove all previous versions of CUDA, as they are backwards compatible (unless you're coding in NVCC), install 12.6, restart, delete Forge including the VENV, and do a fresh install.
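As a hedged sketch of the checks and cleanup being suggested here (the install path is an assumption and depends on where Forge lives on your machine):

```shell
# Check what CUDA version the driver supports and which toolkit (if any) is installed
nvidia-smi
nvcc --version        # only present if a CUDA toolkit is on PATH

# Remove the old virtual environment so Forge rebuilds it fresh on next launch
# (Windows: rmdir /s /q stable-diffusion-webui-forge\venv)
rm -rf stable-diffusion-webui-forge/venv
```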
@Felldude So I installed the latest CUDA but this is what it is doing:
Moving model(s) has taken 2.91 seconds
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 3183.15 MB ... Done.
[Unload] Trying to free 2986.90 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3182.27 MB ... Done.
[Memory Management] Target: KModel, Free GPU: 3182.27 MB, Model Require: 0.00 MB, Previously Loaded: 4897.05 MB, Inference Require: 1024.00 MB, Remaining: 2158.27 MB, All loaded to GPU.
Moving model(s) has taken 5.62 seconds
100%|##################################################################################| 45/45 [00:35<00:00, 1.27it/s]
[Unload] Trying to free 9689.43 MB for cuda:0 with 0 models keep loaded ... Current free memory is 3049.85 MB ... Unload model JointTextEncoder Current free memory is 6188.77 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 11172.14 MB, Model Require: 319.11 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 9829.03 MB, All loaded to GPU.
Moving model(s) has taken 60.09 seconds
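The [Unload] lines in this log follow a simple pattern: evict resident models one by one until the requested amount of VRAM is free. A rough sketch of that bookkeeping, using the sizes from the log above (unload_plan is a made-up name for illustration, not Forge's actual function):

```python
def unload_plan(target_free_mb: float, current_free_mb: float, resident: list):
    """Evict resident (name, size_mb) models until the free-memory target is met."""
    evicted = []
    for name, size_mb in resident:
        if current_free_mb >= target_free_mb:
            break  # enough room already; stop evicting
        evicted.append(name)
        current_free_mb += size_mb  # freeing a model returns its VRAM
    return evicted, current_free_mb

# Numbers from the log: need 9689.43 MB free, 3049.85 MB currently free,
# with the text encoder and UNet resident on the GPU.
evicted, free = unload_plan(9689.43, 3049.85,
                            [("JointTextEncoder", 3138.92), ("KModel", 4897.05)])
print(evicted, round(free, 2))  # both models evicted, matching the log
```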
@A_Friendly_Spider I only use Forge for FLUX. Comfy holds the models in RAM/VRAM, but Forge can cause unloading even when the model should stay resident.
@Felldude I was afraid that might be the case. Welp, guess I just gotta figure out how to get my setup into the prompt matrix format instead of just pregenerating a txt file to run prompts off of.
@A_Friendly_Spider Your it/s still look very low for a 3090 unless that is native 2K or a batch size of 8; you should be at 4-8 it/s.
@Felldude 1212x912, Euler A, Align your Steps 32, with nothing else fancy going on.
@A_Friendly_Spider I get 2 it/s with my 8GB 2070, so what is slow? I think it's normal. If I load a new checkpoint, a batch of 5 at 30 steps takes 2.5 minutes; with the same checkpoint, between 1.5 and 2 minutes.
And yes, Forge always unloads; maybe with 24GB you can force always-GPU, I don't know.
@azlanshivank171 Forge should not be force-offloading without --always-offload-from-vram, but I suspect in this case the 3090 is paired with too little system RAM. Very few people are running the 64GB, or at minimum 48GB, of system RAM needed to run a 24GB video card smoothly.
@Felldude Basically the conclusion I have arrived at, with the addition that the card version I have has less VRAM
@Felldude I didn't understand that clearly: "Forge should not be force-offloading without --always-offload-from-vram."
I'm not using any commands, just dark mode and CLIP 32.
@A_Friendly_Spider what?
@azlanshivank171 Yes, the model I was given is not a 16gb VRAM card, but a 12gb.
@A_Friendly_Spider You said, you had a 3090
@A_Friendly_Spider You bought it from the Chinese market? A 12GB 3090, that's great :)
@A_Friendly_Spider It could be a 3060 12GB :) If that's the case, your speed is totally normal.
Why do you use hi-res in your images? I would like to know what the RAW model can do. ;)
The training data section has the 1.0 model's raw output compared to base Pony
I posted the images from the prompts from Lizardon1025