The Segmind Stable Diffusion Model (SSD-1B) is a distilled version of Stable Diffusion XL (SDXL) that is 50% smaller and offers a 60% speedup while maintaining high-quality text-to-image generation. It has been trained on diverse datasets, including Grit and Midjourney scrape data, to enhance its ability to create a wide range of visual content from textual prompts.
SSD-1B is fully compatible with Deforum, ComfyUI, and diffusers.
AUTOMATIC1111-compatible safetensors can be downloaded from here.
Try out the model at Segmind SSD-1B for ⚡ the fastest inference.
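For diffusers users, loading the model looks roughly like this (a minimal sketch; "segmind/SSD-1B" is the model's Hugging Face Hub id, and the prompt and dtype below are just example choices):

```python
import torch
from diffusers import StableDiffusionXLPipeline

# SSD-1B keeps the SDXL architecture, so the standard SDXL pipeline loads it.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")

image = pipe("an astronaut riding a green horse").images[0]
image.save("astronaut.png")
```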
Comments
Will it work with all XL LoRAs and embeddings?
It does, but the output quality is not that great. It's better to train LoRAs natively on this model.
Just FYI, AUTOMATIC1111 has now implemented support for this model in its dev branch.
What sorts of resolutions are ideal for this model? I assume 1024x1024. And does it have a baked VAE?
Yes, 1024x1024 is ideal. The VAE is not baked in. You can use anything you want, like a tiny VAE.
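For diffusers users, a minimal sketch of swapping in a tiny VAE (madebyollin/taesdxl is the publicly available SDXL-scale TAESD; the prompt and settings are just examples):

```python
import torch
from diffusers import StableDiffusionXLPipeline, AutoencoderTiny

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
)
# Replace the full-size SDXL VAE with the tiny TAESD variant,
# which decodes latents much faster at a small quality cost.
pipe.vae = AutoencoderTiny.from_pretrained(
    "madebyollin/taesdxl", torch_dtype=torch.float16
)
pipe.to("cuda")

image = pipe("a portrait photo of a lighthouse keeper",
             height=1024, width=1024).images[0]
```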
@harish Thanks!
Thanks for sharing!
Haven't used any VAE, but low-end machine reporting in:
It's way faster than SDXL, 6 s/it on generation, but most of the time taken is for VAE decoding, which I guess my system memory is bottlenecking, given the rest of my hardware :D
It loads fast, generates fast, and decodes slow.
Either way, thanks for the model. I refuse to upgrade, since I believe in you guys making it easier for low-end machines.
Specs:
i5 3570K - 8GB DDR3 1333MHz - GTX 970 4GB (3.5GB) - slow mechanical HDD
I will share two images I generated (one in different resolutions) in a moment.
Use TAESD for MUCH faster and lighter decoding. You can regenerate any image you like with the normal VAE later.
@lolomind Thank you very much, I will try it, but I don't know where to get it.
Is it called "Tiny AutoEncoder for Stable Diffusion" on GitHub?
It has a 4.7MB taesd_decoder.pth file for decoding and taesd_encoder.pth for encoding, right?
@lolomind Also, there is a "vae_approx" folder in ComfyUI, and it has SD and SDXL versions of TAE in it. What is that?
Egads, how are that GPU and HDD still alive?
@infernahermit846 those sound like the correct files. But if you use AUTO1111 there's a VAE option in the Settings tab where you can choose between the normal VAE and TAESD, so you don't need to download anything else.
If you use another UI, you may need an extension or node to use it.
The vae_approx folder in ComfyUI is for the previews. You can also use TAESD for high quality previews if you put those files in there. (See the main page of ComfyUI on GitHub for more info about this.)
@firemanbrakeneck They're still capable, though.
@lolomind Oh, thank you. ComfyUI has a VAE Loader node; I used TAESD, but the final image came out black.
@infernahermit846 That's because TAESD isn't a normal VAE... that's why I mentioned needing special nodes or extensions.
AUTOMATIC1111, VLAD and a few others already have the option to use it built-in, though.
@lolomind Awesome, thank you man <3
I should just be able to grab the A1111 file and load it, right? Do I need to do anything else? I've used lots of SDXL models and LoRAs, but this one either complains about NaNs (I've got the fp16-fix VAE going, but it still breaks anyway) or throws various other errors. It just refuses to work.
Please use the dev branch. The main branch does not currently support the model, but it likely will once A1111 is updated.
Same here. I prefer to keep using the main branch and just wait.
@vishnujaddipal644 Theeeere we go. :) Thanks for the help. Euler, 20 steps, at 1024 on a 3080 10GB went from 12 seconds to 6. It also let me stop using --medvram. I wish it worked with the model I usually use, but it's definitely fun to play around with. Good stuff, and thanks.
It gives uglier results than SDXL.
Can you please share some comparisons so that we can improve the model?
First of all, I want to congratulate you on the work done. After some testing, I believe this is a model with a manual gearbox. Let me explain: to obtain good results, you have to pay much more attention to the prompt; in some ways, it is really you who gives the correct directives for a result, whereas with the original SDXL model the AI takes much more freedom and gives better results with more generic prompts. With SSD-1B, a very vague prompt can lead to bad results, but a well-written prompt gives great results. An example: if I want to describe a girl's full body and I put "a beautiful face" in the text, it is very likely I will get a close-up of her face. You have to start at the general level and then refine the prompt with details and corrections. Con: the work takes longer, so what you gain in generation speed you lose in writing the prompt. Pro: once you understand how to talk to the model, you will generate great things and reduce the writing time, so you can take advantage of SSD-1B's greater speed. These are my first impressions. Forgive the English; it is an automatic translation from Italian. Thanks again for the wonderful work, it's a great benefit on my slow PC.
I'm trying to use this model in ComfyUI on an MBP M1 8GB. It's way faster than base XL, but the image comes out very noisy. I can see the end result, but something is just wrong. Any ideas?
First thing is to make sure you've updated ComfyUI recently; support was only added a couple of weeks ago.
I'd also recommend keeping to the base nodes, in case a custom node hasn't been updated in some way.
Apart from that, I've had good results with 25 steps of DPM++ 2M SDE with Karras, with no other changes over a standard SDXL flow (e.g. latent images sized to the normal SDXL sizes).
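For reference, roughly the same sampler settings expressed in diffusers rather than ComfyUI (a sketch; the scheduler options are the diffusers equivalents of DPM++ 2M SDE with Karras sigmas, and the prompt is an arbitrary example):

```python
import torch
from diffusers import StableDiffusionXLPipeline, DPMSolverMultistepScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")

# DPM++ 2M SDE with Karras sigmas, matching the settings described above.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config,
    algorithm_type="sde-dpmsolver++",
    use_karras_sigmas=True,
)

image = pipe("a cozy cabin in a snowy forest",
             num_inference_steps=25).images[0]
```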
CFG is lower, 2? 4? Check if it's like LCM models. This is the best thing that's happened to people around the globe, because most haven't upgraded to 8GB video cards yet. It's working with 6 on Fooocus 1.1.791. And it's fantastic.
Have to say, I'm impressed. The images don't "pop" as much as the SDXL base model, but I think a lot of that is just down to colour palette / contrast levels. Maybe also the size of the subject matter in frame. I don't think it's really about the "quality" of the image.
I'll be watching this space with interest, especially as we now have LCM LoRAs for it as well.
Is there no inpainting checkpoint?
I tested SSD-1B.safetensors in Fooocus version 2.1.823, and it works as intended, with the speed and quality sampling settings.
The "extreme speed" setting does not work (Fooocus implementation of LCM), but it compensates that the generation speeds are much faster than those of standard SDXL models.
Image prompting works, variations and upscale also, outpainting/inpainting does not.
Outstanding model, keep up the good work.
What are your settings for this model in Fooocus, please?
Will normal LoRAs work with this, and more importantly, will the LCM, Turbo, or LCM+Turbo LoRAs work with it?
The answer to all of the above, from some quick testing, is yes.
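In diffusers terms, attaching an SDXL LCM LoRA to SSD-1B can look like the sketch below; latent-consistency/lcm-lora-sdxl is one public SDXL LCM LoRA, not something the commenters specifically tested, and the step/guidance values are typical LCM settings rather than verified ones:

```python
import torch
from diffusers import StableDiffusionXLPipeline, LCMScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "segmind/SSD-1B", torch_dtype=torch.float16
).to("cuda")

# Attach an SDXL-format LCM LoRA; SSD-1B keeps SDXL's layer layout,
# which is why XL LoRAs load at all.
pipe.load_lora_weights("latent-consistency/lcm-lora-sdxl")
pipe.scheduler = LCMScheduler.from_config(pipe.scheduler.config)

# LCM-style sampling: very few steps and low guidance.
image = pipe("a watercolor fox in the snow",
             num_inference_steps=4, guidance_scale=1.0).images[0]
```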
Which is faster, this alone or SDXL-Turbo?
@amazingbeauty All by itself this is pretty fast, but I didn't like the quality. When I tried the Turbo or LCM LoRA with this, I didn't really notice any further speedup, but it didn't fail or crash. Overall, I don't use SSD-1B at all; I just didn't find the quality acceptable.
@shapeshifter83 Sure, we need quality. OK, thanks.
The DPO LoRA seems to be nothing but positive when connected to this checkpoint: a slight speedup, a slight quality improvement, and of course DPO's prompt obedience. That's from about an hour of testing. Not sure I can find any reason whatsoever not to simply use the DPO LoRA with SDXL Distilled every single time.
Incredible! Using a 12GB RTX 3060, I can generate 1024x1024 SDXL images with just 4GB of memory at 2 it/s!
The best model I have ever used: very fast, very compatible. No matter what sampling method you choose or how many steps you set, you always get good-quality images. Normally, 15 steps is enough.
Not so very compatible: it seems to be incompatible with IPAdapter and ControlNet used together.