This is the LoRA from the official DMD2 implementation, released with the paper Improved Distribution Matching Distillation for Fast Image Synthesis.
dmd2_sdxl_4step_lora
The original model is on Hugging Face at tianweiy/DMD2.
Recommended Settings
Steps: 8
LoRA weight: 0.7
Sampler: LCM
Scheduler: Normal, Simple
CFG: 1.4 - 1.8
I use similar settings for both the low-resolution base image and the upscaled refinement pass (hires fix). The settings above are my personal recommendations, but I suggest you experiment to find what works for you and the model you are using.
Combining Both LoRAs
Using both the fp32 and fp16 versions at the same time, at 0.5 weight each, has also been reported to perform better.
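Here is a rough sketch of the arithmetic behind loading both files at 0.5. It assumes the two files differ only in numeric precision and that LoRA deltas add linearly to the base weights; neither is confirmed by the model page, and the numbers are made up for illustration. Python floats stand in for fp32 here.

```python
import struct

def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Hypothetical weight deltas, NOT taken from the real LoRA files.
delta_full = [0.01234567, -0.00045678, 0.3141592, -0.0271828]
delta_half = [to_fp16(v) for v in delta_full]

def apply_loras(base, loras):
    """Add each (delta, strength) pair to the base weights, one after another."""
    out = list(base)
    for delta, strength in loras:
        out = [w + strength * d for w, d in zip(out, delta)]
    return out

base = [0.5, -0.25, 1.0, 0.125]
full = apply_loras(base, [(delta_full, 1.0)])
mixed = apply_loras(base, [(delta_full, 0.5), (delta_half, 0.5)])

# Largest difference between "one LoRA at 1.0" and "both at 0.5".
max_gap = max(abs(a - b) for a, b in zip(full, mixed))
print(f"max difference: {max_gap:.2e}")
```

Under these assumptions the two setups land on nearly, but not exactly, the same weights. That fits the reports of similar quality with a different image, though it does not prove that rounding is the actual cause.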
Recommended LoRA weight
Load the LoRA like any other. I personally recommend a weight of 0.7, but others swear it's better at 1.0 in weight.
Usage
You should be able to use it with any SDXL or similar checkpoint. Use the settings above.
You can also merge this LoRA into a checkpoint and save the result, which bakes the LoRA's effect directly into the checkpoint. You then no longer need to load it as a LoRA, and you can still use LCM and low step counts for good results.
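As a minimal sketch of what such a merge does to a single weight matrix: the usual LoRA formulation adds a scaled low-rank product to the original weights. The matrices, `alpha` and the helper names below are hypothetical toy values; real merge tools do this for every affected layer.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x k) and b (k x m)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def merge_lora(weight, up, down, alpha, strength=1.0):
    """Return weight + strength * (alpha / rank) * (up @ down)."""
    rank = len(down)  # down has shape (rank, in_features)
    scale = strength * alpha / rank
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 layer with a rank-1 LoRA (illustrative values only).
weight = [[1.0, 0.0], [0.0, 1.0]]
up = [[1.0], [2.0]]    # shape (out_features, rank)
down = [[0.5, -0.5]]   # shape (rank, in_features)

merged = merge_lora(weight, up, down, alpha=1.0, strength=0.7)
print(merged)
```

Once merged, the strength is fixed in the saved weights, so the weight you choose at merge time plays the same role as the LoRA weight you would otherwise set at load time.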
The LoRA itself doesn't add any speed improvement; it just lets you generate at fewer steps with better results.
If the results are washed out, desaturated or bland, try increasing the CFG.
In the comments below there are a bunch of interesting discussions, so please explore them. Here are a few noteworthy:
Licence
Creative Commons Attribution Non Commercial 4.0
I am not the creator of the model, just uploading the model here to make it easier to download and reference in creations. Full credit goes to Tianwei Yin.
Comments (94)
For DMD models, does that mean that DMD lora is already built in, and if so, would using this conflict? Or do I need to use this lora only with DMD checkpoints?
That's a good question. I'll clarify in the description.
For checkpoints with DMD2 in their name, it is very likely that this LoRA is merged into the model. This means that you can generate in low steps with that checkpoint/base model already. So you do not need this LoRA for that.
You can add this LoRA on top if you wish, but it will likely cause worse results. As it's a LoRA, you can add it at low weights, like <lora:dmd2_sdxl_4step:0.2> for example. You can try it and see which results you prefer.
The LoRA itself doesn't add any speed; it just lets you generate at fewer steps with better results.
I hope this explains it, let me know if you have further questions.
@6tZ There's also a small trick you can do: if you load the fp16 and full versions of the LoRA at the same time at 0.5 strength each, you essentially get a 1 strength version of DMD2, except with different results. I think it actually increases coherency.
@plk Hmm, how do you mean you get a 1 strength version?
Do you get the same result?
6tZ Just came across this great lora and made a demo post of what plk described here. First pic is no lora. 2nd and 3rd pics are the fp32 and fp16 at strength 1 which look the same. 4 is both loras at 0.5 and the result is different but same quality as 2nd & 3rd. 5th and 6th are fp32 & fp16 at 0.5 separately which look alike but are different from 1-4.
guythis31773 I see. A slight typo in your explanation, you use number 4 twice. The last bit should be image "5th and 6th". But good showcase, thanks for sharing! Would be good to put it on the post as well in case people run across it.
I've added the suggestion to the model's page.
plk Great find, makes for some nice variation options! Going to try mix and matching different strengths between the 2 as well!
I really like that it saves time while keeping good quality... but at a strength of 0.7 it adds pesky tan lines to every character. I tried to get rid of them using LoRAs and prompting (positive and negative); nothing worked. Then I reduced the strength to 0.5, and that worked: the tan lines are gone (almost). ;-)
You should always use it at 1.0 strength otherwise you might get low quality images. The 0.7 in the description above is not a good recommendation. It does not work like normal loras.
@SubtleShader From my tests at 1 strength it added way too much crunch to the images in a very unappealing way.
I guess to each their own.
With a 2-step workflow I get really fantastic results at 0.7.
@6tZ Speed loras are meant to be used at 1.0 lora strength. If not, the speed aspect is reduced and non-speed generation is mixed in, which would mean having to use more steps. I'm doing photorealistic images and 1.0 works best. Below 1.0 images tend to soften. The only reason to do that would be to use cfg above 1.0. I try to avoid it as I got too many low quality results with my Big Love checkpoint. But it can work out, especially if you don't mind softer skin or if a checkpoint does not manage to produce sharp skin texture anyway.
@SubtleShader I just don't see what you're saying.
The speed does not change from the LoRA, the speed comes from the low step count and the LCM scheduler.
The LoRA brings in faster convergence, as it's trained on fewer-step generations at its core.
You are right that below 1 "softens". But you could also say that "above 0.7 makes it too noisy/crunchy".
This is why I'm saying that it's a matter of desired outcome.
When I go to 1 in weight, the image gets a lot more details, like adding a "detailer" model, but it's often not very aesthetically pleasing details. It basically goes too far.
I think 1.0 is a good place to start, but if you find it too noisy, don't be afraid to dial it down. What matters is the output image and if you are happy with the result.
I have tried hundreds of images at different LoRA weight settings for this one, and my aesthetic sweet spot was at 0.7, which is why I'm recommending it.
I haven't tested it with all checkpoints of course, but the ones I sampled, this was the case.
Tone the CFG scale down to 1.0 - 1.5; if you go above that it burns the image, hence the tan lines, LOL. Leave the LoRA weight at 1.0.
I tried to use this LoRA in ComfyUI and it did nothing; at 4 steps it just gave an ordinary, poor-quality generation. Maybe I did something wrong? Should it be used like an ordinary LoRA?
Yes. You just load it like an ordinary LoRA, and then use LCM as the sampler, and with low step count.
Yo, apparently there's a handy-dandy 'enhancer' for DMD2 available:
https://github.com/ZichenMiao/Pairwise_Sample_Optimization
Unfortunately the author only released it as a diffusers LoRA, however, that's one little "convert_diffusers_sdxl_lora_to_webui" script away, or just a download:
https://www.dropbox.com/scl/fi/5qwbpz2yls2edyotf93mw/sdxl_dmd2_PSO_webui.safetensors?rlkey=85ng5frgtisbmvkhy5wr8bs4k&st=pktgppgq&dl=1
Usage is loading the DMD2 LoRA as per usual at its 0.66-1.0 strength, then loading this PSO LoRA at whichever strength you so desire (1.0-3.0 seem to work?)
and suddenly, outputs are somewhat better
cool
still not exactly 100% sure what this is supposed to do since it's difficult to benchmark it, but at the very least, it seems to not do anything detrimental. Thanks for providing the converted lora, maybe you should upload it to civit?
DevilSShadoW Well, if one reads the paper - one would discover that it's actually a training technique for distilled/fewstep models. The provided LoRA is just some weird tweak-tweak likely for the sdxl-dmd2 checkpoint itself, but it has a mildly positive impact even outside of that application.
I would prefer to keep this account mostly useless, hehe.
VeerGeer I could upload it but I don't understand it. So I'm leaving it to the tech-savvy people :)
6tZ It's just DPO tweaked to work for 1 to 4 step models like SDXL Turbo and DMD2. Regular DPO and SPO don't have much effect at such low step counts. In practice it means slightly better prompt alignment and a lot more detail in each image. Although with your settings of 8 steps and 0.7 strength it might have surprising effects.
Funny thing this does, it makes 2d/anime into 2.5/3D, especially clothing items and hands/limbs. I can consistently reproduce this. Adversely, going below 0 will make anime images 2D/flat again. Interesting.
DevilSShadoW cool
Thank you for this....It made my images wayyy better!
any obvious downsides to using the FP16 version instead of FP32 one?
The FP32 one is like sample rates above 48 KHz in audio editing - mostly there as extra cushioning for the mixing process to avoid rounding error accumulation.
The non-commercial license makes this worse than Hyper to me. And it irks me that the author brags that Adobe made their own implementation of DMD2 for commercial use (by implementing the paper itself, to bypass the model license), while most of the community, lacking the resources of a billion-dollar company, is restricted to the non-commercial model. Great job. The only good thing I took away was learning about PSO (which has a permissive license) to try with Hyper.
It's a lot of rage just to sell AI porn slop...
And how should I use it with hires fix? I can't find a setting that works.
I use CFG 1.5 with 8 steps, LCM scheduler, at 0.7 weight
@6tZ I'm asking about upscaling with hires fix. Those settings are for generating. Or, if you have a good upscaling method that fixes eyes, could you share it?
@bryanjameer918 Pretty much each image I upload has the original ComfyUI workflow baked into it.
I also wrote this article and created "simpler" workflows for people:
https://civitai.com/articles/17080/simple-comfyui-sdxl-pony-illustrious-flux-workflow
I use about the settings I described above for my upscale. It was from memory though so maybe it's a little bit off.
But you can see from that article, the Upscale-column of nodes:
8 steps, CFG 1.8, LCM sampler, Simple scheduler, 0.3 denoise.
This is for the simple workflow.
On my "full" workflow I have a FaceDetailer node with ~0.5 denoise, and the same for hands at 0.25.
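To keep the passes straight, the settings from this thread can be written down as plain data. This is only a sketch: the `SamplerPass` class is a hypothetical helper, not a ComfyUI API. The base-pass scheduler and denoise are assumptions, and so is the idea that the detailer passes share everything except denoise.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SamplerPass:
    steps: int
    cfg: float
    sampler: str
    scheduler: str
    denoise: float  # 1.0 = start from pure noise; lower keeps more of the input image

# Base pass: 8 steps, CFG 1.5, LCM (from the comment above).
# The "simple" scheduler and denoise 1.0 are assumptions.
base_pass = SamplerPass(steps=8, cfg=1.5, sampler="lcm", scheduler="simple", denoise=1.0)

# Upscale pass: 8 steps, CFG 1.8, LCM sampler, Simple scheduler, 0.3 denoise.
upscale_pass = replace(base_pass, cfg=1.8, denoise=0.3)

# Detailer passes: only the denoise values (~0.5 face, 0.25 hands) come from
# the comment; reusing the other upscale settings is an assumption.
face_pass = replace(upscale_pass, denoise=0.5)
hand_pass = replace(upscale_pass, denoise=0.25)

for name, p in [("base", base_pass), ("upscale", upscale_pass),
                ("face", face_pass), ("hands", hand_pass)]:
    print(f"{name}: {p}")
```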
@6tZ Thanx. I will try it.
@bryanjameer918 Try drag/dropping any of the generated images in this model, you'll get the workflow.
@6tZ Ok, LCM with simple works. I used with 0.25 denoise(the difference is huge). Thanx.
For generating, exponential, Karras or AYS is better than simple. Thanks again.
@bryanjameer918 I find that low step counts works really well with LCM.
@6tZ Man, I use the LoRA at 1 strength, and the hires fix CFG is what matters the most (1, max 1.1; anything higher looks overbaked). Now it's good :)
@bryanjameer918 Yeah I think that works as well. I find that it gets a bit crunchy and overcooked with 1 in weight, for my personal tastes/use. But that's with my own models, which may act differently etc. I prefer the soft look of a lower weight.
it's decent, licensing model threw people off
I noticed 2 things: DMD2 seems to not just be fast, but also improve prompt adherence (not sure, though). And using DMD2 with 32 steps produces much worse results than with 8 steps.
i use it all the time at cfg 1, 9 steps
a little hard to prompt but this way i get 4 pics / m (1152x768) on a rx6400 4gb and the results are great
For me it works really well. For top speed I'm using CFG 1 and 4 steps, and for the best result at still-fast speed, CFG 1.5 and 6 steps. The LoRA weight works best for me at 0.3, though. The recommended 0.7 gives me far worse results.
Works well. I personally use it with 10 steps and CFG 1.5. For realism it's better at 1.0 strength; lower than that makes the generations look flat and soft. For 3D and anime/cartoon it's better at 0.7 strength.
Any guide on how to make it work using Civitai's built in generator?
I think you would just add it, using the settings below? I haven't tried it though.
I meant, the results aren't really optimized, or I feel like it.
@longrandomusername I see. That's fair, but no, I have no insights to give, sorry.
If you make any good findings, let me know and I can update the post.
Man, both your LoRAs are the best for upscaling. The quality it fixes up is impressive, thanks so much.
Thanks. It's not my LoRA. I just uploaded to Civit. But it sure is a good one. It's mostly about the speed though.
@6tZ ok, so thank you for posting it here, these help me very much
I get better results (texture) using Euler A with Karras or SGM Uniform.
Excellent with the BIG LOVE checkpoint. Really fast. Nice quality. Thank you.
How did you get this to work? I'm trying BigLove Pony and the DMD LoRA gives me crazy weird results. I'm setting the LoRA at 0.7 or 1 and have the sampler and settings correct, but it comes out crazy... Can you not use extra LoRAs when using DMD?
I am using Draw Things on a Mac and a compatible checkpoint (BigLove by SubtleShader), with LCM sampler and 8-10 steps. DMD2 LORA at 100%, or each version at 50%. CFG scale at 1 everytime with this configuration. Works like a charm.
Fun fact, but you can use this DMD2 LoRA at negative values for checkpoints that recommend LCM to allow for using higher CFG, higher steps and older samplers like Euler A. Very handy!
Could you please be more specific?
@twobladessword398 For how to use? If a model you're using is designed for DMD2/LCM, then you load this LoRA and instead of using positive strength values as designed, you use a negative strength (like a slider LoRA). Think @GBRX and I both agree that -0.85 is a good starting point for many DMD2 models (at least for many of their models, ie MoP Mix). Then, instead of using the LCM scheduler with very low CFG and prompt strengths (ie, 1 CFG 8 steps), you can now use Euler A with normal CFG (usually ~4 to 6) with higher steps (50+) or common steps (~30) if using a double-step DPM++ scheduler. This also seems to allow for much longer positive and negative prompts for most models, as this allows things to more fully resolve. Some low/light-DMD2 models usually don't even need this process to be done at all, as they still work to varying degrees with Euler/DPM and higher CFG/steps, but using a lower negative LoRA strength (ie, -0.25) can sometimes still improve the image quality and/or allow for slightly longer prompts for those particular models.
Conversely, if you're using LCM with this DMD2 LoRA at positive strengths, it usually adds a good bit of realism and improved image quality while being significantly faster (using very low CFG and steps), though I believe you're a lot more limited in prompt length and can't be quite as detailed or verbose with your descriptions. This is the usage with most non-DMD models, as DMD models would already have this process baked in, afaik.
I'm really not sure about the more technical aspects of how everything is doing what exactly, but this is just the behaviors I've observed while experimenting with various accelerated DMD models (one of the softwares I use doesn't support LCM, but I also prefer to use longer very descriptive prompts). Hope that info is helpful and about as specific as I can describe, but please let me know if there's anything in particular you wanted to know.
@AFD_0 Thank you very much for sharing. Unfortunately, I've tried and found that most regular models (those without integrating the dmd2 lora into the checkpoint) cannot generate images properly if the weight of the dmd2 lora is negative. Currently, I've discovered a good method using the skimmedcfg extension. I set the cfg to 3 and set it to 1 in skimmed, which improves prompt adherence without causing oversaturation. You might want to give it a try.
@twobladessword398 Yes, that is correct. You can only use the DMD2 LoRA at negative strength on models that already have DMD2 baked-in (whether fully or partially). All this is doing is removing/reducing the DMD2 acceleration from a model that has it, so it behaves more like a standard, non-DMD model. Using it negatively on a non-DMD2 model (or negatively too much on a DMD2 model) will usually cause very poor, unusable image quality.
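The behavior described above can be put in simple numbers. This sketch assumes LoRA deltas add linearly, so the net DMD2 strength is the baked-in amount plus whatever you load on top; `effective_strength` is a hypothetical helper, and the baked-in value of 1.0 is just an example.

```python
def effective_strength(baked_in: float, applied: float) -> float:
    """Net DMD2 strength, assuming LoRA deltas add linearly."""
    return baked_in + applied

cases = {
    "DMD2 checkpoint, LoRA at -0.85": effective_strength(1.0, -0.85),
    "regular checkpoint, LoRA at -0.85": effective_strength(0.0, -0.85),
    "DMD2 checkpoint, LoRA at +0.2 on top": effective_strength(1.0, 0.2),
}
for label, value in cases.items():
    print(f"{label}: {value:.2f}")
```

In this picture, a negative strength on a DMD2 checkpoint leaves a small positive remainder. On a regular checkpoint it pushes the weights away from the base model in the opposite direction, which would match the unusable results reported for that case.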
I'll look into that extension, thanks!
Just adding my two cents:
I've found this to work really well with BigLove Photo 2/3 and Lustify V7.
BigLove - Lora Strength 1, 8 steps, LCM Exponential, CFG 1.
Lustify - Lora Strength 1, 5 Steps LCM DDIM, CFG 1, Hires Fix 1.5x, 0.2 denoise for 5 Steps, CFG 1 or 8 Steps LCM Exponential, CFG 1, Hires Fix 1.5x 0.35 denoise for 4 steps, CFG 1.
Thanks! And what about hires fix for biglove? What settings work there? MY hires fix is giving strange results.
@Delavestra I never did solve that issue. Recommendation from the model creator is to use Img2Img upscale instead. If you're in Forge/A1111 WebUI, you just do a "Resize By" upscale (No upscale model, just resize) to 1.5x, set the denoise to ~0.3 and run it for 8-steps with the same sampler/scheduler you used on the Txt2Img. I've found results from that method to work well a majority of the time.
Personally, I don't usually do HiRes Fix with BigLove. I find the base resolution to be fine in most cases; Just make sure you use the higher resolution suggested on the model page (1024x1496).
@ChillDesire **and make sure your hi-res fix is set to the same seed
I used it at 1.0 before, with up to 12 steps. The main issue is that there is a lot of subject doubling.
Maybe try reducing weights a little bit, to see if it helps reduce the "impact" of the model.
@6tZ helps a little to have it at .7 but depends on the checkpoint how much.
CFGscale should be 1.0
Also, it may duplicate or extend objects if you use resolutions higher than SDXL's native 1 megapixel. Look up SDXL's supported resolutions.
I cannot get a DMD model or LoRA to work! I get images with colorful splotches in certain areas, and the image is 75% correct but messed up and weird. What could I be doing wrong?
Lower the steps to 10 or 8.
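A quick way to see how far a resolution is from that 1 megapixel figure is to compare pixel counts. This sketch takes 1024 x 1024 as the reference and checks a few resolutions mentioned in these comments; `native_ratio` is a hypothetical helper, and the ratio at which doubling actually starts is not something this page pins down.

```python
NATIVE_PIXELS = 1024 * 1024  # the "1 megapixel" reference from the comment above

def native_ratio(width: int, height: int) -> float:
    """Pixel count relative to a 1024 x 1024 image."""
    return (width * height) / NATIVE_PIXELS

for w, h in [(1152, 768), (1344, 768), (1024, 1496)]:
    print(f"{w}x{h}: {native_ratio(w, h):.2f}x the native pixel count")
```

The first two resolutions stay under the reference, while 1024x1496 is roughly one and a half times larger.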
Are you using the LCM sampler? Because what you're describing sounds like what happens to me when I use Karras or Euler or one of the usual ones... It has to be LCM.
I found it runs better on Fooocus than SwarmUI for some reason. Anyway, the settings I use are: CFG 1, Positive ADM Guidance Scaler 1, 12 steps, LCM + Align Your Steps, and LoRA weight 0.7. Quite blown away by the quality it can produce, easily comparable to Flux.
This works with SD1.5 too. 8 to 10 Steps, 1.6 to 4.0 + CFG.
Based on a theory from a funny paper, I can conclude that you can use this LoRA at a weight above 1.0 if you use a CFG amplifier such as https://civitai.com/models/1523055/contrast-controller-ilnai
at negative values to counteract the CFG-prediction 'oversharpening' that occurs when it becomes too strong.
What do higher values do? Generally adhere to the prompt better, really.
Don't get too fancy with it though, the exact limit likely depends on the individual checkpoint, but it usually starts behaving mildly funky and unstable around 1.3 + (-0.3 contrast controller)
though this funkiness also mostly breaks through the lower variety of outputs inherent to Distribution Matching (the DM- in DMD), which is nice if you want some random yet mostly sensible outputs at 4-6 steps total
@Zames1992 I've also found a use-case for it;
you can use DMD2 at 1.2 and contrast at -0.2, then sample with LCM at 1.0 CFG for 8-12 steps
lets you apply (non-style) LoRAs to a checkpoint without output quality degrading like crazy, while keeping sampling rather snappy without cfg and with low step requirements
if you want negatives, you can use Normalized Attention Guidance at a minor speed cost
@VeerGeer I gave it a try—skimmed CFG lets you achieve decent results even at CFG 3, though it doesn’t support Forge Neo. As for the NAG approach you mentioned, could you share your specific parameter settings?
@twobladessword398 NAG Scale 2.5, Tau 2.5, Alpha 0.5 - 0.95 depending on intensity
@VeerGeer hi! Since you seem to know what you're talking about. Am I right to assume I can't use FaceDetailer with NAG? It's either one or the other? Could NOT get them to both apply without an error
@jb34r5e655 I'm struggling to word that the two are quite separate things, and any incompatibilities are certainly a result of spaghetti code soup.
as Normalized Attention Guidance is a mechanism for doing the 'conjure Picture from noise' process ("sampling") better;
whereas FaceDetailer is an attempt at automatically cropping, img2img'ing, resizing and stitching images to work around a common problem
it's likely a conflict in how NAG is being patched into the "sampling" system, and the automatic setup of re-sampling cropped images (facedetailer) calls this sampling system.
the entire "AI scene" is a mess, and leads to silly solutions like
"just use qwen-image-edit with an upscaler LoRA to fix up any images you like, loool"
@VeerGeer Hey man, thanks so much for the reply! I was loose with my terminology and I apologize for that, I'm new and learning. It's not the FaceDetailer, it's the IPAdapter FaceID loop that I have set up, sorry about that. FaceDetailer lives further down the workflow and works great after the hires fix. My problem is an error when the IPAdapter FaceID is enabled in the NAG workflow. FaceID works great in the non-NAG workflow (like really well), but in NAG I get a KSampler error. Is there anything I can do to remedy this, do you think?
@VeerGeer This is working nicely, thanks
So this works to make SDXL a lot faster, especially for people who don't have the latest video cards. It works on any SDXL model, I tested 35 of them. To use it, install it as a LORA and set it to LORA strength 1. The CLIP strength does not exist or does not matter.
Setting: LCM (or Euler A also works)
Scheduler: Any of them work, giving different results. Some models tell you to use exponential, you don't have to and that's not always the best either.
EDIT: Steps:5
CFG:1
That's a good starting point, you can lower CFG to 0.8 and so on.
And yes, some models are a lot better than others, so it's worth looking.
Difference between the two versions of the LORA? I can't tell any.
For those wondering: if you take the DMD version and the regular version of the exact same model, and compare the regular version plus this LoRA against the DMD version, yes, the results are identical (with the same seed). It only depends on the strength at which the model author merged this LoRA. Based on checking some models, I found strength 1.0 was used. I'll update this if I find any not using 1.0.
Did you run it at 0.5 - 0.8 weight?
WOW. This is some kind of speed magic. I have a mid-slightly high end PC. But even with my usual workflow (or Forge), I have time to take drink, maybe check my phone while generating. I only found out about this Lora reading an article on a workflow for Illustrious. Thanks for making it accessible here (with credit to the original creator, of course).
This, this rocks. <3
no way this worked instead of all other lighting and lcm things, INCREDIBLE
Hey, may I ask whether its possible to make this work with HiRes Fix? From what I can see it changes the colors a bit too much on the second step
Doing some more tests on this.
The ideal steps appear to be 5 (I was wrongly using 8)
Tests seem to show the following:
4 steps - often malformed, but looks more naturalistic, unretouched a bit, malformation is not acceptable though.
5 steps - pulls itself together fixing a lot of the malformation, still looks more natural than 6.
6 steps - keeps pulling itself in even more losing the natural look.
This is with LCM Exponential, Lora strength at 1. CFG 1.
Will test different ones.
Tests at 1344 x 768 resolution using various top models.
Interesting. Didn't play around much with 5. I mostly found that I preferred 10-12.
@6tZ I guess that puts you into unknown territory since it's a 4 step Lora, that could totally work with illustration and such. You also recommended a very low strength of 0.7 which is definitely unbaked on realism models, so maybe another way to bake them is with more steps, I'll test that too, but it takes twice as long to do 10 steps as 5.
A few more tests:
Tested many samplers and the two that seem to reliably work are:
LCM and
Euler Ancestral
Results are very similar for the same seed
Each one of them works with the following schedulers:
Simple - yes, but worse quality
SGM_uniform - yes, but worse quality
Karras - GOOD
Exponential - GOOD
DDIM_uniform - very unique, I love this but it's an outlier
Beta - GOOD
Normal - GOOD
Linear_quadratic - NO
KL-optimal - another interesting one
The quality of the ones listed as good is next to impossible to distinguish but they give different results.
Now the interesting thing is that the Lora strength determines how "baked" the final product is.
When it bakes you can see the skin textures and things like gray hair appearing, just becoming much more realistic.
This tested with exponential scheduler for both Euler A and LCM at 5 steps. (Other schedulers might be different)
0.70 - unbaked under both LCM and Euler A, simplistic washed out image
0.85 - getting better
1.00 - fully baked under Euler A, not fully baked under LCM
1.15 - fully baked under LCM, probably overbake on Euler A
This holds for the vast majority of realism models, some of them are different and bake faster. You have to figure it out if you are using just one model.
As for steps, 5 seems to be a perfect number but again this can be model dependent. More steps can fix artifacts but the image becomes less "genuine" for realism models.
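If you want to repeat this kind of comparison on your own checkpoint, a small grid of jobs makes it easier to keep the seed fixed while only strength and sampler change. This is a sketch only: the job dictionaries are a hypothetical format you would have to map onto your own UI or API, and the `<lora:...>` tag uses the WebUI-style syntax and tag name seen earlier in these comments.

```python
from itertools import product

LORA_NAME = "dmd2_sdxl_4step"  # tag name as written in the comments; match your file name
STRENGTHS = [0.70, 0.85, 1.00, 1.15]
SAMPLERS = ["LCM", "Euler a"]

def lora_tag(name: str, strength: float) -> str:
    """Build a WebUI-style LoRA prompt tag, e.g. <lora:name:0.7>."""
    return f"<lora:{name}:{strength:g}>"

def build_jobs(prompt: str, seed: int):
    """One job per (sampler, strength) pair, same seed and settings otherwise."""
    jobs = []
    for sampler, strength in product(SAMPLERS, STRENGTHS):
        jobs.append({
            "prompt": f"{prompt} {lora_tag(LORA_NAME, strength)}",
            "sampler": sampler,
            "scheduler": "Exponential",
            "steps": 5,
            "cfg": 1.0,
            "seed": seed,
        })
    return jobs

# The prompt and seed are placeholders.
jobs = build_jobs("portrait photo of an elderly man", seed=1234)
for job in jobs:
    print(job["sampler"], job["prompt"])
```

Keeping everything except one or two variables constant is what makes the "baked" versus "unbaked" difference visible between neighboring images.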
Details
Files
dmd2_sdxl_4step_lora.safetensors












