v2
The workflow allows ultra-fast iterations over different seeds/prompts/images/settings before rendering the final video at full resolution, while removing the need to constantly swap between models (which is ideal for low-RAM setups).
How it works
There are 2 stages:
Generate a draft version of your video at low resolution, using the high-noise model only.
Upscale the video latent to render it at your target resolution, using the low-noise model only.
Process
Stage 1
Disable the groups "LN" and "Post".
Upload your image.
Write your prompt.
Set the low-resolution width (typically, 256, 288 or 320px, depending on the image aspect ratio).
Click "Run".
Stage 2
Drop the video draft created at Stage 1 into the ComfyUI canvas.
Disable the group "HN"; enable the groups "LN" and "Post".
In the "Load Draft" node, select the same video again.
Set the high-resolution width (typically 2x or 2.5x the width set at Stage 1).
Click "Run".
⚠️Important constraint: make sure the video width and height are multiples of 16, at both stages of the process, or the final output will be blurred.
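The width choices for both stages can be sanity-checked before running anything. Below is a minimal sketch, assuming the height follows the source image's aspect ratio and that rounding to the nearest multiple of 16 is acceptable. The helper names `snap16` and `plan_resolutions` are hypothetical and not part of the workflow.

```python
def snap16(value: float) -> int:
    """Round to the nearest multiple of 16 (never below 16)."""
    return max(16, int(round(value / 16)) * 16)


def plan_resolutions(image_w: int, image_h: int, draft_width: int, scale: float):
    """Return ((w1, h1), (w2, h2)) for Stage 1 and Stage 2.

    Assumption: heights follow the source image's aspect ratio, and every
    dimension is snapped to a multiple of 16.
    """
    aspect = image_h / image_w
    w1 = snap16(draft_width)
    h1 = snap16(w1 * aspect)
    w2 = snap16(w1 * scale)
    h2 = snap16(h1 * scale)
    return (w1, h1), (w2, h2)


if __name__ == "__main__":
    # A 9:16 portrait image, 288px draft width, x2.5 upscale.
    draft, final = plan_resolutions(1080, 1920, draft_width=288, scale=2.5)
    print("Stage 1:", draft)
    print("Stage 2:", final)
```

If a snapped value differs from what you expected, adjust the draft width or the scale factor rather than entering a non-multiple of 16 by hand.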
The workflow was intentionally kept as simple as possible to maximize compatibility. It is up to you to adapt it further to your needs.
For a simpler but less efficient version, see v1 below.
v1
The workflow significantly mitigates the "speed VS quality" dilemma, allowing users with low-end hardware to generate videos in HD resolution nearly twice as fast!
How it works
The principle is dead simple: we run the high-noise model at very low resolution, then upscale the latent before injecting it into the low-noise sampler.
Since the original image is also reinjected, with a new Wan wrapper node at the low-noise sampling step, visual details are preserved.
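To get a feel for why the low-resolution pass is cheap, the latent sizes can be compared with some shape arithmetic. This is only a sketch under assumptions that are not stated on this page: 16 latent channels, 8x spatial compression and 4x temporal compression in the VAE. The function names are hypothetical.

```python
def wan_latent_shape(width: int, height: int, frames: int):
    """Latent shape (channels, frames, height, width) for a video.

    Assumptions: 16 latent channels, 8x spatial compression, 4x temporal
    compression with the first frame kept on its own.
    """
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16")
    return (16, (frames - 1) // 4 + 1, height // 8, width // 8)


def upscaled_shape(shape, factor: float):
    """Shape after a spatial-only latent upscale by `factor`."""
    c, t, h, w = shape
    return (c, t, round(h * factor), round(w * factor))


def spatial_cells(shape) -> int:
    """Number of spatial positions per latent frame."""
    return shape[2] * shape[3]


if __name__ == "__main__":
    draft = wan_latent_shape(256, 384, frames=65)  # high-noise pass
    final = upscaled_shape(draft, 3.0)             # x3 overall, e.g. x2 then x1.5
    print("draft latent:", draft)
    print("final latent:", final)
    print("spatial cells ratio:", spatial_cells(final) / spatial_cells(draft))
```

Under these assumptions the high-noise sampler handles 9 times fewer spatial positions per frame than a direct 768*1152 render would.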
Limitations
The motion does lose a little subtlety, but the speed gain is totally worth it in most cases.
Works on T2V too. Simply replace both I2V nodes with an EmptyHunyuanLatentVideo node. Thanks to @axicec for sorting this out.
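For the T2V variant, the replacement node can be described as a ComfyUI API-format entry. The sketch below is illustrative only: the input names (`width`, `height`, `length`, `batch_size`) are assumed from the stock ComfyUI node and should be checked against your installation, and `empty_video_latent_node` is a hypothetical helper.

```python
import json


def empty_video_latent_node(width: int, height: int, length: int) -> dict:
    """Build an API-format entry for an EmptyHunyuanLatentVideo node.

    Assumption: the node takes width, height, length and batch_size, with
    the frame count of the form 4*k + 1 (1, 5, 9, ...).
    """
    if width % 16 or height % 16:
        raise ValueError("width and height must be multiples of 16")
    if length < 1 or (length - 1) % 4:
        raise ValueError("length should be of the form 4*k + 1")
    return {
        "class_type": "EmptyHunyuanLatentVideo",
        "inputs": {
            "width": width,
            "height": height,
            "length": length,
            "batch_size": 1,
        },
    }


if __name__ == "__main__":
    # Low-resolution draft latent for a 65-frame T2V run.
    print(json.dumps(empty_video_latent_node(256, 384, 65), indent=2))
```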
Getting Started
Replace the models with your own, or follow the links below to download them.
Install the required custom nodes listed below if they are missing from your installation.
Load an image and write your prompt.
Click Run.
Speed benchmark
settings: 65 frames, using Q5_K_M I2V models, 4 steps on high-noise with lightx2v 1030, 4 steps on low-noise with lightx2v 1022 and Fun HPS2.1 loras, euler/beta sampler/scheduler on both samplers.
hardware: RTX 3060 with 12GB VRAM and 32GB RAM.
768*1152px (2:3)
768*1152, no upscale: 20′46″
256*384 x2 then x1.5: 11′16″ (-46%)
256*384 x1.5 then x2: 10′48″ (-48%)
720*1280px (9:16)
720*1280, no upscale : 23′19″
288*512 x2.5 : 15′57″ (-32%)
288*512 x2 then x1.25: 11′57″ (-49%) <- this is the showcased video
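The percentages above can be reproduced from the raw timings. This is a small sketch; `saving_percent` is a hypothetical helper that takes (minutes, seconds) pairs.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds duration to seconds."""
    return minutes * 60 + seconds


def saving_percent(baseline, run) -> int:
    """Relative change of `run` versus `baseline`, rounded to a whole percent.

    Both arguments are (minutes, seconds) tuples; a negative result means
    the run was faster than the baseline.
    """
    b = to_seconds(*baseline)
    r = to_seconds(*run)
    return round(100 * (r - b) / b)


if __name__ == "__main__":
    # 768*1152 (2:3), baseline 20'46"
    print(saving_percent((20, 46), (11, 16)))  # x2 then x1.5
    print(saving_percent((20, 46), (10, 48)))  # x1.5 then x2
    # 720*1280 (9:16), baseline 23'19"
    print(saving_percent((23, 19), (15, 57)))  # x2.5
    print(saving_percent((23, 19), (11, 57)))  # x2 then x1.25
```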
Target resolution VS Hardware Requirements
HD: >= 12GB VRAM (tested)
FHD: >= 16GB VRAM? (not tested, feedback appreciated)
Initial sampling resolutions and step settings recommendations are included in the workflow.
Custom Nodes
Required
Optional
ComfyUI-GIMM-VFI (for interpolation)
rgthree (for fast group bypassing) (v2)
Models Used
Wan 2.2 14B I2V, Quantized:
Lightx2v LoRas :
https://civarchive.com/models/1585622?modelVersionId=2361379
https://civarchive.com/models/1585622?modelVersionId=2337903
Fun LoRa:
Edit (v1): the last sampler's scheduler is set to linear_quadratic by default, but it should be beta.
Comments (21)
Thanks for it, but sadly nsfw_wan_umt5-xxl_bf16 wont work in this workflow
I don't know that resource. Can you show me what it is?
@qdr1en nsfw_wan_umt5-xxl_bf16 is a uncensored text encoder (https://huggingface.co/NSFW-API/NSFW-Wan-UMT5-XXL/tree/main)
@SysDeep Oh ok that one, I have it too then. Sorry to ask, but why it wouldn't work?
(I indeed see warnings in the console when using it.)
works fine.
Sry my fault, i forget i use custom CLIP-Loader with custom version of transformers ^^
Thanks for sharing the workflow.
- How do you take such a high resolution screenshot of your workflow?
- I see how you're upscaling the latent but I don't see where you're using the WAN FFLF latent. Only the upscaled latent seems to be used.
- Right click anywhere on the canvas > Workflow image > Export > png.
- The WAN FFLF latent is bypassed on the low-noise samplers, the upscaled latent is injected instead.
I'd imagine this method would be much better used for t2v. The problem with i2v is the inevitable loss of consistency with the original image- for consistent renders one wants as high a render rez as possible. Of course if the original image is just an 'inspiration' for the render, this won't matter.
Given wan2.2 seems vulnerable to all kinds of speed-up and multi-stage methods, it is a shame there isn't one online resource that acts as a competition for best workflow methods, so through experimentation we can identify the best and most useful ideas. With no better GPUs on the horizon (for quite a long time to come), and our limited VRAM, we need to squeeze every last drop of performance from what we do have!
I have almost never used T2V, but I doubt it would work with it, so I did not even bother trying.
Qwen Edit or character loras help maintain consistency, but it's tedious work...
I totally agree on your last line, and I think we can consider this method as a hack :-)
why would we need online resource... SaaS is gay. buy a gpu.
Thanks for reminding me of this, my original method was lacking this one kinda helped me kickstart my experimentation again
works fine with t2v, omit the i2v nodes and provide a empty hunyuan latent, same goes for T2I, provide your own method, or use 1 frame for empty hunyuan latent.
@axicec That's great news, thanks for reporting it. I will update the page soon then.
How can I add loras to this workflow? It seems like the speed loras are baked in, and I'm worried that adding more will hurt the speed or cause errors.
You can replace the speed loras if you use other versions, they are not baked in (I just changed the node titles in the workflow).
To add more loras, you can either add a "Load Lora", "LoraLoaderModelOnly", or (if you have rgthree installed) "Power Lora" node, between the "ModelSamplingSD3" and "Speed LoRa" nodes, for each model (high/low).
Let me know if it makes sense.
Alternatively, save the showcased video on your hard drive and drop it into comfyUI, it will show you the same workflow, but with the "Power Lora" node already included.
(I have accidentally set the last scheduler on "linear_quadratic" in it, better to set it as "beta" instead).
Is it possible to apply this workflow and generate a video at low resolution until you generate a video with the movement you like, and then convert it to HD once a good video is generated?
By fixing the seed value, for example.
One generation takes 12 minutes. With that amount of time it would be difficult to keep trying until a good video is generated.
It turns out I have that in stock (generate a video only with high-noise model and store it in a temporary folder - for testing different prompts/images/seeds/etc. - then render that video at high resolution with the low-noise model only). I will publish a v2 based on your feedback then. Stay tuned!
@qdr1en That's great news! I'm looking forward to the release of v2!
@qdr1en Nice! I have been working with a workflow that saves latents to disk so I can follow the progress from my phone, with prompting, HN, LN and video compile split up, but I didn't know about, or even think of, checking the HN result first before proceeding to the LN sampling!! I can't wait to test your workflow!!!
