Wan 2.2 Video + Sound workflow optimized for RTX 3060 12 GB VRAM GPU - v2.0

[Edit:

Version v5.0 works with the latest ComfyUI (v0.15.0).

If you have any problems, please refer to the FAQ at the bottom of the page or have a look in the comments.

Many thanks to everyone who tested this workflow. Thank you very much for the many inquiries and, of course, for all the knowledge and experience you have contributed here. 👍🙂

Special thanks to:

@SeoulSeeker for the "Dead Simple MMAudio" workflow, which is the basis of the audio part here,

@taek75799 for the enhanced models, which work really well,

@Bakazaya for pointing out the color issue in version v3.0 and running lots of tests,

@bluntfeather for sharing the latest experiences with installing ComfyUI-Easy-Install,

@nitrovtx for remaining persistent in matters of quality and running a lot of tests,

@Icey64 for providing the link to "ComfyUI-Easy-Install",

@boinobin730 for asking for a First to Last Frame option, running pre-tests and responding fast as hell 🙂 and

@SnowShoes311 thank you so much again for all your buzzing 😋]

Features:

Optimized Wan 2.2 workflow, runs perfectly on an RTX 3060 GPU with 12 GB VRAM and 32 GB RAM,
"Text to Video", "Image to Video" and "First/Last Frame 2 Video" generation in one workflow, all with easy audio generation,
easy installation/model downloading, all necessary sources are specified,
easy to use workflow, clearly structured, all necessary steps are explained,
easy switches for mode selection,
easy prompt selection for fast prompt creation/testing,
easy switching between "standard" and "enhanced" models,
very fast and smooth high-quality outputs up to approx. 1440 x 960 at 60 fps,
2x fast upscaler,
4x fast framerate multiplier,
MMAudio Sampler (generates sound according to the video action),
Triton and Sage Attention option,
generating a 5-second high-quality video takes about 10 - 15 minutes (see below).

Tested generation times:

As a rough guide for an RTX 3060 GPU: generating a 5-second high-quality 1440 x 960, 60 fps video with 6 steps takes:

t2v: around 10 - 12 minutes,
i2v: around 15 minutes.
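
The quoted output size implies some simple arithmetic. As a rough sketch only, assuming the 1440 x 960, 60 fps result is what remains after the 2x upscaler and the 4x framerate multiplier have both been applied, the size of the raw sampler output can be estimated as follows. The function name is made up for illustration, and the values are derived from the figures on this page, not read from the workflow itself, so the real sampler settings may differ.

```python
def base_settings(out_width, out_height, out_fps, seconds,
                  upscale_factor=2, fps_multiplier=4):
    """Estimate what the sampler would have to produce before post-processing.

    Assumption (not confirmed by the workflow): the final video is the
    sampler output after a plain 2x upscale and a plain 4x frame multiplication.
    """
    base_width = out_width // upscale_factor
    base_height = out_height // upscale_factor
    base_fps = out_fps / fps_multiplier
    base_frames = round(base_fps * seconds)
    return base_width, base_height, base_fps, base_frames


width, height, fps, frames = base_settings(1440, 960, 60, seconds=5)
print(f"sampler output: {width} x {height} at {fps:g} fps, about {frames} frames")
```

Under that assumption the sampler only has to produce a quarter of the final pixels per frame and a quarter of the final frames, which is likely a large part of why the workflow fits a 12 GB card.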

ComfyUI-Easy-Install with Triton + SageAttention:

This workflow should work with any recent ComfyUI version newer than v0.6.0 (Desktop, Embedded, Windows/Linux).

However, ComfyUI is developing rapidly, and some of the custom nodes used are often not updated quickly enough, or not updated at all. Manual workarounds are sometimes necessary. Care must also be taken to avoid conflicts with other nodes.

If you're having difficulties with your existing ComfyUI system, or if you want to run video generation on a separate (parallel) ComfyUI system like I do, I recommend the following installer: https://github.com/Tavris1/ComfyUI-Easy-Install.

Complete installation of ComfyUI, including the manager and some preconfigured custom nodes, is just one click - really 🙂
Installation of Triton + SageAttention is just a second click - really 🙂 And since it's so easy now, I would definitely recommend it for video generation.
Because it is an embedded version, you can install it in parallel with your existing ComfyUI installation without the risk of ruining your working system.
After installation, just configure the "extra_model_paths.yaml" file to use your existing models.
After a fresh installation of ComfyUI-Easy-Install you might have some issues too, but there are known workarounds - please see the FAQ below.
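
For the "extra_model_paths.yaml" step, here is a minimal sketch in Python that prints the text of such a file. The section name comfyui, the base_path key and the folder keys are assumed from the extra_model_paths.yaml.example file that ships with ComfyUI, so please check them against the example file in your own installation. The path D:/ComfyUI/ is a made-up placeholder for the existing installation, and the helper name is hypothetical.

```python
# Folder keys assumed from ComfyUI's extra_model_paths.yaml.example;
# verify them against the example file of your own installation.
MODEL_FOLDERS = {
    "checkpoints": "models/checkpoints/",
    "diffusion_models": "models/diffusion_models/",
    "loras": "models/loras/",
    "vae": "models/vae/",
}


def render_extra_model_paths(base_path):
    """Return the text of a minimal extra_model_paths.yaml."""
    lines = ["comfyui:", f"    base_path: {base_path}"]
    for key, folder in MODEL_FOLDERS.items():
        lines.append(f"    {key}: {folder}")
    return "\n".join(lines) + "\n"


# Hypothetical location of the existing ComfyUI installation.
print(render_extra_model_paths("D:/ComfyUI/"))
```

Save the printed text as extra_model_paths.yaml in the new ComfyUI folder, then add further folder keys for whatever other model types your existing installation holds.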

For testing/understanding/experimenting/changing the workflow:

click "Toggle Link Visibility" to see the links,
click the Subgraph symbols to open the Subgraphs,
for quick testing you may lower the settings for steps, clip length and video resolution,
be really careful when modifying Groups or Subgroups (even title or color), because they are essential for switching,
feel free to try and test other models. Just give me a hint if you find models that deliver better results and fit within the 12 GB VRAM limit.

And as usual: Have Fun 🙂🙂

Short Conclusion:

This workflow is based on elements of a variety of already published workflows. My "job" was only to put things together, optimize them for a small machine and create the simplest possible, hopefully user-friendly or even "beginner"-friendly workflow.

I'm not an "expert" - just a user who wants to get it running on "available" hardware.

There are many things I don't really understand. If you find mistakes or better solutions, please give me a hint.

And I really hope that even "beginners" have a chance to take their first steps...

Frequently Asked Questions (FAQ):

For a quick and better overview, I will try to collect all known issues here, step by step (please be patient). If your issue is not listed here, please have a look in the comments first. Most issues have already been discussed.

ComfyUI Nodes 2.0:

Turn off Nodes 2.0 in ComfyUI (use the ComfyUI menu). Currently, not all custom nodes are supported.

ComfyUI crashes after generation during VAE decode, upscaling or frame rate multiplication (Rife VFI) without any error report:

This is a RAM problem (not VRAM). Increase your swap file (at least 64 GB, up to 128 GB) or set it to automatic management on a fast drive with at least 100 GB of free space.
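
Before switching the swap file to automatic management, it can help to confirm that the chosen drive really has the recommended 100 GB free. This is a small sketch that uses only Python's standard library. The helper names are made up, the path is a placeholder, and the threshold is simply the figure quoted above (counted here in GiB, which is slightly stricter).

```python
import shutil

GIB = 1024 ** 3


def has_enough_space(free_bytes, min_gb=100):
    """True if the free space meets the recommended minimum (counted in GiB)."""
    return free_bytes >= min_gb * GIB


def check_drive(path, min_gb=100):
    """Report the free space on the drive that holds `path`."""
    free_bytes = shutil.disk_usage(path).free
    status = "OK" if has_enough_space(free_bytes, min_gb) else "too little"
    return f"{path}: {free_bytes / GIB:.1f} GiB free ({status} for a {min_gb} GB target)"


# "." means the drive of the current folder; replace it with e.g. "D:/".
print(check_drive("."))
```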

JW Nodes (JWFloatToInteger, JWIntegerDiv, JWImageResizeByLongerSide) or soundfile missing:

For the workaround look here and here:

python -m pip install soundfile

Fresh ComfyUI-Easy-Install installation (missing soundfile and PyTorch v2.9.0 issue with SageAttention on Windows):

For the full conversation, look here.

Open cmd in the python_embedded folder:

python -m pip install soundfile

python -m pip uninstall -y torch torchvision torchaudio

python -m pip install torch==2.8.0 torchvision==0.23.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu126
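
To confirm that the downgrade took effect, the installed versions can be compared with the ones pinned in the command above. This is a minimal sketch using the standard library's importlib.metadata. The helper names are made up for illustration, and the script has to be run with the same python that ComfyUI uses (the one in the python_embedded folder), otherwise it reports on a different environment.

```python
from importlib import metadata

# Versions pinned by the pip command above.
EXPECTED = {"torch": "2.8.0", "torchvision": "0.23.0", "torchaudio": "2.8.0"}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def check_versions(expected, lookup=installed_version):
    """Compare installed versions with the expected ones.

    Build tags such as '+cu126' are ignored in the comparison.
    """
    report = {}
    for package, wanted in expected.items():
        found = lookup(package)
        base = found.split("+")[0] if found else None
        report[package] = (found, base == wanted)
    return report


for package, (found, ok) in check_versions(EXPECTED).items():
    print(f"{package}: {found or 'not installed'} -> {'OK' if ok else 'MISMATCH'}")
```

The same lookup can be pointed at soundfile to see whether that install succeeded as well.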

Slider Nodes - how can I modify the "default" values:

Right click the slider node, choose Properties and set the values you like 🙂🙃

Description

First Frame to Last Frame Video Generation option added,

directly clickable download links for all models and file structure overview added,

completely reorganised design for easy use of all options.

Comments (37)

boinobin730 · Sep 25, 2025 · 2 reactions

The version 2.0 workflow is extremely versatile. Having a start and end frame image allows for more fluid motions. Thank you for enhancing the 1.3 workflow.

jonk999 · Sep 26, 2025 · 1 reaction

Have been using v1.1 for a while.
I tried v2.0 and I got an error on the additional Lora nodes. I needed to connect the Load Clip to clip on high and low additional Loras.
Also just wondering why on v2.2 KSampler Low has seed of 0 and fixed whereas in v1.1 it was randomised.
And finally, have you found Beta scheduler to be better than Simple that was used in v1.1?