Features
Uses quantized GGUF models instead of FP16 checkpoints for faster generation and lower VRAM usage (see the size sketch after this list)
Uses TripleKSampler for better prompt adherence and improved motion
Uses LightX2V Lightning LoRAs for blazing-fast generation (about 5 minutes for an 81-frame video on an RTX 4080)
Uses RIFE VFI to quadruple the frame rate, for buttery-smooth 60 FPS output
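Why GGUF fits where FP16 does not: a back-of-the-envelope size estimate for a 14B-parameter model makes it concrete. The bits-per-weight figures below are rough averages for common GGUF quant types, used here purely for illustration, not exact numbers for any particular file:

```python
# Approximate in-memory size of a 14B-parameter model at different precisions.
# Bits-per-weight values are rough averages for common GGUF quant types
# (assumption for illustration, not exact figures for any specific file).
PARAMS = 14e9

formats = {
    "FP16":   16.0,
    "Q8_0":    8.5,
    "Q5_K_M":  5.7,
    "Q4_K_M":  4.9,
}

for name, bpw in formats.items():
    gib = PARAMS * bpw / 8 / 1024**3  # bits -> bytes -> GiB
    print(f"{name:7s} ~{gib:5.1f} GiB")
```

At Q5/Q4 bit-widths the model shrinks to roughly a third of its FP16 size, which is why it can fit on 12-16GB cards that choke on the full checkpoint.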
Custom ComfyUI Nodes
Download Models
Unet
The following files should be saved into /models/unet
Text Encoders
The following files should be saved into /models/text_encoders
VAE
The following files should be saved into /models/vae
LoRAs (for speed)
The following files should be saved into /models/loras
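If you are setting up from scratch, the four target folders can be created in one go. The `ComfyUI/` root below assumes a standard install; adjust the path to match yours:

```python
from pathlib import Path

# Assumes a standard ComfyUI checkout; change the root if yours differs.
root = Path("ComfyUI/models")
for sub in ("unet", "text_encoders", "vae", "loras"):
    (root / sub).mkdir(parents=True, exist_ok=True)
    print(f"ok: {root / sub}")
```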
Description
First version
FAQ
Comments (6)
Low VRAM is only a real issue with LLMs. With video models, you should make sure you have enough RAM (much cheaper than VRAM on an expensive GPU) to hold the model. Then, once per iteration, the model is streamed from RAM to your GPU as needed; it is a linear data structure and does not suffer from random-access issues.
Learn how to launch ComfyUI so models do not steal all your VRAM, and upgrade your RAM to at least 64GB (128GB is better).
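Roughly, the streaming idea looks like this; a toy PyTorch sketch of layer-by-layer offload, not ComfyUI's actual memory manager:

```python
import torch
import torch.nn as nn

# Toy sketch of linear, layer-by-layer streaming: weights live in system RAM,
# and each block visits the GPU only while it is computing. Illustration of
# the idea only, not ComfyUI's real memory management.
model = nn.Sequential(*[nn.Linear(4096, 4096) for _ in range(8)])  # stays in RAM
device = "cuda" if torch.cuda.is_available() else "cpu"

x = torch.randn(1, 4096, device=device)
with torch.no_grad():
    for layer in model:
        layer.to(device)   # stream this block's weights RAM -> VRAM
        x = layer(x)
        layer.to("cpu")    # free VRAM for the next block
print(x.shape)
```

Peak VRAM is one block plus activations while the full set of weights sits in system RAM; because the blocks run in a fixed order, the transfer pattern is purely sequential.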
I get what you mean, but the problem with most other workflows I've seen (using fp16 or fp8 models) is that they won't even load for the first iteration unless you have a lot of VRAM. I have 16GB of VRAM, which was sufficient for SDXL or even FLUX, but Comfy crashed with an out-of-memory error when I tried to run any of the popular Wan 2.2 T2V or I2V workflows.
Maybe there's a setting in ComfyUI to offload the video model to RAM instead of VRAM like you said, but I couldn't find one, and even if I did I suspect speed would take a huge hit.
That's why I had to resort to GGUF, and I'm sure a low-memory workflow like this will be appreciated by beginners who don't have access to powerful hardware like an H100.
@wildkrauss 16GB is perfectly fine for Wan 2.2, as it was for Wan 2.1, even with FP8 using the Fusion-X Ingredients workflow and KJ-wrapper-based workflows. GGUF is just the well-established, better option. I was doing gens just fine on my 4080 Super with the KJ fp8 scaled models, and others did similarly with the wrapper. I could do longer gens when I stuck to Q5K. But on my 5090 box, with twice the VRAM, I'm still using GGUF, because Q8 > fp8 in quality and there are supposedly fp8 errors on the newer cards.
12GB is perfectly fine for SDXL, Pony, Illustrious, and Flux.
I've been generating 13-second clips in about 45 minutes on a 3070 with 8GB of VRAM and 32GB of system RAM.
I know that's a long time to generate a video, but I'm amazed it works at all.
@ruckusmelees79319 Is that with the full 14B model or GGUF?
Yep... I have 20GB of VRAM and only 32GB of RAM. On Windows I could probably generate with this workflow, since the memory management wouldn't crash my PC, though it would take a long time; on Linux my whole system crashes from lack of RAM, even though I dedicated an entire NVMe drive to swap. Now it's too late to buy RAM; it's 3-4x the price it was four months ago.