Overview
Using Wan Animate 2.2, this workflow transfers motion from a reference video to animate the character in a reference image, or replaces the character in the reference video with the character from the reference image. Tested with videos up to ~20 seconds long, but it should theoretically support unlimited length.
This is still a WIP with some rough edges (certain reference videos and images work better than others, and character identity drifts the longer a video runs), but I'm releasing it since I don't see similar workflows on Civitai yet.
Key Features
Using WanVideo Block Swap & WanVideo Animate Embeds, this workflow splits long videos into small "windows" of 81 frames (~5 seconds) each, so theoretically unlimited video length can be supported.
Using RIFE VFI, this workflow interpolates the generated frames so that buttery-smooth video at 60 FPS or more (configurable in the workflow) can be generated.
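As a rough illustration of the windowing idea (a minimal sketch; the frame counts here are assumptions for illustration, not the actual logic of the WanVideo Animate Embeds node, which handles windowing and any overlap internally):

```python
# Sketch: split a long frame sequence into 81-frame windows so each window
# fits in memory. Values are illustrative only.
import math

WINDOW_FRAMES = 81   # ~5 seconds at Wan's native 16 FPS

def split_into_windows(total_frames: int, window: int = WINDOW_FRAMES):
    """Return (start, end) frame ranges covering the whole clip."""
    count = math.ceil(total_frames / window)
    return [(i * window, min((i + 1) * window, total_frames)) for i in range(count)]

# e.g. a 17-second clip at 16 FPS -> 272 frames -> 4 windows
print(split_into_windows(17 * 16))
```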
Custom ComfyUI Nodes
Model Download Links
Important Notes
Generating at 480p (480 × 832 pixels), system RAM usage peaks at around 47.8 GB and VRAM usage peaks at around 15 GB, so you would need a system with ≥16 GB VRAM and ≥48 GB RAM to run this workflow as-is.
You might be able to lower the system requirements by tweaking the various settings.
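If you want to confirm your headroom before queuing a long generation, here is a minimal check (assuming torch and psutil are installed, both of which a typical ComfyUI environment already includes):

```python
# Sketch: report total VRAM and system RAM so you can compare against the
# ~15 GB VRAM / ~48 GB RAM peaks noted above.
import torch
import psutil

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} | VRAM: {props.total_memory / 1024**3:.1f} GB")
else:
    print("No CUDA device detected")

ram = psutil.virtual_memory()
print(f"System RAM: {ram.total / 1024**3:.1f} GB total, {ram.available / 1024**3:.1f} GB free")
```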
Description
Initial version
FAQ
Comments (28)
What can I say except congratulations. I can see the work that went into this workflow. I have an RTX 4090 and 64 GB RAM. For a 17-second video, it took me about 31 minutes. The quality is indeed much higher than other workflows I've tested. But honestly, the generation time kills me, and with 64 GB RAM I reached 99% utilization. Still, the quality is evident. Once again, congratulations.
If the latents are being calculated across the entire 17 seconds of frames, then the time increase is far from linear; in other words, you cannot take the render time for 5 seconds and expect to multiply it by roughly 3. This is the cost of frame consistency.
The conventional method is to render in batches of 5 seconds and then use overlap blending, which uses much less memory and renders faster. The downside is image consistency. OTOH, the examples here have a suspiciously plain, solid-colour background. The problem with the 5-second batch method is background consistency: each batch will imagine a different revealed background when the actor moves. If this workflow still has the background issue, I'm not sure what advantage it really offers.
@blobby99 I changed the model to the Q8 GGUF and the resolution to 576x1024, and I can say that the quality is remarkable. The background is unchanged, matching the reference image. An 8-second video takes me almost 12 minutes, which is okay. The first time I tried it was with an Animate model that wasn't GGUF; now it's fine, and resource usage is no longer so heavy. I think it's the best workflow for animation.
In one of the videos, I got this error:

!!! Exception during processing !!! OpenCV(4.11.0) D:\a\opencv-python\opencv-python\opencv\modules\imgproc\src\resize.cpp:4208: error: (-215:Assertion failed) !ssize.empty() in function 'cv::resize'
Traceback (most recent call last):
  File "\ComfyUI\execution.py", line 496, in execute
    output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, hidden_inputs=hidden_inputs)

The rest went smoothly, but that one was a bit odd. Strangely, there wasn't much difference between the image and the video clip.
@drak0n Thanks for your kind words! Your processing time feels off, though; I have an RTX 4080 with 48GB system RAM, and this workflow takes around 1 minute for each second of video (so a 17-second video takes me around 17 minutes).
Are you perhaps using Torch compiled with CUDA 13.0 and getting ONNX errors in the console? If so, try downgrading to Torch with CUDA 12.8, and ONNX should give you a nice speed boost for pose & expression detection (a quick way to check your Torch and ONNX Runtime setup is sketched after this reply).
Also try installing Sage Attention if you haven't already; it gives a nice speed boost too.
Finally, you could try v0.3 of my workflow that I've just uploaded; I've removed DWPose Estimator (which seemed to use lots of RAM without any appreciable improvement) and replaced it with the WanAnimate Preprocessor, so the workflow now peaks at 46.0GB RAM compared to 47.8GB before.
The cv::resize error looks really weird and I've never seen it before. But my experience with ComfyUI is that it feels really finicky at times and shows some signs of a memory leak, so I usually shut it down and restart it every few generations.
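To quickly verify which CUDA build your Torch install uses and whether ONNX Runtime can see the GPU, a minimal sketch (standard PyPI packages assumed; nothing here is specific to this workflow):

```python
# Sketch: print the Torch build's CUDA version and the ONNX Runtime providers,
# to see whether you are on the CUDA 13.0 build mentioned above and whether
# ONNX Runtime can use the GPU at all.
import torch
print("Torch:", torch.__version__, "| CUDA build:", torch.version.cuda)

try:
    import onnxruntime as ort
    print("ONNX Runtime providers:", ort.get_available_providers())
except ImportError:
    print("onnxruntime is not installed")
```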
@wildkrauss Thank you for your reply. That error only occurs with certain videos where it fails to detect the person's face or entire body. I do have Sage Attention. Indeed, it's about 1 minute per second of video, so a 15-second video, for example, takes at least 15 minutes, plus another 2 or 3 minutes for RIFE. When I generated that video in 31 minutes, I wasn't using GGUF; since switching to Q8, the generation time is back to normal. Something strange did happen, though: I tried to make a 7-second video at 720x1280 using Q4. The generation time doubled, and in the end I was surprised to see that it didn't generate the entire video; one second was missing. When I went back to 576x1024 with Q8, everything went back to normal.
@wildkrauss I tried the third version, and shockingly, the RAM reached 99%, which did not happen in the second version. I added some nodes to the workflow, such as 🎈RAM-Cleanup, which helps somewhat. Without these nodes, I definitely couldn't run the workflow, especially at a resolution of 576x1024. Now my RAM is constantly at 90%.
@drak0n Hmm that's really weird. For me this version reduced the RAM usage enough so I don't get OOM every so often. Let me see if I can figure out what's going wrong.
@drak0n At which node do you see the huge RAM spike? Is it at RetinaFace, Pose & Face Detection or the actual video generation?
@wildkrauss Ideally, at the end of generation, the RAM should return to normal values. In the second version, with those RAM nodes, I managed to get the RAM back to normal at the end. In the third version, I was left at 40% RAM after generation finished, so the next generation started at 40% instead of, say, 10%, and the RAM reached 99%. Now I have to figure out how to make the RAM return to normal values at the end of generation in the third version.
@wildkrauss I don't want to stress you out, but I noticed a difference between versions 2 and 3. In version 2, the "use_non_blocking" option is not enabled in the "WanVideo Block Swap" node; in version 3, you have enabled it. I ran a test with "use_non_blocking" disabled, and everything returned to normal. In version 3, with this option enabled, my VRAM was at 98%-99%.
@drak0n Awesome catch, thanks for finding it! I turned that option on while experimenting with whether it improved the output results, figured it didn't make much difference, but forgot to turn it back off haha
Congratulations! Btw, what's your max temp during the whole run, and your room temp? Thanks.
@drak0n 12 min for 8 sec? You can get faster than that with a more standard workflow. I was getting 10-second gens much faster than that on a 4070 Super with 12GB using a Fusion-X workflow.
@BRai4now Doesn't it depend on the FPS you chose? If he gets those results with Q8 at 32FPS, it's a qualitative upgrade anyway
@TheGlowingGuardian wat? You can't generate at 32fps, Wan can only generate at 16fps and then interpolate. That has nothing at all to do with quant or quality. And the time it takes for the interpolation to run is negligible. It's seconds.
@BRai4now Thank you, I thought it was done differently, that makes sense.
@BRai4now There are certainly countless workflows that work. I have tested other "animate" workflows, but the final result was not high quality, and the background was not changed. This workflow offers something interesting: you can theoretically generate a video with no restriction on duration. The PC you own certainly matters a lot, but in theory it is possible. I saw a few posts on Reddit recommending "wan 2.1 fusionx i2v" and "low noise Lora"; honestly, I haven't tested those LoRAs. I would really love to see a reduction in generation time. For me, this workflow has a lot of potential, and if we could reduce the generation time even further, it would be incredible.
A quick update: I ran a small test with "Wan2.1_I2V_14B_FusionX_LoRA" and, surprisingly, I noticed an improvement in quality. It did increase the generation time by 50 seconds, but the quality is clearly better. I should mention that I am running at a resolution of 576x1024.
@drak0n I wasn't really speaking of the Lora, though that does have a quality improvement, because it's a mix of a bunch of other improvement loras. I was speaking of the workflows, which have been posted here by the same author.
But actually, I appear to have lost the plot that this is a Wan Animate workflow, so maybe long generation times for 8 seconds are just that much worse than normal i2v or t2v generation; my apologies if that's the case.
I have everything installed. However, the nodes "OnnxDetectionModelLoader" and "PoseAndFaceDetection" are showing up as missing. Any help?
Nevermind, I figured it out. Had to search for "wan-animatepreprocess" in ComfyUI Manager and install that.
@Proxy00 Thanks for letting me know! I've used ComfyUI Manager's Custom Nodes in Workflow feature to identify and list the custom nodes I've used (I've been installing so many I can't keep track haha), but for some reason WanAnimate-Preprocess didn't show up. Great to know you've figured it out regardless!
@wildkrauss I cant find BlockifyMask, DrawMaskOnImage, WanVideoLoraSelectMulti, and WanVideoAnimateEmbeds. I have everything else though
@Yourmomd BlockifyMask and DrawMaskOnImage are part of KJNodes (https://github.com/kijai/ComfyUI-KJNodes), while the other two are in WanVideo Wrapper (https://github.com/kijai/ComfyUI-WanVideoWrapper)
@wildkrauss I get this error when running the WanVideo Sampler node; I have an RTX 5060 Ti 16GB:
SM89 kernel is not available. Make sure you GPUs with compute capability 8.9.
@Yourmomd Hmm that error means the SageAttention library you've installed is incompatible with your GPU. You can refer to the GitHub issue: https://github.com/kijai/ComfyUI-WanVideoWrapper/issues/449
If all else fails, you can disable SageAttention at the WanVideo Model Loader node by changing attention_mode from sageattn to sdpa
@wildkrauss Thanks. Also, is there a way to replace your Points Editor node? It doesn't work well for me.
I was looking at this post but did not know how to implement it into your workflow:
Wan Animate KJ node Points Editor : r/StableDiffusion