Overview
Using Wan Animate 2.2, transfer motion from the Reference Video to animate the character in the Reference Image, or replace the character in the Reference Video with the character from the Reference Image. Tested with videos up to ~20 seconds long, but it should theoretically support unlimited length.
Still a WIP as there are some rough edges (certain reference videos and images work better than others, and character identity drifts the longer a video runs), but I'm releasing it as I don't see other similar workflows available on Civitai yet.
Key Features
Using WanVideo Block Swap & WanVideo Animate Embeds, this workflow splits long videos into small "windows" of 81 frames (~5 seconds) so that theoretically unlimited video length can be supported.
Using RIFE VFI, this workflow interpolates the generated frames so that buttery-smooth video at 60 FPS or more (configurable in the workflow) can be generated.
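For intuition, here is a minimal Python sketch of both ideas: chopping a long clip into fixed-size windows, and picking the integer interpolation multiplier RIFE would need to hit a target FPS. The window size, overlap, and 16 FPS source rate are illustrative assumptions, not the workflow's actual node code.

```python
# Illustrative sketch only -- not the workflow's node code. Window size,
# overlap, and the 16 FPS source rate are assumptions for the example.
import math

def split_into_windows(num_frames: int, window: int = 81, overlap: int = 8):
    """Yield (start, end) frame ranges that cover the whole clip."""
    step = window - overlap
    start = 0
    while start < num_frames:
        end = min(start + window, num_frames)
        yield start, end
        if end == num_frames:
            break
        start += step

def rife_multiplier(source_fps: float, target_fps: float) -> int:
    """Smallest integer frame multiplier that reaches the target FPS."""
    return max(1, math.ceil(target_fps / source_fps))

# Example: a 20-second clip at 16 FPS -> 320 frames processed in 81-frame
# windows, then interpolated 4x to reach 64 FPS (>= 60 FPS).
print(list(split_into_windows(20 * 16)))
print(rife_multiplier(16, 60))
```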
Custom ComfyUI Nodes
Model Download Links
Important Notes
Generating at 480p (480 x 832 pixels), system RAM usage peaks at around 47.8GB and VRAM usage peaks at around 15GB, so you would need a system with ≥16GB VRAM and ≥48GB RAM to run this workflow as-is.
You might be able to lower the system requirements by tweaking the various settings.
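As a rough illustration of the kind of knobs involved, here is a hypothetical settings sketch; the field names only approximate the relevant nodes (Block Swap, Animate Embeds), and the exact parameters and safe values for your hardware are assumptions, not guidance from the workflow itself.

```python
# Hypothetical example of memory-related settings one might lower.
# Names approximate the relevant nodes; exact fields and safe values
# depend on your ComfyUI setup and are assumptions here.
low_memory_tweaks = {
    "width": 480,             # generation resolution: lower = less VRAM
    "height": 832,
    "blocks_to_swap": 30,     # block swap: offload more transformer blocks to system RAM
    "frame_window_size": 81,  # fewer frames per window = smaller activations in VRAM
}
```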
Description
Updated to use vitpose-h (Huge) + yolov10l (Large) instead of vitpose-l (Large) + yolov10m (Medium) for better pose detection.
Updated ref_image and ref_video resize methods for better face & pose detection.
FAQ
Comments (26)
Thank you for this much-needed update regarding face detection. The third version had some issues detecting natural eye movement. Now, in the fourth version, there is a noticeable improvement. Congratulations on this simple, high-quality workflow.
Nice, works very well but unfortunately facial likeness isn't really preserved. Anything to be done about that?
It's not ideal, but if you have a character LoRA for 2.1, that fixes it.
@aboimard Does it have to be 2.1? I've been experimenting with character 2.2 LoRAs and they don't seem to make any difference in i2v output.
@frosty639 I'm not sure; I just tried it with a 2.1 LoRA and it was really good, no character drift. Maybe just the low-noise one from a 2.2 would work, but I'm not experienced enough to know for sure.
The workflow runs, but it seems to be skipping the Character Mask and Background Processing nodes entirely. No image populates the points editor. I also don't see the generated video in the UI, but I can see it in the output folder. Even when I manually upload the image, it doesn't seem to respect my points. Any ideas?
It's just because the nodes aren't connected; you must connect them ;)
After several tests, certain changes to the person's face can indeed be observed. I compared version 3 with version 4 and noticed some changes in the "Resize Image v2" node. In version 3, you have the "pad_edge_pixel" option and the "center" crop position. In version 4, however, you opted for "crop" instead of "pad_edge_pixel." I changed from "crop" to "pad_edge_pixel," and it recovered somewhat. Honestly, I don't really understand what building a workflow involves and how it works. I'm trying to help you with some opinions. I hope these views will be useful to you. What do you think?
You're saying v3 had better facial likeness? Because yeah, the latest one seems not great in that area, which is probably the most important area.
@frosty639 Version 3 is better in the sense that it does not change the facial features of the person in the video, but there are some minor issues with motion detection. Version 4 has resolved these issues, but as you may have noticed, the person's face changes. If you want consistency in the person's face, this may be a problem. But surely, the author of this workflow will solve them.
@drak0n Thanks for the feedback! That's weird though. I've changed the crop mode from "pad_edge_pixel" to "crop" because it introduced visual artifacts if the dimensions of the ref_image and ref_video don't match. Could you send me the ref_image and ref_video you're using so I can test it more thoroughly?
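For anyone following along, here is a rough numpy illustration of the difference being discussed (not the Resize Image v2 node's actual code): "crop" center-crops the reference image to the target aspect ratio before scaling, while "pad_edge_pixel" keeps the whole image and fills the borders by repeating edge pixels, so one trades lost image edges against possible smeared borders when the ref_image and ref_video aspect ratios differ.

```python
# Illustrative sketch of the two aspect-ratio strategies; assumes an HxWx3
# numpy image. Not the actual node implementation.
import numpy as np

def to_aspect_crop(img: np.ndarray, target_w: int, target_h: int) -> np.ndarray:
    """Center-crop the image to the target aspect ratio (edges are lost)."""
    h, w = img.shape[:2]
    target_ar = target_w / target_h
    if w / h > target_ar:                       # too wide: crop left/right
        new_w = int(h * target_ar)
        x0 = (w - new_w) // 2
        return img[:, x0:x0 + new_w]
    new_h = int(w / target_ar)                  # too tall: crop top/bottom
    y0 = (h - new_h) // 2
    return img[y0:y0 + new_h, :]

def to_aspect_pad_edge(img: np.ndarray, target_w: int, target_h: int) -> np.ndarray:
    """Pad to the target aspect ratio by repeating edge pixels (nothing is lost)."""
    h, w = img.shape[:2]
    target_ar = target_w / target_h
    if w / h < target_ar:                       # too tall: pad left/right
        pad = int(h * target_ar) - w
        return np.pad(img, ((0, 0), (pad // 2, pad - pad // 2), (0, 0)), mode="edge")
    pad = int(w / target_ar) - h                # too wide: pad top/bottom
    return np.pad(img, ((pad // 2, pad - pad // 2), (0, 0), (0, 0)), mode="edge")
```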
Hello. I'm producing a 14-second video, but the resulting video is only 7 seconds long. What should I edit?
Nice work dude, one of the best workflows! Thanks, bro! Do you have the same workflow in an fp8 version with SageAttention? I have an RTX 4080 with 64GB RAM and I think it would work without problems on my PC in fp8. But in any case, BIG THANKS bro! You're the best!
This is an amazing workflow; the only issue is how to stop the camera from making movements. It keeps zooming and I don't know how to stop it.
I keep getting an "AssertionError 399 WanVideoSampler" warning. Does anyone know why? I'm using a 4090 GPU with the size set to 832x480.
Does anyone else get a static mess when attempting this? Whenever I try using this workflow, all I get is pure static.
Sounds like your Python environment is the problem.
@martinr573 do you know what the fix would be for this?
Works on 16GB VRAM and 32GB RAM. But I can't see how to optimize it further to take advantage of the RTX 50-series Blackwell cards, since video processing times are insane. Those WanVideo loaders are no good.
idk what I messed up with the pose transfer, but I ended up with the same video I started with. Switching the two spaghettis got replace character to work though.
Which two spaghettis did you have to switch, kind sir? Because I have the same issue.
I only got 8 seconds, where did 30 come from?
Tested with a 5060 Ti (16GB VRAM) and 32GB RAM. Was able to generate an 18-second video (this was the longest video I tested; longer is probably possible). By default the workflow does not apply masks to replace the character partially, but I think it is doable.
Part 2: The workflow has a comment explaining how to use masks. To activate it, we need to connect 2 node outputs.
I made some changes in case the author wants to include them in the next version:
1 - I keep the 2 nodes always connected and put them in a group. Then I just enable/disable (bypass) the group to activate the use of masks.
2 - I added a (Purge VRAM 2) [comfyui_layerstyle] node after the "draw the masks on image" node (to clear VRAM after generating the video masking).
3 - I replaced the mask points editor with SAM3 to select the mask through text (e.g. "face and hair, body, clothing") instead of manually selecting the green/red points [this one is just a personal preference]. I suppose SAM3 consumes a little more VRAM, so item 2 is important to avoid "Out of Memory" errors.
@nerdmonstro401 How did you go about adding sam3? Because whichever way I tried, though I could see the mask was being applied to the body correctly, the generations started becoming a trippy mess of flashing colours, though I could make out what it was supposed to be somewhat easily.
In my personal testing so far, the best way to try to keep the identity is the following:
Grow Mask: expand = 8
Blockify Mask: block_size = 20-28 (rough sketch of these mask ops below)
Also hook up the negative coords to Sam2Segmentation.
In WanVideo Animate Embeds: face_strength = 4.0, pose_strength = 2.5
In the WanVideo Sampler I had cfg at 3.0, and of course more steps do help. I've also found that smaller batches hold the identity a bit longer than larger batches (frame_window_size).
I generally check what the best amount is based on total frames and how many passes would need to run. I'm personally sitting at 10 steps at the moment as I try to improve the quality.
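A rough numpy sketch of what the grow and blockify mask steps mentioned above are assumed to do (illustrative only, not the actual node code): growing dilates the character mask so edges stay covered, and blockifying snaps the mask to a coarse grid so it hugs the subject less tightly.

```python
# Assumed behavior of the grow/blockify mask steps; not the actual node code.
import numpy as np
from scipy.ndimage import binary_dilation

def grow_mask(mask: np.ndarray, expand: int = 8) -> np.ndarray:
    """Dilate a boolean HxW mask outward by `expand` pixels."""
    return binary_dilation(mask, iterations=expand)

def blockify_mask(mask: np.ndarray, block_size: int = 24) -> np.ndarray:
    """Mark a whole block_size x block_size cell if any pixel in it is masked."""
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(0, h, block_size):
        for x in range(0, w, block_size):
            if mask[y:y + block_size, x:x + block_size].any():
                out[y:y + block_size, x:x + block_size] = True
    return out
```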
I've so far had success up to 15 seconds and am now playing around with 30 seconds.
It helps to have AI clean up your reference image if it's not already a clear shot.
Oh, I should point out that nearly all my generation is replace character rather than pose transfer, but most of this seems to apply to either.
This workflow is the best I've tried; it's just missing a method to lock in the identity without having to train a LoRA of the specific face/body you want to lock in.
I've had poor luck attempting to add an IPAdapter.
Lastly, 720p video gives the best results, as I think there isn't enough detail at 480p.
If you insert a Resize Image v2 node between the RIFE VFI and the Video Combine, with the upscale method set to nvidia_rtx_rsv and the device set to gpu, it's a decent and quick way to get the output upscaled.