Wan2.2 Animate + SAM3 - 1 minute long videos - v1.0

NSFW

Apr-10-25 hotfix:

Made some changes to the workflow to improve quality, see the "About this version" section for details.

This workflow implements SAM3 for character masking.
It uses looping to process long videos, and while it can theoretically generate infinitely long videos, in practice it is limited by VRAM and processing time.

It supports both Wan Animate modes: character animation and character swap.

It has been tested on 1-minute, 480p videos using an RTX 4090 with 64GB of RAM, though it supports 360p, 720p and 1080p resolutions too. My tests suggest you can generate up to 20 seconds at 1080p with an RTX 4090 using a video batch size of 80 frames (if you have the patience and the time for it).
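To get a feel for how the looping splits a long video, here is a minimal sketch of the chunk arithmetic. It assumes 16 fps, inferred from the author's note that 80 frames is about 5 seconds, and it ignores any overlap or extra frames the workflow may add between chunks. `plan_chunks` is a hypothetical helper, not a node in the workflow.

```python
import math

# Assumption: 16 fps, inferred from "80 frames ~ 5-second chunks".
ASSUMED_FPS = 16


def plan_chunks(duration_s, batch_frames=80, fps=ASSUMED_FPS):
    """Split a video into (start_frame, end_frame) ranges, end exclusive.

    Hypothetical helper for illustration; the workflow's own loop
    may handle chunk boundaries differently.
    """
    if batch_frames <= 0:
        raise ValueError("batch_frames must be positive")
    total_frames = math.ceil(duration_s * fps)
    return [
        (start, min(start + batch_frames, total_frames))
        for start in range(0, total_frames, batch_frames)
    ]


# A 1-minute video with the default video batch size of 80 frames.
chunks = plan_chunks(60)
print(f"{len(chunks)} chunks, first={chunks[0]}, last={chunks[-1]}")
```

A smaller video batch size means more loop iterations, but each one needs less VRAM.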

Features:

Character animation:

Full reference image animation using a video reference.

Character swap:

Insert a character in-place in a video using high quality segmentation with SAM3.

Face swap:

Possible by setting "CHARACTER ISOLATION DESCRIPTION" to "face", but don't expect miracles. You can get better results by swapping the face separately with an edit model such as Flux Klein or Qwen Edit, then using the edited image as your Character Reference Image with the character animation feature.

Detailed instructions are contained within the workflow itself:
- Yellow nodes are input and configuration nodes you can change to suit your needs.
- Red nodes are instructions and helpful notes.

Description

First version release.

Comments (7)

wdptt13444 Apr 2, 2026

How do I get SAM3 to work? I already downloaded sam3.pt into the sam3 folder, but it's a no-go.

LatentHeart

Author

Apr 2, 2026· 2 reactions

Install the comfyui_sam3 custom node, then install Triton and add the missing libraries (if you haven't yet) to your python embed directory, your conda environment, or your system-wide Python installation, depending on how you installed ComfyUI. In the "CHARACTER ISOLATION MASK" group there's a note pointing to the detailed guide on how to install Triton for Windows and the libraries you need to use the "video_model" feature. Always make a backup of your python embed or your environment before touching it; that way, if things go south, you can roll back to your previous version.

BTW, you DON'T need character isolation if you want to animate from a guiding video with a single character in it, so you can forget about the character isolation group if you don't need it. It is only required for character swapping, or if you want to isolate a single character within a guiding video with several characters in it.
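As a rough sketch of the backup-first advice above, the snippet below copies an environment directory to a timestamped backup and reports which modules cannot be imported yet. Both function names and the example paths are hypothetical. The only module checked by default is `triton`; the other libraries you need are listed in the guide linked from the workflow note, not here.

```python
import importlib.util
import shutil
import time
from pathlib import Path


def backup_python_env(env_dir, backup_root):
    """Copy env_dir into backup_root under a timestamped name.

    Hypothetical helper: run it BEFORE installing anything, so the
    copy can be restored if the install goes wrong.
    """
    env_dir = Path(env_dir)
    if not env_dir.is_dir():
        raise FileNotFoundError(f"No such environment directory: {env_dir}")
    stamp = time.strftime("%Y%m%d-%H%M%S")
    target = Path(backup_root) / f"{env_dir.name}-backup-{stamp}"
    shutil.copytree(env_dir, target)  # creates backup_root if needed
    return target


def missing_modules(names=("triton",)):
    """Return the top-level module names that are not importable
    from the interpreter running this script."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example usage (paths are placeholders, adjust to your install):
#   backup_python_env("ComfyUI_portable/python_embeded", "D:/backups")
#   print("Still missing:", missing_modules())
```

Run the check with the same Python that ComfyUI uses (the embedded interpreter, the conda environment, or the system one), otherwise the result says nothing about your ComfyUI install.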

wdptt13444 Apr 3, 2026 · 1 reaction

@LatentHeart Yes, I want to do character swapping. It's working now and the results came out very good. I like it, very good workflow, thank you.

LatentHeart

Author

Apr 3, 2026

@wdptt13444 Glad to hear it ;)

drak0n Apr 3, 2026

Hi,

I tested 1080p with a 5-second video. Surprisingly, my RAM usage stayed steady at 70%. The problem was with the VRAM, which stayed at 99% the whole time. I finally managed to render it. After that, I tested a 6-second video. That’s where it crashed. From the start, the VRAM stayed at 99% and wouldn’t go any further. If we can keep the VRAM at least at 90%, that would be a major achievement. Other workflows use "WanVideo Block Swap," but I don’t know if it can be implemented in your workflow. You know better if there’s any possibility of implementing it. In any case, rendering a 5-second video at 1080p is a major achievement.

LatentHeart

Author

Apr 3, 2026· 2 reactions

Did you adjust your video batch size (in the “Video settings” group)? At 1080p, the 4090 can handle 5-second video chunks, so a video batch size of 80 frames does the trick. This way you can render longer videos, but it takes forever to render.
On my machine, a batch size of 80 frames works, but you can try lowering it to 48 frames (3-second chunks) and switch the VAE decoder nodes to the tiled variant if yours crashes.

However, what I prefer to do is the following:
For short videos (20 seconds or less), I use 720p resolution.
For longer videos, I use 480p.
After that, depending on whether the output quality requires it, I run a second pass (720p high-res fix) with the Wan2.2 low-noise model using a character LoRA to maintain consistency. Most of the time, though, the Wan Animate output is good enough and doesn’t require strong refinement at all.
Then I run a 1440p upscaling and interpolation workflow. If light refinement is needed, I use SeedVR2 as the upscaler; it adds extra detail and sharpness. If the video is already good, I use the NVIDIA Video Super Resolution upscaler node. For interpolation, I use RIFE VFI.
This way, I get fast generations at the beginning, allowing me to run several iterations until I get a good result, and then spend more time refining and upscaling that result.
Same old trick as always: generate quickly at lower resolutions, then refine and upscale once you have something worth keeping.
I honestly haven’t pushed the limits of 1080p yet (the longest I’ve gone is 10 seconds) so I can’t say for sure how long videos can go at that resolution.
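The routine above can be summarized as a small decision sketch. The 20-second threshold and the tool names come from this comment. The function names and the two boolean flags are made up for illustration, and nothing here calls ComfyUI.

```python
def pick_resolution(duration_s):
    """720p for short videos (20 seconds or less), 480p for longer ones."""
    return "720p" if duration_s <= 20 else "480p"


def plan_pipeline(duration_s, needs_second_pass=False, needs_light_refinement=False):
    """Return the ordered list of steps described in the comment.

    Both flags are hypothetical stand-ins for a judgment call made
    by looking at the Wan Animate output.
    """
    steps = [f"Wan Animate at {pick_resolution(duration_s)}"]
    if needs_second_pass:
        steps.append("720p high-res fix (Wan2.2 low-noise model + character LoRA)")
    upscaler = "SeedVR2" if needs_light_refinement else "NVIDIA Video Super Resolution"
    steps.append(f"1440p upscale with {upscaler}")
    steps.append("Interpolation with RIFE VFI")
    return steps


for step in plan_pipeline(60, needs_light_refinement=True):
    print(step)
```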

LatentHeart

Author

Apr 3, 2026· 1 reaction

So I decided to do a stress test: the longest 1080p character swap run I could generate before it crashed was 20 seconds, so I guess that's the limit with everything enabled, though I'm not 100% sure.

Workflows

Wan Video 2.2 I2V-A14B

by LatentHeart
