A text-2-image-2-video workflow using the all-in-one Wan2.2 model (v10 nsfw) found here https://huggingface.co/Phr00t/WAN2.2-14B-Rapid-AllInOne
(For ComfyUI)
You can also shoehorn in a quantized GGUF for the AIO model easily - the quality isn't much worse at Q4_K_M. Make sure to also load the VAE and CLIP if you do this, and route them as required.
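If you go the GGUF route, a minimal sketch of where the pieces usually live in a ComfyUI install (folder names below are the common defaults; depending on your loader nodes, the model may instead go in models/diffusion_models and the text encoder in models/text_encoders):

```shell
# Common default folders for a GGUF all-in-one setup (run from the
# directory containing your ComfyUI install):
mkdir -p ComfyUI/models/unet   # quantized Wan2.2 AIO .gguf goes here
mkdir -p ComfyUI/models/vae    # the matching VAE
mkdir -p ComfyUI/models/clip   # the text encoder for the CLIP loader
```

Then load each file with its own loader node (GGUF UNet loader, VAE loader, CLIP loader) and wire them where the all-in-one checkpoint loader's outputs used to go.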
YOU WILL NEED TO INSTALL CUSTOM NODES.
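One common way to handle this (an assumption on my part, not the author's stated method) is to install ComfyUI-Manager, which can detect and install a workflow's missing custom nodes for you:

```shell
# Install ComfyUI-Manager, then open the workflow in ComfyUI and use
# the Manager's "Install Missing Custom Nodes" action.
cd ComfyUI/custom_nodes
git clone https://github.com/ltdrdata/ComfyUI-Manager
```

Restart ComfyUI after installing so the new nodes register.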
The workflow generates an image, refines it, then sends it through video generation and up-scaling.
The workflow is ready for NSFW content generation, and excels at anthro characters in various situations. You'll have to find and set up LoRAs to your liking.
On an RTX 5090 the workflow can put out a 5 second 16FPS video in ~60 seconds at 704x480 scaled to 1408x960. This duration includes image gen, image detailing (hands, face, genitalia), video gen, and simple up-scaling.
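For context, the quick arithmetic behind those numbers (illustrative only):

```shell
# Clip length in frames, and the spatial upscale factor:
echo $((5 * 16))      # 80 frames per 5-second clip at 16 FPS
echo $((1408 / 704))  # 2x upscale per axis (704x480 -> 1408x960)
```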
ComfyUI is launched with the --fast option - I don't see any significant visual degradation.
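A typical launch from the ComfyUI folder with that option enabled (combine with whatever other flags you normally use):

```shell
# Start ComfyUI with the --fast optimizations enabled:
python main.py --fast
```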
My rig has a 5090 and 64GB of DDR5 - so your mileage may vary.
Follow the notes and instructions in the workflow to use it. Prompts go into the text input nodes rather than the CLIP inputs.
Required model for detailer (or you can bypass the final detailer step):
Pussy BBox [yolov8] - v1.0 (on Civitai)
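Assuming the detailer uses Impact Pack-style detector nodes (my assumption; adjust if your node pack looks elsewhere), bbox detection models are usually picked up from here:

```shell
# Default bbox detector model folder for Impact Pack-style nodes:
mkdir -p ComfyUI/models/ultralytics/bbox
# place the downloaded yolov8 bbox model file in that folder
```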
Version notes: The very first version.
Comments (9)
Great workflow, I get similar output times with my 5090 and 96GB of RAM ;) Have you figured out how to get the last frame of the First-Last gen to not increase in contrast/brightness?
Unfortunately not, I suspect it’s completely tied to the AIO model, as using the newer “MEGA” versions doesn’t cause the same problem. I haven’t tried the latest few versions, but older MEGA versions (v1/2/3) didn’t result in good enough motion to justify switching… Version 7 is out now I think, maybe I’ll give it a try 🤷‍♂️
I've tested MEGA v7, and it can be finicky, but motion and image quality are good. v10 is more consistent, but MEGA v7 is definitely a good option now.
Pretty good workflow, but near the end of every video, the brightness and color suddenly jolt for 3-5 frames. Clearly visible on your demo clips as well. Any ideas on how to fix this?
If you make a looping video (FLF) it will do this, but if you only use a starting frame (I2V) it won't. I've tried various models and some are better than others, but so far nothing has matched the motion of the models I'm using in this workflow, so I haven't updated it. The models by darksidewalker are VERY good too and will generate better clarity, but they're not as good with motion (not sure about the latest version(s), though) - so they're worth shoehorning into the workflow and testing out.
I've been trying to use this flow for a couple of days, and almost have it working, but for some reason it seems like the image is not making it to the video section. The image generates well (the detailers don't work very well, but I suspect that might be a SAM issue), but when the video generates, it's as if it never received an input image at all. Neither I2V nor First-Last works. I'm using MEGA 12.1.
With the MEGA models I think you have to set the CFG to either 0 or 1 to switch between I2V and T2V.
I'd suggest my newer workflows where T2I and I2V are split apart - the model I use now is (generally) better.
Hell yeah, I'm using your new T2I and I2V flows and the WAN models you've suggested, and they're all working great! I've also gone ahead and smooshed em both together into one big workflow that can do T2I, I2V, and T2I2V on toggles. This chunker has replaced pretty much all of my previous flows, except for inpainting. Much appreciated friend!
I'm clearly not very smart; I can't get it running.
Is there an easy way to see what models and things need to be downloaded?