WARNING: Hey, I've recently moved to LTX2.3, but I will publish a fix for the workflow issues mentioned in the comments once I have enough free time.
I've officially switched to LTX due to slowmo issues with lightx2v loras. I might share a workflow, but that model is already capable of 30s at 720p on a 4070ti, so I'm not that motivated to do so. This one should still work if you find the missing packages. Thanks, everyone, for your support.
It finally happened!
Now there's a way for smoother continuous videos thanks to SVI team and Kijai.
We are at v1.0!
I've updated the workflow to add a few more features:
Video extend option by loading an initial video, then converting it to a latent that goes into the first I2V (WIP)
Option to switch between 3 and 2 ksampler phases by setting the initial step
Option to set cfg > 1 if you wanted to disable lightx2v
Images are saved partially in lossless format (use something like VLC to view them) and only loaded again on the final merge; if something goes wrong, you can merge those files yourself to get a flowing video.
Implemented a bus system to reduce connections. Report if you have any issues but things should work as long as you have the right models and loras selected.
You can set and fix the seed for each part
There are options to upscale and interpolate before final save
Final save happens on main graph so you can preview your output
The slow motion issue probably persists. I couldn't find a consistent solution: when a part is sped up using a third-party tool, every following part becomes faster too, since each part takes the previous latents as input, until everything breaks.
A weak point of most SVI workflows right now is that they reference the first image in all parts, so you might get background warping or changing shapes/textures on switches if the background has changed a lot.
I'll only be updating the workflow if Kijai updates the node (there are two merge requests about end frame and better consistency(?)) and/or something breaks. So we can call this a semi-final :)
ComfyUI-compatible SVI loras:
LightX2V loras I'm using:
FP32 vae:
Ultra Flux VAE for sharper !"Z Image"! outputs:
https://huggingface.co/Owen777/UltraFlux-v1/blob/main/vae/diffusion_pytorch_model.safetensors
GGUF still seems to be performing better than fp8 scaled in my experience.
Just share your outputs with us folks as well :)
v0.9
Left sampling on (1 + 3 + 3) steps with 4 parts (~19s). Takes around 10 minutes on my 4070ti with sage + torch compile. Feel free to extend it further if you need.
Everything is GGUF. Patch sage attention and torch compile are disabled by default but you are welcome to enable them back since they speed things up a lot if you have the environment set up.
You can set part-specific or common loras thanks to the rgthree power lora node.
Happy generations! \('-')
Description
added svi support
Comments (37)
everything is still way too slow motion. the WAN 2.2 slow motion problem with speed up loras sucks
yeah, same. the slowmo kills the entire point of using this.
My lightx2v loras are kinda outdated, I'll check if I can find the latest compatible ones.
Try and use the Seko loras. They're the ones giving me the best results.
Don't use the old lightx2v loras or the wan 2.1 lightx2v/lightning loras. Use the 1022 wan 2.2 low noise lora and the 1030 lightx2v high lora with a total of 6 steps; it should help a lot.
@MisticRain69 Which version exactly do you mean? 1022 is a Lora, while 1030 is a model.
Edit: Never mind. Civitai and Hugging Face have different versions.
@MisticRain69
I tried with your recommended loras and it didn't work, same slow video.
We need a painter node involved...
put it through frame interpolation and increase the output frames to like 38 or so. you'll have to play with it.
the image resize node is set to stretch, you need to make it crop the image
other than that is actually working!
Will do thanks, I did not have any input images to test that :(
works beautifully! completely seamless video! although I'm using the latest lightx2v loras and the triple ksampler node
triple sampling issue I mentioned earlier could be a false alarm, custom lora's affect the quality a lot. will have to do more tests on the vanilla workflow.
So I ran the workflow without changing anything because chatgpt was saying that the error I was getting was because I must have done something and messed up the wiring but I still get a load of errors like the one below. Did anyone else have this issue?
Prompt outputs failed validation: CLIPTextEncode: - Return type mismatch between linked nodes: clip, received_type(MODEL) mismatch input_type(CLIP)
It seems you somehow stuck a model into the clip encode instead of a clip. Either you've picked the wrong thing or it's a graph issue. Make sure to update your ComfyUI.
Cannot execute because a node is missing the class_type property.: Node ID '#82:41'
Thousand thanks for this!
The only things:
- The ZIT image creates a back view image of the soldier, but the video shows the soldier from the front. Is it supposed to be like that?
- Every 5 seconds there is a change of perspective in the video, and I don't know why.
I'm using the default prompts that come with the workflow.
What's the benefit of using two different High samplers per section?
The initial sampling has no lightx2v lora applied and uses cfg > 1. It gives the video an initial movement to avoid slowmo and to get more motion out of the later samplings, which have the lightx2v lora applied and cfg == 1. I'll be stepping it up to 2 steps when I reupload the workflow.
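To make the phase split concrete, here is a minimal sketch of how the (1 + 3 + 3) split could map onto start/end step values for chained advanced samplers. The phase names, the helper `split_steps`, and the cfg value 3.5 are illustrative assumptions (the source only says "cfg > 1"); treat it as a way to reason about the numbers, not as the workflow's actual wiring.

```python
def split_steps(phase_steps):
    """Turn per-phase step counts into (start_at_step, end_at_step) pairs."""
    ranges = []
    start = 0
    for count in phase_steps:
        if count < 1:
            raise ValueError("each phase needs at least one step")
        ranges.append((start, start + count))
        start += count
    return ranges


# Hypothetical phase table. cfg 3.5 is only a placeholder for "cfg > 1".
PHASES = [
    {"name": "high noise, no lightx2v", "steps": 1, "cfg": 3.5},
    {"name": "high noise, lightx2v", "steps": 3, "cfg": 1.0},
    {"name": "low noise, lightx2v", "steps": 3, "cfg": 1.0},
]

ranges = split_steps([phase["steps"] for phase in PHASES])
total_steps = ranges[-1][1]  # every sampler shares the same total step count

for phase, (start, end) in zip(PHASES, ranges):
    print(
        f"{phase['name']}: steps={total_steps} "
        f"start_at_step={start} end_at_step={end} cfg={phase['cfg']}"
    )
```

Raising the first phase to 2 steps, as mentioned above, would be `split_steps([2, 3, 3])`, which shifts every later range by one and brings the shared total to 8.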
Perfect transitions between the 5s parts and no visible degradation after a 2 min video, so I'm pretty happy with the smooth result. But it seems to ignore the prompts after the first part, so the whole generation is like on autopilot (you get some unexpected and interesting results, sometimes even better than the initial plan).
Yep, I'm seeing similar things. It works for the first prompt, but after that it starts doing its own thing and is very sensitive to the words you put in the prompt (despite not following it anyway). I wonder what causes that, will need to experiment a bit.
It's most likely due to the way the latent is made and/or lightx2v. On each step the node takes one reference latent from the very first start image and the whole latent of the previous generation. I don't know how it uses or mixes them. I'll try clipping the previous latent to see if it reduces the weird behavior, but the fact that it takes a motion-latent-count means it could already be clipping, and the reference latent could somehow be suppressing the prompts, though that's pure speculation.
you can increase the cfg as each extension goes by. if you have a more drastic movement in an extended section, you can increase the cfg so that section adheres more to your prompt.
This is driving me crazy. I made a 19s video yesterday with this workflow, but I try the exact same thing today and it tells me it can't allocate 79GB of VRAM (no kidding!). I've tried everything I can think of.
Interesting, at what stage does it say that? Can you reimport the workflow in case something broke? Also, I just uploaded v1. I'm not sure if it's visible because civit does not seem to be tagging the images. Can you try that and see if it works?
@iLegoLoon It's the first KSamplerAdvanced. "Tried to allocate 79.96 GiB. GPU 0 has a total capacity of 15.92 GiB of which 5.15 GiB is free." I'll get 1.0 and give it a whirl. I've downloaded the last version a few times to make sure I was using the right file and such.
@iLegoLoon 1.0 doing the same thing: `Tried to allocate 62.83 GiB`
Can you share the full log? What models are you using? Can it be related to your resolution or video length settings?
@iLegoLoon I checked the specs on the mp4 I managed to make yesterday. The resolution was the same as what I'm trying today, but the frame count on that file, divided by 4, means I must have lowered the length from 81 to 77 (whether on purpose or by a fortuitous fat finger, I can't recall). Doing so now (with 1.0) seems to be chewing through the stages successfully, whereas it used to fail nearly immediately after getting to I2V-First. I'm guessing the fact that I'm using a 16G ROCm card means the overhead is just enough that it doesn't work with your defaults. Go figure. Maybe a note you can make for all us losers with AMD cards.
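Both lengths mentioned here, 81 and 77, have the form 4k + 1. The sketch below shows that arithmetic, assuming the convention commonly used with Wan video models where the VAE keeps the first frame and packs every following group of 4 frames into one latent frame. That convention is an assumption here, not something stated in this thread, and the helper names are hypothetical. It does not explain the oversized allocation by itself; it only shows how the length setting relates to latent size.

```python
def latent_frames(length):
    """Latent frame count for a video length, assuming 4x temporal packing."""
    if length < 1 or (length - 1) % 4 != 0:
        raise ValueError("length is expected to be of the form 4k + 1")
    return (length - 1) // 4 + 1


def snap_length(requested):
    """Round a requested frame count down to the nearest 4k + 1 value."""
    if requested < 1:
        raise ValueError("requested length must be positive")
    return ((requested - 1) // 4) * 4 + 1


for length in (81, 77):
    print(f"{length} frames -> {latent_frames(length)} latent frames")
```

Under that assumption, dropping from 81 to 77 removes exactly one latent frame per part, which is the smallest possible reduction in length.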
Last time I used an AMD gpu was 15 years ago :) You can also slightly lower the resolution and use a simple resize upscale at the end. It wouldn't change much if you don't want to lose video length.
@iLegoLoon the difference of 4 frames per stage isn't much. I'll just add more stages when I need further video length.
That's alright too. If the merge fails, you can manually merge the part files stored in the temp folder using something like Shotcut.
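As an alternative to a GUI editor, the part files could also be joined with ffmpeg's concat demuxer. The sketch below only builds the list file text and the command. The file names are hypothetical, and it assumes all parts share the same codec, resolution, and frame rate so that stream copy works. It does not trim any frames that neighbouring parts might share.

```python
def build_concat_list(part_paths):
    """Return the text of an ffmpeg concat-demuxer list file."""
    lines = []
    for path in part_paths:
        # Escape single quotes the way the concat list syntax expects.
        escaped = str(path).replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_path, output_path):
    """Return the ffmpeg argument list that joins the parts without re-encoding."""
    return [
        "ffmpeg", "-f", "concat", "-safe", "0",
        "-i", str(list_path), "-c", "copy", str(output_path),
    ]


# Hypothetical part names; use whatever the temp folder actually contains.
parts = ["part_1.mkv", "part_2.mkv", "part_3.mkv", "part_4.mkv"]
print(build_concat_list(parts), end="")
print(" ".join(build_ffmpeg_command("parts.txt", "merged.mkv")))
```

Save the printed list as `parts.txt` next to the part files, then run the printed command from that folder.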
i tried updating and installing all the missing nodes, but one of my nodes is still missing -
WanImageToVideoSVIPro in subgraph 'I2V-First'
I don't know which pack it is, but I have installed KJ nodes as it says in the properties panel of the missing node.
edit - i reinstalled and it still is missing!
edit 2 - deleted from the custom nodes folder, then git pulled it manually and it worked!