See About for the v1.2 loader fixes.
My output quality is suffering because I haven't switched to ProRes or PNG intermediate files yet. That's where the red shift is coming from: repeated H.264 compression. A lot of processes are still separate because I'm still figuring out memory management with these gigantic SVI runs. RAM is now the bottleneck; even 128GB fills up fast, so there's still no upscale/interpolate at the end. It's fine if it's just the transition, but if you're merging it with the source as the last step, no. I'm still working on the end of the loop; the beginning is perfect, I think. I thought about encoding backwards with an overlap, but maybe just going forward over the source's start frames, as if extending, while really locking in the motion, might work. If I can match it perfectly, then a fade into that would probably work. That's what I'm trying at the moment. Getting there. Come on, you have to admit that anything's better than a jarring jump cut every three seconds. No matter how pretty the picture is.
I continue my quest to make infinite loops truly seamless. Progress is slow at times, backwards at times. SVI is a wonderful nudge in the right direction. The incoming motion is, for all intents and purposes, solved. The other end still needs work, but there is huge improvement here in both motion and color.
OK, so here’s how this works. SVI needs latent frames to do its thing and preserve incoming motion. So we encode the last 16 frames of the source video and use them for our prev_samples. We use one frame from the start of the source as our anchor. The embeds we get from the SVI node are then combined with the embeds from a WAN encoder that also looks at our source, but just one frame from each end. Thus armed with a start frame, an end frame and some special magic embeds, we infer. Now, the results aren’t exactly what I was hoping for straight out of the decode, but they’re close. The main issue is that we always get a few frames of garbage at the end. With the native node, this happens when you’re naughty and don’t follow the frame rules, i.e., 49, 73, etc. (a multiple of 4, plus 1). Now, it turns out that SVI actually messes with the frame count, so if you tell it you want 73 frames, you actually end up with 77. Which is a naughty number. Alas, when we try to be clever by asking for 69 frames, although we’ve successfully fooled it into making 73 frames, we still get garbage at the end. So let’s get cleverer. Is what I said to myself. Since we can’t avoid the garbage, let’s make it predictable. Ergo the duplicator that’s in the setup. We tack a 4-frame hold onto our source, essentially telling it to slow down the motion early. And since in this case we can correlate latent frames with non-magical regular ones, we simply chop four frames off our eventual output, where we know the garbage is to be found. Does that make sense? I know what you’re thinking: the extra frames are on the wrong end. Right, wrong. Which turns out right, don’t ask me why, alright?
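The frame bookkeeping above is easy to lose track of, so here is a minimal Python sketch of it. It takes the description at its word: Wan wants lengths of the form 4n+1, SVI adds 4 frames to whatever you request, 16 source-end frames go to prev_samples, and the 4 junk frames land at the very end of the decode. The helper names and the placement of the hold (appended to the end of the source) are assumptions for illustration, not part of any node pack.

```python
# Hypothetical helpers illustrating the frame arithmetic described above.
# Assumptions (taken from the description, not from any node's source code):
#   - Wan wants frame counts of the form 4n + 1 (49, 73, ...)
#   - SVI adds SVI_EXTRA frames to whatever length you request
#   - the junk frames land at the very end of the decoded output

SVI_EXTRA = 4     # 73 requested -> 77 produced, per the description
HOLD = 4          # frames added by "the duplicator"
PREV_FRAMES = 16  # source-end frames encoded for prev_samples


def is_valid_wan_length(n: int) -> bool:
    """True for 'legal' lengths: a multiple of 4, plus 1."""
    return n > 0 and n % 4 == 1


def length_to_request(target: int) -> int:
    """Length to type into the node so SVI actually produces `target`."""
    if not is_valid_wan_length(target):
        raise ValueError(f"{target} is not of the form 4n + 1")
    return target - SVI_EXTRA


def conditioning_indices(num_source_frames: int) -> tuple[list[int], int]:
    """Indices of the last PREV_FRAMES source frames (prev_samples)
    and of the first source frame (the anchor)."""
    if num_source_frames < PREV_FRAMES:
        raise ValueError("source is shorter than the prev_samples window")
    start = num_source_frames - PREV_FRAMES
    return list(range(start, num_source_frames)), 0


def add_hold(frames: list, hold: int = HOLD) -> list:
    """Freeze the last source frame for `hold` extra frames (assumed placement)."""
    return frames + [frames[-1]] * hold


def trim_garbage(frames: list, n: int = HOLD) -> list:
    """Chop the known-bad frames off the end of the decoded output."""
    return frames[:-n] if n else list(frames)
```

For example, `length_to_request(73)` gives 69, and `conditioning_indices(120)` picks frames 104 through 119 for prev_samples with frame 0 as the anchor. The guard in `trim_garbage` is there because `frames[:-0]` would return an empty list.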
So, having sorted out motion, for the most part, we turn our attention to color. As I should know, the color of magic is octarine, which of course is unavailable in any node of which I am aware. So we use MKL color matching. But there’s a catch. We have to match not just the beginning, but also the end of the video, because, as we know, it likes to drift. Even more so now that, with SVI, a base generation is somewhere around 300 frames. Enter the fade mask. We match our video twice, once to the beginning and once to the end. The mask gives us a nice crossfade between the video matched to the end and the video matched to the beginning. Of course, sometimes one widget is not enough. Hence the grading group. I have found that I generally only need it at the end, so there is only one group, bypassed by default. You’ll have to work out your own adjustment based upon the light your balls receive. Which can be deceptive, but that’s the way it goes. Estimating corrections is difficult with eyeballs. *For the non-lazy, who are willing to venture outside of comfy to achieve an effect, tricky wicked pickles are greatly aided by a histogram and a vectorscope. Look at the luma histogram in comparison view or superimposed, and try to squash or stretch your luma using equivalents of the available widgets, e.g., contrast, value, offset, etc. Same with color, using color vectors. Of course, if you’re pulling your outputs into an editor, you may just want to color correct there. But it is helpful to make a few presets based on the corrections in your editor, translated to comfy nodes, for steering it in the right direction.*
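The double match plus fade mask can be sketched in plain Python. This swaps MKL for a simpler per-channel mean and standard deviation transfer, purely to keep the example short and dependency-free; frames are lists of (r, g, b) floats, and the linear ramp and function names are assumptions. Early bridge frames follow the source's end frame, late frames follow the source's start frame.

```python
from statistics import fmean, pstdev

Pixel = tuple[float, float, float]
Frame = list[Pixel]


def match_mean_std(frame: Frame, reference: Frame) -> Frame:
    """Per-channel mean/std transfer: a simple stand-in for MKL matching."""
    channels = []
    for c in range(3):
        src = [p[c] for p in frame]
        ref = [p[c] for p in reference]
        s_mu, s_sd = fmean(src), pstdev(src)
        r_mu, r_sd = fmean(ref), pstdev(ref)
        scale = r_sd / s_sd if s_sd > 0 else 1.0
        channels.append([(v - s_mu) * scale + r_mu for v in src])
    return [tuple(ch[i] for ch in channels) for i in range(len(frame))]


def fade_weights(n: int) -> list[float]:
    """Linear 0 -> 1 ramp, one weight per frame (the 'fade mask')."""
    if n == 1:
        return [0.0]
    return [i / (n - 1) for i in range(n)]


def dual_match(video: list[Frame], source_end: Frame, source_start: Frame) -> list[Frame]:
    """Match every frame twice, then crossfade from the end-matched
    version (weight 0) to the start-matched version (weight 1)."""
    out = []
    for frame, w in zip(video, fade_weights(len(video))):
        to_end = match_mean_std(frame, source_end)
        to_start = match_mean_std(frame, source_start)
        out.append([
            tuple((1 - w) * a + w * b for a, b in zip(pe, ps))
            for pe, ps in zip(to_end, to_start)
        ])
    return out
```

A real MKL match also transfers the covariance between channels, so it corrects color casts that a per-channel transfer like this one cannot.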
I’ve been doing my testing with https://civarchive.com/models/2053259?modelVersionId=2477539. It’s been the most obedient. There are distillation LoRAs in the loader, not to be used with pre-distilled models, of course. SVI Pro settings are those I found to work best.
On the combine end, you’ve got a choice of saving just the bridge or saving the whole shebang merged. Or both. I’m leaving the interpolation and upscale stages out for the moment. Memory management is becoming very sticky now that we have feature-length generation, and upscaling and interpolating the 300 or so frames plus the new FLF bridge is not fun for the average thing, that thing inside the box that glows and blows hot air. The microchip RAM thing.
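To put rough numbers on why upscaling and interpolating a full run hurts: ComfyUI passes images around as float32 tensors shaped [frames, height, width, channels], so a batch's size is easy to estimate. The sketch below assumes a 480x832 base resolution, a 2x upscale and a 2x interpolation; those figures are illustrative assumptions, and the estimate covers only each stage's output batch, not intermediate copies, models or latents, so real usage will be higher.

```python
GIB = 1024 ** 3
BYTES_PER_VALUE = 4  # float32


def image_batch_bytes(frames: int, height: int, width: int, channels: int = 3) -> int:
    """Bytes for one float32 image batch of shape [frames, height, width, channels]."""
    return frames * height * width * channels * BYTES_PER_VALUE


def pipeline_estimate_gib(frames: int, height: int, width: int,
                          upscale: int = 2, interp: int = 2) -> dict[str, float]:
    """Size of each stage's output batch in GiB (not a peak-usage figure)."""
    up_h, up_w = height * upscale, width * upscale
    sizes = {
        "decoded": image_batch_bytes(frames, height, width),
        "upscaled": image_batch_bytes(frames, up_h, up_w),
        # Approximation: interpolation multiplies the frame count by `interp`.
        "interpolated": image_batch_bytes(frames * interp, up_h, up_w),
    }
    return {name: size / GIB for name, size in sizes.items()}


for stage, gib in pipeline_estimate_gib(300, 480, 832).items():
    print(f"{stage:>12}: {gib:6.2f} GiB")
```

Under these assumptions that is roughly 1.3 GiB decoded, 5.4 GiB upscaled and 10.7 GiB interpolated, before counting the bridge or any copies made along the way.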
I started out with the goal of tacking my existing FLF onto the end of the multi-stage SVI workflow, but this gave me major headaches, probably because I was trying to port it to wrapper nodes at the same time. So I decided to first get the port done and adapt it to use SVI by itself before adding it to the main build. If it doesn’t integrate that well, I’ll post the SVI workflow by itself. I’ve made enough modifications to justify it, I think, mostly automated switching for selecting prompt sets with an index. Helpful when you’ve got four stages of prompting to do and they need to go well with each other. I also consolidated the model/sampler settings widgets; with so many stages, it's a nightmare to keep jumping into subgraphs to change stuff. But ultimately it’s a generic staged setup with four stages that puts out around 300 frames.
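Index-based prompt switching boils down to a lookup table: each entry is a set of four stage prompts written to go together, one integer picks the whole set, and a single shared settings object replaces per-subgraph widgets. This is a hypothetical Python sketch of that idea, not the workflow's actual node graph; the prompt strings and default sampler values are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SamplerSettings:
    # One shared copy instead of a widget inside every stage subgraph.
    # Placeholder values, not the workflow's defaults.
    steps: int = 8
    cfg: float = 1.0
    seed: int = 0


# Hypothetical prompt sets: one prompt per stage, written to follow on
# from each other.
PROMPT_SETS: list[tuple[str, str, str, str]] = [
    ("set0 stage1", "set0 stage2", "set0 stage3", "set0 stage4"),
    ("set1 stage1", "set1 stage2", "set1 stage3", "set1 stage4"),
]


def stage_configs(index: int, settings: SamplerSettings = SamplerSettings()):
    """Pick a prompt set by index and pair each stage prompt with the
    shared sampler settings."""
    if not 0 <= index < len(PROMPT_SETS):
        raise IndexError(f"prompt set {index} does not exist")
    return [(prompt, settings) for prompt in PROMPT_SETS[index]]
```

Because every stage receives the same settings object, changing one value changes it for all four stages, which is the point of consolidating the widgets.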
There are two versions in the .zip. One of them uses my custom node that extracts a path and a filename from a drag-and-drop video loader; it attaches to the path widget on a path loader. I made it because S&R isn’t working for me at the moment, don’t know why. And you really need to be able to connect your incoming filename for this process if you don’t want to get lost, or bored typing it in every time. If node referencing is working for you, you don’t need it, obviously. If you want to use the nodes, just drop the folder into your custom nodes folder and restart comfy. No dependencies needed; it’s just a few lines of Python.
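For anyone curious what "a few lines of Python" looks like, here is a minimal, hypothetical ComfyUI custom node in the same spirit. It is not the DragVideoPath node from the .zip: it takes a plain path string rather than hooking into a drag-and-drop loader, and the class, category and output names are made up. The class attributes (INPUT_TYPES, RETURN_TYPES, RETURN_NAMES, FUNCTION, CATEGORY) and the NODE_CLASS_MAPPINGS export are the standard ComfyUI custom node interface.

```python
import os


class SplitVideoPath:
    """Hypothetical sketch: split a video path into directory, filename and stem."""

    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {"video_path": ("STRING", {"default": ""})}}

    RETURN_TYPES = ("STRING", "STRING", "STRING")
    RETURN_NAMES = ("directory", "filename", "stem")
    FUNCTION = "split"
    CATEGORY = "utils/path"

    def split(self, video_path):
        # ComfyUI expects node functions to return a tuple of outputs.
        directory, filename = os.path.split(video_path)
        stem, _ext = os.path.splitext(filename)
        return (directory, filename, stem)


NODE_CLASS_MAPPINGS = {"SplitVideoPath": SplitVideoPath}
NODE_DISPLAY_NAME_MAPPINGS = {"SplitVideoPath": "Split Video Path"}
```

Saved as `__init__.py` in its own folder under custom_nodes, a node like this should appear after a restart, and its outputs can be wired into whatever widget needs the incoming filename.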
Only have one example at the moment...should have made more. Well, I've got tons, but I'm not one of those people who feel the need to upload every piece of garbage they make. I select my garbage. There has to be something in it that required more than a click. And it has to have sound. Nothing's more useless than a video without sound. Maybe sound without video. That would probably suck too. Point is, I'm way more interested in the technique, making these tools do what I want them to do, than I am in making something dazzling. The more invisible, the better. Which is why SVI is so great. Almost there. So close to seamless. I can taste it.
Description
v1.2 Fixes some obvious loader issues. I put in regular loaders; if you want to use them, just move whatever is connected to the path loaders over to them. If you are working with already upscaled sources and you need to downscale them, I have set up the scaling nodes so that all you need to do is enable them (they’re normally bypassed) and unhook the scaling subgraph from the ‘video info’ node. That will let you manually enter a global resolution.
Comments (20)
Where can I get DragVideoPath? There's no repo or anything else.
It's in the zip folder. Also on my models page.
Where exactly is the 'WanVideoCombineEmbeds' node located? I can't find it no matter how hard I search
@ohddugi0516816 ComfyUI-WanVideoWrapper repo (it does have a 'BETA' tag on it, but I didn't do anything special). If the other nodes from that repo seem fine, try deleting or moving its directory out of the custom_nodes directory and then git clone the repo into the custom_nodes directory. I had a problem with the node right before that one, and this solution worked to re-establish everything.
Location in the latest version is #587, Group 'SVI Embedding', directly above the 'End' Group. The Nodes Map menu doesn't search in groups, at least not for me, so a search won't show it unless you go to the group submenu first. In case you're talking about finding the location of the node onscreen. It's not minimized.
Here's a question. Is it possible to make a workflow like this that would glue finished pieces of video together? For example, I created a lot of videos using the FLF workflow. If I connect them in third-party editors, the seams are visible. Now, if this magical Lora made such transitions invisible, there would be no price for her. Or are there workflows with this Lora, but only FLF?
I'm not sure exactly what you are asking... this is a workflow, not a LoRA. The only LoRA that is required is SVI. This is not a one-clicker; it's a tool. Like I said in the description, the end frames are still not quite there. The magic is working great for the beginning; there should be no problems with this seam. I put in the option to save the merged video because of this. For the other end I would recommend a crossfade of a few frames. Duplicating two or three frames on both ends can help keep this smooth if the motion is too fast. This is not currently part of the WF process, as it's not something that is always required. Automating a perfectly polished finished comp isn't really feasible, at least not yet. I am still working on the end>start part. There is indeed no price for her; I am doing her for free.
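The manual fix suggested above (a crossfade of a few frames, optionally with two or three duplicated frames on each side of the seam) can be sketched in plain Python. Frames here are flat lists of floats standing in for images, the function names are hypothetical, and the linear blend weights are an assumption; an editor's crossfade may use a different curve.

```python
def lerp_frame(f0: list[float], f1: list[float], t: float) -> list[float]:
    """Linear blend of two frames: t=0 gives f0, t=1 gives f1."""
    return [(1 - t) * x + t * y for x, y in zip(f0, f1)]


def crossfade_join(a: list[list[float]], b: list[list[float]],
                   overlap: int = 3, hold: int = 0) -> list[list[float]]:
    """Join clip `a` (end of the bridge) onto clip `b` (start of the source)
    with an `overlap`-frame crossfade. `hold` duplicates the last frame of
    `a` and the first frame of `b` to slow fast motion through the seam."""
    if hold:
        a = a + [a[-1]] * hold
        b = [b[0]] * hold + b
    if overlap <= 0:
        return a + b
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    # Weights strictly between 0 and 1, e.g. 0.25, 0.5, 0.75 for overlap=3.
    mixed = [
        lerp_frame(a[len(a) - overlap + i], b[i], (i + 1) / (overlap + 1))
        for i in range(overlap)
    ]
    return a[:-overlap] + mixed + b[overlap:]
```

The joined clip is `len(a) + len(b) - overlap` frames long, plus two extra frames per unit of `hold`, so a crossfade shortens the total slightly while a hold lengthens it.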
@Ponder_Stibbons I mean, your workflow makes a loop, and I wish there was no loop (video looping). I have 10 videos of 5 seconds each. I would like to combine them so that the seams are not visible and the video does not loop. Thank you. Sorry if I didn't write it clearly; English is not my native language.
@dirtysem Yes, I assumed it was translated, no worries. https://civitai.com/models/2261474/long-videos-with-svi-22-pro-wanvideowrapper-workflow
If you want to make long comps, use this. All of the videos posted to this workflow were created with my own variation of this basic workflow. The setup is basically the same; my own version adds a stage for a total of four (81x4 minus the overlaps), instead of three. It worked great for me out of the box, though. Also, the pre-distilled model that is listed for my posts worked really well for me. I do intend to post my version; I was hoping to add this one as a final stage, but I'm worried that it would be unusable for most people. The memory requirements are not for the faint of heart. I will probably post the 4-stage SVI WF by itself.
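The "81x4 minus the overlaps" arithmetic, as a quick sketch. The number of overlapping frames at each stage boundary is not given in the comment, so the overlap values used here are assumptions; the point is only how the total scales with stage count and overlap.

```python
def staged_length(stages: int = 4, frames_per_stage: int = 81, overlap: int = 0) -> int:
    """Total frames when each stage after the first re-uses `overlap`
    frames from the end of the previous stage."""
    return stages * frames_per_stage - (stages - 1) * overlap


for stages in (3, 4):
    for overlap in (0, 4, 8):
        print(f"{stages} stages, overlap {overlap}: {staged_length(stages, 81, overlap)} frames")
```

With four stages and no overlap the ceiling is 324 frames; an assumed 8-frame overlap, for example, would land on 300, in line with the "around 300 frames" mentioned in the description.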
If you want to join different videos with this workflow, just put them into the loaders and disable the merging combiner, save the transition video only. There are two sources available in the newer version.
@Ponder_Stibbons tell me what exactly to disable so that I can simply combine the videos without looping. I can't find it: "just put them into the loaders and disable the merging combiner, save the transition video only".
@dirtysem I'm not sure what to tell you besides the obvious. There are two sources of video if you are using v1.2. There is nothing to disable; just use two different videos. I put in upload nodes exactly for this, with notes. This isn't rocket surgery. I'm trying to help, but if this is a matter of basic comfy proficiency, I'd suggest a tutorial on how to use comfy. Or upload the whole workflow to an LLM as a text file. Other than that, it's just a matter of saving the transition, not the merge. I can't really make that any plainer. Just dive in and screw around if you're not super familiar with the interface. It's the best way to learn: just smash stuff together until it works.
It's really weird, but none of your WFs can be imported into ComfyUI on my end :/ It always says that there is no workflow in the files, but they look okay in Notepad :/ Has anyone else had this problem?
Update comfy to latest version. Make sure frontend is updated. Good opportunity to make sure your backups are up to date as well. I pity the fool who don't back up his comfy.
There is no workflow metadata embedded in their videos, so what you are seeing is expected.
You are an eccentric genius and I look forward to your future endeavors.
That is just the sweetest thing ever said to a psychotic pervert. This is why clowns wear makeup; you cannot see them blush.
The transitional motion is just incredible where continuity is concerned. Completely exceeded my expectations. One problem: the generated frames are a bit... fucked. They are very grainy and jittery, especially by comparison to the input video, which it should ideally match, of course.
Any idea what the cause might be?
Yes and no, depending on where the garbage is showing up. I consider the incoming part solved, for the most part. That is to say, the takeover part, the start of the video, comes out perfect for me. The incoming motion is solved by SVI, and the color correction fixes the color/luma jump. The garbage frames come at the end. There are nodes that chop off most of the junk, but the very end is where bad frames will show up, and only there, if it's working properly. My plan was to run another generation in reverse (frames running backwards), but that creates a new problem of matching. So the end still needs a little manual TLC.
If all of your output looks bad, this is a different problem. I can only speak for the models that are in the workflow defaults, so it could just be the models you are using. The best I can say without knowing exactly what you are using is that whatever examples I've posted to the model used the defaults, as far as the models, LoRAs, VAE, encoder, and the basic settings for those. I think the default is a pre-distilled model, with no lightning LoRA. I might be able to say more if I knew exactly what you were using, but maybe not. There are so many friggin variables with this stuff.
I'll probably get back to improving this at some point... there's just a new damn model every five minutes to play with. It's very hard to stay on one thing.
@Ponder_Stibbons The whole thing, alas, and I am indeed using a model without the step distilled lightx loras baked in, so I add them to the lora loader manually.
Testing continues today. I believe the grainy scunge I mentioned before has something to do with using an input video of insufficient length for the WF's requirements, specifically where obtaining all the frame-grab precursors is concerned. That input video was 2 seconds in total (66 frames at 32 fps).
I think this is the case because I am testing on a video twice that length today and am happy to report the near total absence of the aberrant, grainy quality in the frames generated with this input length.
One remaining problem: the generated portion now looks sped-up. The rate of motion in the generated portion of the video looks roughly twice as fast as the input video. I take this as a clue indicating that I have a frame-rate mismatch somewhere along the line between what the workflow is doing (which seems to be normalizing to 16fps) and the 32 frame rate input video I am using, but I'm not smart enough to intuit what I would need to change in the workflow to resolve that. I could also just be totally off the mark, which is a terrifying prospect given the rat's nest of variables that could be at play.
If you have an inkling as to what the solve might be here I'd be eternally grateful.
One last thing at the top of mind: I've been adding the SVI loras to the loader myself, but I see you don't have them loaded in that loader up near the top of the grid in the packaged json. Are the SVI loras already being pulled into the workflow elsewhere?
@crocusflowerparadigm Sorry, missed your reply at first, just noticed it. As far as the SVI loading goes, it's entirely possible that I deleted them from the LoRA loader node when I was cleaning everything up for the post. No one has mentioned this, but yeah they would just go alongside any others in the hi/lo loaders, along with the lightning models in your case.
You could definitely get garbage if there are not enough frames, but it would have to be one hell of a short video, as it's set to use only 16 frames (I think, or something close) for the encode. Or maybe if there were a few crappy frames in what it did grab. I can't remember the exact reason for setting it like that; it might just be one of those black-box issues where you have to feed it some seemingly random number or it refuses to work.
The speed issue can be addressed by adding some more interpolation in there. The interpolation that is already there is just for smoothing a few frames. You can duplicate that one and insert it after the transition has run through all of its cleanup and whatnot, right before it is merged with the original. The multiplier will depend on how it looks, of course. RIFE only does integers, but a flownet32 node can handle something like 1.5x, for example, if you need to really fine-tune it. Let me know if you have trouble finding the right spot for it and I can take a closer look when I have access.
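To pick the multiplier, compare the input's frame rate with the rate the generated frames were made for. The sketch below assumes the bridge is generated for 16 fps playback (as suspected in the question above) and that the merged video plays at the input's rate. A 32 fps input then needs exactly 2x, an integer factor RIFE can do, while 24 fps would need 1.5x, the kind of fractional factor the reply says needs a different node. The function names are hypothetical.

```python
from fractions import Fraction


def interp_multiplier(input_fps: float, gen_fps: float = 16) -> Fraction:
    """Factor by which the generated frames must be multiplied so they
    play back at the same speed as the input video."""
    return (Fraction(input_fps).limit_denominator(1000)
            / Fraction(gen_fps).limit_denominator(1000))


def needs_fractional_interp(mult: Fraction) -> bool:
    """True when an integer-only interpolator cannot hit the factor exactly."""
    return mult.denominator != 1


def target_positions(n_frames: int, mult: Fraction) -> list[Fraction]:
    """Output frame positions, in units of generated frames, after retiming
    by `mult`; non-integer positions must be synthesised by the interpolator."""
    count = (n_frames - 1) * mult + 1
    return [Fraction(i) / mult for i in range(int(count))]


for fps in (32, 24, 16):
    m = interp_multiplier(fps)
    print(fps, m, "fractional" if needs_fractional_interp(m) else "integer")
```

`target_positions(3, Fraction(2))` gives 0, 1/2, 1, 3/2, 2: every original frame survives and one new frame is synthesised between each pair, which is what a 2x interpolation pass provides.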
For my part, I really only use the merged loop as a preview, something I can package with the frames folder so I know what I'm working with when it's time to edit. So the time-ramping, crossfades, and other adjustments aren't really addressed here in a comprehensive way. It's just within the transition itself, rather than addressing the actual stitching. That's not to say you can't get a polished final result out of the combiner; I just never trust any raw output. Most of the time it takes five seconds to manually clean up stuff that would require 80 new nodes and a bunch of convoluted logic to automate (or perhaps knowing what I'm actually doing, which is seldom the case).
I know it's probably a cop-out to say just drop the junk frames and throw in a two- or three-frame crossfade. Already having editing experience, not to mention the software to do it in, may make that sound glib. I have done tons of experimentation with automating, or at least porting, the basic cleanup stuff to comfy, but it always misses something that you can fix with even the most basic software in a minute. In addition to the fact that a new model comes out every other day, and I get distracted playing with the latest slop machine.