    Wan 2.2 Video + Voice + Motion Control All-In-One workflow optimized for RTX 3060 12 GB VRAM GPU - v1.2
    NSFW

    [edit 23.01.2026: use the latest version, v2.0 (see the version description).

    Workaround for a small issue with the audio part in v2.0: go to the bottom right of the ‘01 Audio...’ group, simply move the ‘Any Switch’ node from the subgroup ‘01.1.3’ to a free area in the ‘01’ group, and make sure the node is not bypassed.

    I will fix this in the next version.]

    Special thanks to:

    @boinobin730 for a lot of testing, sharing knowledge and pushing this project 🙂

    @SeoulSeeker for sharing his knowledge and giving the first crucial hints.

    Features:

    This workflow uses InfiniteTalk to generate videos of a talking/singing person or object. The resulting video is guided by a start image, an audio source (speech/voice/song), and a control video that steers the general movement. I designed it as an all-in-one workflow: you just need a start image, plus optional audio/video sources.

    - Works perfectly on an RTX 3060 with 12 GB VRAM and 32 GB RAM + a large swap file (min. 64-128 GB).

    - Easy installation (all necessary models linked).

    - Easy to use via switch options.

    - High-quality outputs.

    The workflow includes 4 simple steps:

    1. Audio generation, or loading an existing audio file,

    2. Video generation, or loading an existing video, for DWPose motion control,

    3. InfiniteTalk: generates the final low-quality video output (guided by DWPose and synchronised to the audio),

    4. Upscaling and frame-rate multiplying for smooth high-quality outputs.

    Videos of around 5 seconds work well. Longer videos (around 10 seconds) are possible, but you might quickly run into known video issues like looping movements, OOM errors, etc. The sketch below shows the basic frame arithmetic.
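    For reference, clip length maps to frame count linearly. A minimal sketch of the arithmetic; the 25 fps base rate and the 81-frame generation window are illustrative assumptions, not values read from this workflow:

        import math

        def plan_clip(duration_s: float, fps: int = 25, window: int = 81) -> tuple[int, int]:
            """Return (total_frames, generation_windows) for a clip.

            fps and window are illustrative defaults - check your own
            sampler/InfiniteTalk settings for the real values.
            """
            frames = math.ceil(duration_s * fps)
            windows = math.ceil(frames / window)
            return frames, windows

        print(plan_clip(5))   # (125, 2) - comfortable
        print(plan_clip(10))  # (250, 4) - more windows, more risk of loops/OOM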

    This workflow is quite advanced now - I would say in an early beta state. Everything should work technically, so I believe it is a good basis for more advanced tests and hopefully some fun 🙂

    My intention is to integrate the Step Audio EditX engine for easy-to-use, advanced audio control via tags as soon as possible, but there are currently some issues with the corresponding nodes.

    A next step might be the integration of camera control.

    Attention:

    This workflow is intended for advanced ComfyUI users. Even though installation and usage should be simple, this workflow is really a basis for testing and development, and you might need some ComfyUI knowledge to use it. Please understand that I will not provide basic installation and ComfyUI support here.

    If you are a beginner with video generation and more complex workflows, I would recommend my other workflow for video generation. That one has been well tested and is already much better documented and commented.

    About the basics:

    This workflow is based on official templates and various already-published workflows. I just put the different parts together, created a hopefully easy-to-use “design”, and optimized everything for 12 GB VRAM.

    Description

    • Switch option added: you can now choose between simple audio generation or uploading an existing audio file,

    • audio/video syncing fixed in step 04 by cutting off the first video frames (see the sketch after this list),

    • extended and corrected documentation.
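    For reference, the step 04 fix boils down to dropping the leading video frames that have no matching audio and trimming the tail to the audio length. A minimal sketch of the idea, assuming the usual ComfyUI image-batch layout (the actual node may differ):

        import torch

        def sync_video_to_audio(frames: torch.Tensor, cut_first: int,
                                fps: float, audio_duration_s: float) -> torch.Tensor:
            """Drop the first `cut_first` frames, then trim the tail so the
            clip is no longer than the audio. frames: (N, H, W, C) batch."""
            frames = frames[cut_first:]
            max_frames = int(audio_duration_s * fps)
            return frames[:max_frames]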


    Comments (69)

    arkinson (Author) · Jan 19, 2026

    @boinobin730 and all others here. I'm pretty sure I found the right tools to generate the most flexible local audio that can be easily controlled by "tags". It needs just the "Step Audio EditX Engine" and "TTS Text" nodes from the "tts_audio_suite". Unfortunately, I can't run it on my end because I get an error message (I will open a GitHub issue about this). If you have the time and are interested, please take a look at my “Audio test workflow” published here. This is an excerpt from the official GitHub workflow. Models are downloaded automatically; you just have to install the "tts_audio_suite" via the manager. I tested it on 2 different ComfyUI systems, but had no luck getting it running 😕

    boinobin730 · Jan 19, 2026

    Thanks Arkinson. I will have a look in the next few days and see if I can help.

    arkinson (Author) · Jan 21, 2026

    @boinobin730 Good news so far: the creator of the tts_audio_suite reacted very fast to my issue on GitHub and published a fix. Unfortunately, this currently leads to a new error... So please be patient.

    arkinson (Author) · Jan 23, 2026

    @boinobin730 New workflow out here - version 2.0. I added text-2-song for audio generation. I think this is a really fun option and it can generate high-quality songs out of the box 🙂 I also added an option to load existing videos for motion generation.

    Unfortunately, the Step Audio EditX engine still has issues. I hope the creator will fix it. Testing and integrating would then be the next step...

    boinobin730 · Jan 24, 2026

    @arkinson Wow. Thank you. This is awesome. I can't wait to try it. I noticed previously that if my dialogue in VibeVoice read like a song, it had a really difficult time processing it. Doing it this way might make it sound a lot better. Great stuff!!

    boinobin730 · Jan 24, 2026

    @arkinson I upgraded to ComfyUI .10 yesterday. It was a lot easier than I expected. I want to run Flux 2 Klein. It looks impressive.

    arkinson (Author) · Jan 24, 2026

    @boinobin730 Flux-2-what-the-f**k???? 🤣 Oh my, this AI stuff moves much too fast. Thank you so much for the hint. I just had a short look. Lots of confusing new stuff at first and lots of new models: base/distilled, quantized, 9B, 4B, CLIPs and VAEs 🙄 and I never really got the basics 😅 Do you know which is "better" to use - base or distilled?? Will Flux 2 work with the "old" Flux 1 LoRAs??

    boinobin730 · Jan 24, 2026

    @arkinson I actually don't know. It took forever to load all the nodes from a workflow. I haven't checked it out this morning (for me). I need to test it and see. Development is so fast. I think Klein is the cut-down Flux 2 model, the full one being too big for us to be really usable. I never got into Flux except for inpainting and fixing hands. Flux excels at hand fixing.

    boinobin730 · Jan 24, 2026

    @arkinson I am trying to run your new workflow, but I am not sure where I should be getting the files. Do I get the ACE-Step file from here? https://huggingface.co/ACE-Step/ACE-Step-v1-3.5B I can't see the safetensors file.

    arkinson (Author) · Jan 24, 2026

    @boinobin730 Simply click on the model link in the workflow (all models linked, see top left corner) 😉

    arkinson (Author) · Jan 24, 2026

    @boinobin730 Flux 2 Klein: my very, very quick first test with the 4B models (template t2i workflow):

    - Quality: can't see much difference from Flux 1 dev,

    - Prompting: not much tested yet,

    - runs without issues on 12 GB VRAM,

    - base model (20 steps): takes long, seems not to generate NSFW,

    - distilled model: runs very fast (4 steps only), seems to work with NSFW prompts,

    - Flux 1 LoRAs not working ☹️🥵 Oh my - this is really a no-go.

    boinobin730 · Jan 24, 2026

    @arkinson you are so quick!!! I am still generating on your workflow atm. I am interested in what Klein can do with consistent characters, inpaints, face swaps, etc. The fun never ends!!!!

    arkinson (Author) · Jan 25, 2026

    @boinobin730 Hi, I did some more serious side-by-side tests between Flux 1 dev and Flux 2 Klein. You can use the distilled flux-2-klein-9b-Q4_K_M.gguf model with 12 GB VRAM (Q8 works too, but is completely at the limit if you add a LoRA).

    Speed is really awesome - the longest part is upscaling. Image quality out of the box is not as good as Flux 1 dev, but much better than Z-Image. The Klein model mostly generates the same image if you do not change the prompt (like Z-Image does). To generate different "characters" you have to do detailed prompting. Prompt following seems to be better than Flux 1 dev in special situations. It is very interesting: some use cases work much better, but others simply don't work. I believe it will be great for NSFW too - it just needs completely new LoRAs... Generating sweaty skin works out of the box, for example 😅

    If you are interested in a simple t2i test workflow with upscaling, just give me a hint.

    I'm currently looking for a suitable LoRA trainer (hopefully one that will run with 12 GB VRAM), but information is still very scarce...

    boinobin730 · Jan 26, 2026

    @arkinson Interesting findings. Thanks for your rundown. I have been playing with it, using other people's workflows. I used a really simple one and the results are very promising. I generated a simple gym scene and then placed the character from image 1 into the image 2 gym, and the results were quite good without LoRAs. It even kept body dimensions relatively the same. I tried another workflow to generate a multishot of one character (front, back, side) and it had very mixed results; the Qwen results were much better. Possibly I didn't have enough steps. You are right, it is so quick. Great for just general images. It is still early days but definitely promising. I have also been trying to get some LTX2 workflows to work. Early days as well, but looking a bit better. I need to play with it more. It also seems to be a work in progress for the developer. https://civitai.com/models/2304098/ltx-2-19b-gguf-12gb-comfyui-workflows-5-total-t2vi2vv2via2vta2v

    arkinson (Author) · Jan 26, 2026

    @boinobin730 Thank you for the link. The video examples look brilliant and the audio is perfect. You've already convinced me - I will give it a new try 😉

    boinobin730 · Jan 26, 2026

    @arkinson Let me know how you go. I was trying to get the LTX2 image-with-audio to work (similar to your sound/audio clip workflow), but I got a static image with the sound clip on top of it. The general I2V works OK; I need to keep testing. It looks very complicated, and I think the developer is hopefully going to write some documentation.

    arkinson (Author) · Jan 26, 2026

    @boinobin730 Yeah - I did some first tests. I have to disable the preview node in all workflows, even though I did a fresh installation of the desired custom nodes as mentioned in the comments. But this is only a small issue.

    With t+a2v I got the same outputs as you (static image + sound), especially in portrait format, and the video quality was not so good. But I did not test much. LTX2 seems to need different prompting than Wan... There is one comment with the same issue using portrait, but the suggestion to simply use a special aspect ratio did not work in my case.

    Next I tested v2v, and this is absolutely crazy and worked out of the box - look here. This stuff is too cool 🤣 just extending an existing video with sound and motion - and the quality is really good without any upscaling and multiplying so far. I believe this has a lot of potential...

    I will have a look at i+a2v now. Did you try landscape or portrait? Maybe better prompting would help.

    If we get i+a2v running too, I will see if we can speed up generation a little bit with lower frame rates and final upscaling... Lots to do 🙂

    boinobin730 · Jan 26, 2026

    @arkinson Thanks for checking it out, Arkinson. I'm glad I'm not the only one getting static images. I tried changing a few settings and changing the prompts, and it did seem to get a better response, but I can't fathom exactly what it prefers. That is a super smooth continuation. I haven't tried this yet, but it is amazing what it can do.

    arkinson (Author) · Jan 26, 2026

    @boinobin730 OK, same issue here with i+a2v so far, even with landscape. A hint from the comments to use a "camera" guiding LoRA helps a little bit and at least generates lip syncing, but the general movement is really odd. Will try to dig deeper...

    boinobin730 · Jan 26, 2026

    @arkinson I think it is an audio issue. Basic I2V is fine for video, but the sound was muted. Previously an I2V example gave me sound. It is way too advanced for me; I don't even understand how it all works. I think I will try some other people's LTX2 workflows to get a better understanding of how it's put together.

    boinobin730 · Jan 26, 2026

    @arkinson Someone helped me. A camera control LoRA needs to be added. I tried this one and it worked like a charm: https://huggingface.co/Lightricks/LTX-2-19b-LoRA-Camera-Control-Static Funnily enough it's not on Civitai, but it does work. The other person used the dolly-in. This helps a lot, as we now have a baseline to work from.

    boinobin730 · Jan 27, 2026

    @arkinson Me again. The workflow author said this LoRA is good as well: https://huggingface.co/MachineDelusions/LTX-2_Image2Video_Adapter_LoRa/tree/main It also seems to do a good job. I dropped all strengths down to 0.5; that also included the ltx-2-19b-distilled-lora_resized_dynamic_fro09_avg_rank_175_fp8.safetensors and ltx-2-19b-ic-lora-detailer.safetensors LoRAs. I don't get the last image frame output, though - it's a broken image icon. Do you get the last frame?

    boinobin730 · Jan 27, 2026

    @arkinson I added a Video Helper Suite select-image node to get the last frame at the end. There are also new LoRAs coming out for Flux 2 Klein. I'm excited about this consistency LoRA https://civitai.com/models/1939453?modelVersionId=2634354 as even out of the box Flux 2 Klein is somewhat accurate with body and face consistency.

    arkinson (Author) · Jan 27, 2026

    @boinobin730 Oh my - 5 minutes away from the desk and the world moves on 😂 I have some reading to do now... Thank you so much.

    arkinson (Author) · Jan 27, 2026

    @boinobin730 I ran the tests with the additional ltx-2-19b-lora-camera-control-dolly-in.safetensors LoRA yesterday. As mentioned, the output was not satisfying.

    I2V workflow: I used your linked ltx-2-19b-lora-camera-control-static.safetensors LoRA now. With the distilled model ltx-2-19b-distilled_Q4_K_M.gguf I get blurry/pixelated outputs only. Using the standard model I get roughly the same results as with the dolly-in LoRA: some slow motion + completely unsynchronised lip movements and pretty poor video quality. No comparison to the V2V outputs.

    Could you give me a hint which model + which bundle of LoRAs you finally used?

    arkinson (Author) · Jan 27, 2026

    @boinobin730 Uhh - just found some speculations and possible workarounds for the slow/no-motion issue here.

    boinobin730 · Jan 27, 2026

    @arkinson All models are the same; I think the most important thing is to use the dev GGUF model, ltx-2-19b-dev_Q4_K_M.gguf. I have 3 LoRAs daisy-chained off each other in the LoRA group, all at a strength of 0.5, in this order: ltx-2-19b-distilled-lora_resized_dynamic_fro09_avg_rank_175_fp8.safetensors, then ltx-2-19b-ic-lora-detailer.safetensors, then LTX-2-Image2Vid-Adapter.safetensors. I did a 25-second clip; it took 30 minutes. I am going to post it to the examples.
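    (For context: a LoRA's strength scales the low-rank update added onto each affected weight, so chaining three LoRAs at 0.5 just sums three half-strength updates. A generic sketch of that rule, not LTX-specific code:)

        import torch

        def merge_loras(W: torch.Tensor,
                        loras: list[tuple[torch.Tensor, torch.Tensor, float]]) -> torch.Tensor:
            """Apply W' = W + sum(strength_i * B_i @ A_i).

            Each entry is (A, B, strength) with A: (rank, in_features)
            and B: (out_features, rank), so B @ A matches W's shape."""
            W = W.clone()
            for A, B, strength in loras:
                W += strength * (B @ A)
            return W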

    arkinson (Author) · Jan 27, 2026

    @boinobin730 OK, I tried the same model + LoRAs (except the detailer LoRA) with no luck. And you used the I2V workflow? Yes, it would be helpful to see your example.

    I am just checking/testing another low-VRAM workflow I found. Let's see...

    boinobin730 · Jan 27, 2026

    @arkinson I don't know if you can download my workflow here. Try it: https://limewire.com/d/tvAOj#SJWftpKge4

    boinobin730 · Jan 27, 2026

    @arkinson I posted my example, but there are always safety-check delays. It will probably be there soon.

    arkinson (Author) · Jan 27, 2026

    @boinobin730 Thank you. I just saw your sample video and loaded your JSON. OK, we were talking about different workflows 🙄 You are on the i+a2v workflow. I did a few short tests yesterday. This workflow generally works (lip-sync) and you get a few very simple motions. Yes, it can be improved a little with the additional LoRA like you already did. I'm quite impressed that you got a long clip generated in one run.

    I, and obviously others too, have trouble with the i2v workflow, which generates the speech by itself like the v2v workflow does. I'm really impressed by the capabilities of v2v and still hope to get it running with i2v too.

    Btw, did you get the preview running with taeltx_2.safetensors??

    boinobin730 · Jan 27, 2026

    @arkinson No worries. General i2v was no problem for me; it always seemed to work out of the box. Yes, taeltx_2 worked for me, but it was being referenced in the Easy-Install ComfyUI folders, not in my models folder. Even though I fixed the YAML, it just wouldn't pick it up. I just copied the folder to the Easy-Install folder and it worked; I couldn't be bothered trying to make it perfectly referenced. I was surprised it went to 25 sec. I ran it before I went to sleep, expecting an OOM; it did it in 30 minutes. I will try longer and see how far it goes before either being too long or hitting an OOM.

    arkinson (Author) · Jan 27, 2026

    @boinobin730 I2V: oh my, I finally got it running with your linked LTX-2-Image2Vid-Adapter.safetensors LoRA (without the LoRA = static video). I don't know what I did wrong before, because I tested exactly this several times. And yes, the preview works now too - I had accidentally saved the file in the VAE folder. I believe I need much more sleep 😅 Thank you for your help.

    I2V is really cool. I have to test much more. Using 16 fps for faster generation did not work. But I did some minor "tweaks": sliders, automatic frame and aspect ratio calculation, final upscaling and frame-rate multiplying... I'm not sure yet, but it could be interesting to combine the different workflows into one.
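    (For reference, the frame/aspect calculation can be sketched in a few lines. The divisible-by-32 dimensions and the 8·k + 1 frame rule are assumed conventions for LTX-style models, not values read from this workflow:)

        def snap_video_params(width: int, height: int, duration_s: float,
                              fps: int = 25, div: int = 32, frame_step: int = 8):
            """Round dimensions to the nearest multiple of `div` (keeping the
            aspect ratio roughly intact) and snap the frame count to the
            form frame_step * k + 1."""
            w = max(div, round(width / div) * div)
            h = max(div, round(height / div) * div)
            raw = max(1, round(duration_s * fps))
            frames = ((raw - 1) // frame_step) * frame_step + 1
            return w, h, frames

        print(snap_video_params(1283, 862, 5))  # (1280, 864, 121)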

    And back to the actual topic: the creator of the tts_audio_suite is very helpful and is trying to solve the issue.

    boinobin730 · Jan 28, 2026

    @arkinson Excellent. I'm glad you got it to work. I can't believe how rapid the development is. Every day there is some new model release, new workflow, new LoRA, new efficiency node. Damn!! It's like Christmas never ends. I'm glad you are getting a response with that audio package. There was a new Qwen TTS released a day or so ago. I couldn't load it, but people reckon the voice lacked emotion. Z-Image base looks like it was released as well. I've only played with it a little.

    I look forward to your version of the LTXV2 workflow. You seem to be able to get the maximum juice out of the model for us 12 GB people.

    arkinson (Author) · Jan 28, 2026

    @boinobin730 You are right - five months ago we didn't know "what a video is", and now we have the dolls talking, dancing, singing, laughing and jumping 🤣🙂 I'm afraid my old Wan workflow is now just a pile of rubbish 😥

    I did a 20-second test with I2V too and it ran like a charm (it did not use more VRAM than a short clip). Upscaling and multiplying took most of the time. I did not plot the generation time, but curiously it took not much longer than an 8-second video. I have seen this behaviour of untraceable generation times several times.

    I will do some tests with much higher start resolutions; maybe the final upscaling is not necessary.

    Btw, speech control with I2V is absolutely crazy. I tested a little bit with voice control, and even the timing is incredible: you can let the doll speak something while moving, then let her make a weird face, and then let her talk some completely different nonsense 🤣 I love this stuff 🙂

    boinobin730 · Jan 28, 2026

    @arkinson Your workflows are not rubbish now. They are still useful, especially since Wan has a lot of NSFW LoRAs behind it. It will take a while for LTX2 to catch up. LTX2 has a lot of potential; as long as it's easy to make LoRAs, it will remain a strong contender in the future. But there is always new competition. I just wonder if it will ever slow down, or if it gets to the stage where we can literally dictate and control exactly what we want to see.

    arkinson (Author) · Jan 28, 2026

    @boinobin730 Oh, come on, Wan is out now 🙃

    I did some tests with high-resolution generation. It's unbelievable: 1280 x 864 runs fast as hell and without any OOM issues, so upscaling is completely unnecessary. A test at 1024 x 704 and 30 seconds length took about 40 minutes. The longest part was the VAE decode. The tiled decoder is very slow; it seems to move a lot of GB to the swap file. I will test whether the simple decoder can handle it without OOMs.
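    (For context: a tiled decoder trades memory for speed by decoding the latent in overlapping spatial tiles instead of one pass, which is why it can swap so heavily on long clips. A generic sketch of the idea - the 8x spatial scale and the decode call are placeholders, not the actual ComfyUI node, and real implementations blend the overlaps instead of overwriting them:)

        import torch

        @torch.no_grad()
        def tiled_decode(latent: torch.Tensor, decode_fn,
                         tile: int = 64, overlap: int = 8) -> torch.Tensor:
            """Decode a (B, C, H, W) latent tile by tile to cap peak VRAM."""
            B, _, H, W = latent.shape
            scale = 8                      # assumed spatial upscale of the VAE
            out = None
            step = tile - overlap
            for y in range(0, H, step):
                for x in range(0, W, step):
                    dec = decode_fn(latent[:, :, y:y + tile, x:x + tile])
                    if out is None:        # allocate once channel count is known
                        out = torch.zeros(B, dec.shape[1], H * scale, W * scale)
                    out[:, :, y * scale:y * scale + dec.shape[2],
                              x * scale:x * scale + dec.shape[3]] = dec
            return out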

    I have already combined T2V and I2V in one workflow, and handling looks easy so far. I think it makes sense to put all the main parts together. And it's great that everything works with the same models.

    Did you find anything on LTX2 and first/last-frame-to-video?

    boinobin730 · Jan 28, 2026

    @arkinson I was trying to get frames to line up. There seems to be a delay in voice matching, especially if you cut off mid-sentence. Now that we can go to 20+ seconds of input, I don't think it will be a problem, as I just need to cut at natural language pauses. I haven't tested extensively lately; I am going on holiday soon and preparing. I won't be able to use ComfyUI, let alone a computer, so I will be gone for 2 weeks from Monday. I wonder what will happen in the AI field while I'm gone?

    arkinson (Author) · Jan 29, 2026

    @boinobin730 What the hell - holidays without a computer?? 🤣 Sounds like survival training 🙄 The last time I was without a computer was in Newfoundland, trying to get a ride to Goose Bay, Labrador 😂

    I wish you nice holidays and maybe time to read a good book instead 🙂

    boinobin730 · Jan 29, 2026

    @arkinson Ha ha, not as bad as that, but yeah, it's the things we do to keep the wife happy. I'm going to New Zealand to check out the North Island. Lots of driving, eating, sightseeing, very relaxed, just no computer. I looked up Goose Bay, Labrador - wow, that looks remote. I have never been to Canada. Thanks, I will talk soon.

    arkinson (Author) · Jan 31, 2026

    @boinobin730 Going to New Zealand just to keep the wife happy - I love it 🤣 I've never been there, but it sounds exciting. And as the Canadians say: "Take care boy!" 🙂

    arkinson (Author) · Jan 31, 2026

    @boinobin730 New LTX-2 all-in-one workflow published now 🙂

    boinobin730 · Feb 12, 2026

    @arkinson I just got back. We had a marvelous time in NZ. I think if I could, I'd swap living in Australia for living in New Zealand at times. But they are both similar countries. I can see you are doing great work again. I knew you would make an Arkinson-special tweaked LTX2. I haven't tried it out yet, but I will over the weekend.

    arkinson (Author) · Feb 13, 2026

    @boinobin730 Hey mate - you survived the wilderness??? 😂 I wasn't expecting you before the weekend. If you think afterwards that you could actually live in another country, then it sounds like a damn good trip 👍 I know that feeling, even though I've never sailed across the equator 🙂

    boinobin730 · Feb 13, 2026

    @arkinson It was, thank you. One day you may find yourself travelling far. New Zealand and Australia are the perfect travel destinations - so much scenery, places, buildings, culture. I took lots of shots of the countryside on my potato phone camera. I am now in the process of using Qwen Image with those photos and having my usual fun. I am then going to put the images through your new LTX2 workflow. I will post results and give you any feedback that I might have.

    boinobin730 · Feb 13, 2026

    @arkinson So how's life on your side of the equator? Are you working on any new AI projects?

    arkinson (Author) · Feb 13, 2026

    @boinobin730 Good idea to combine your travel pictures with AI. And yes, let's see some of your results 🙂

    Meanwhile, in the northern hemisphere?? Don't ask - mostly bad news, "as usual" over the last months and years... Yes, unfortunately 😣

    OK, something funny: after struggling for several weeks on GitHub with the creator of the tts_audio_suite to get Audio EditX running, I needed just half an hour to figure out that it consumes an extreme amount of VRAM and takes longer to generate three simple sentences than a 10-second LTX-2 video + audio 🙄 So I buried this idea. On the other hand, LTX-2 is very good at generating speech with cool/funny pronunciation, even in any "exotic" language you might think of. I did a short test for fun here.

    As planned, I will add the text + audio 2 video part to the LTX-2 workflow soon. This should be no problem. And maybe more interesting: somewhere I found a first/last-frame-2-video workflow for LTX-2. I have not tested it yet, but we shall 😉

    Text 2 song: I found a much better workflow than the one published in the Wan2 workflow (faster, and much better audio quality with a few steps). I love "composing" my "own" songs and music and using them in the video parts. I believe I will "outsource" the audio part from the Wan workflow into a separate workflow, so it is usable with LTX and Wan in one place...

    boinobin730 · Feb 13, 2026

    @arkinson Sorry to hear about that; I hope things get better. Your new endeavours sound interesting. First-frame/last-frame on LTX would be useful; I look forward to seeing what you produce. Over the last week, developers have released a new ACE-Step for song generation: https://huggingface.co/ACE-Step/Ace-Step1.5 Apparently it is getting better over time, and someone did a merge of the turbo model and the base model: https://huggingface.co/Aryanne/acestep-v15-test-merges/blob/main/acestep_v1.5_merge_sft_turbo_ta_0.5.safetensors I haven't tested it yet, but it's on my AI to-do list. The big corps went for the jugular on music generation, killing Udio, and now Suno is also seeing big changes. So the only real solution is to generate locally. It can only get better.

    boinobin730 · Feb 13, 2026

    @arkinson Check out this hyperlink for some of my doctored travel pictures: https://civitai.com/posts/26581432 I am not sure if the link works for you, as my pictures aren't coming up in the gallery. I used this Qwen workflow https://civitai.com/models/2386462?modelVersionId=2683492 and the consistency LoRA. I honestly don't even need my Pony LoRA to generate simple posed images.

    arkinson (Author) · Feb 13, 2026

    @boinobin730 Yes, I use the ACE-Step 1.5 turbo model in the new workflow. Song/music generation works very well. I'm still trying to generate speech, or at least a cappella singing.

    arkinson (Author) · Feb 13, 2026

    @boinobin730 Ahh, thank you for the Qwen link. Will try this soon. Did you ever find any useful t2v workflow for Qwen and 12 GB VRAM? Or could your linked workflow be modified?

    Just saw your pictures. I feel like I know that lady 🤣 It's really hard to tell whether it's a photo or AI.

    boinobin730 · Feb 14, 2026

    @arkinson I honestly haven't tried T2V on Qwen, as I am fixated atm on consistency. The tools have now gotten so good that you could literally take 1 photo and get a consistent SFW model pose from just that 1 photo. No LoRA necessary. The workflow uses this massive AIO checkpoint that surprisingly works really well on 12 GB VRAM, just a bit slow to get an output. But since I am then using that image as a source image for LTXV2 video, it's worth it.

    NSFW, I think, is more challenging, as you won't get consistent nipples, for example, when she disrobes. Hence the need for a character LoRA. Yes, she gets around. I now just need to find a distinct, unique voice for her and then she is complete. I will create a back story for her, flesh out her personality, and have her as a main character. As for the photos, they were real-life photos without anyone in them. It's actually easier to slip a character into an empty shot than it is to prompt that Qwen workflow to, say, remove the person in image 2 and place the person from image 1. That last photo, where she is sitting at a table about to eat the Wendy's burger, was actually me sitting at the table. I couldn't replace myself in the workflow, so I just said "remove the person in the photo" and I was gone. I then used that empty shot and prompted that the person in image 1 is sitting behind the table in image 2. Voila - it worked. Although it reduced her boobs a bit. Anyway, it's all fun and games at the moment. I then had her speaking a few lines and eating the burger with your LTX workflow. It worked well for 2 tries and then it slowed down, so I have to work out what went wrong. The workflow is very solid. Yes, a last-frame image would be so useful.

    boinobin730 · Feb 14, 2026

    @arkinson OK, so I spent the whole day enjoying the fruits of your workflow, specifically I2V. A few things to keep in mind: if you prompt "girl", you will get a high-pitched, almost kid-like voice (see the example with my girl with the hat on, pointing to the sign above her). She even acts a lot more kid-like when you write "girl". As soon as I refer to her as a young woman, her gait changes, her demeanor changes, and obviously the voice changes.

    LTXV2 has problems with characters turning. I literally had to retry multiple times to get her to turn without body-horror morphing. I feel that if you give the generation more time, it can turn her body a lot more easily.

    Natural speech has pauses between statements, so I just put "...." between sentences to give it some speech pauses. The last generation was a doozy: LTXV2 does not naturally know what a penis is. I needed an LTXV2 penis LoRA. Again, I had massive problems with her turning around and then turning back to the camera. I think there is an LTXV2 prompt structure that I should be respecting so that it is easier for the model to accurately reflect the user's wishes. Anyway, it is a great workflow as always. Thank you for making it. I will play with the music version tomorrow. Talk soon.

    arkinson (Author) · Feb 15, 2026

    @boinobin730 Hi - yesterday I woke up and "half" of my images/videos had been banned by the bots. Uhh - sometimes Civitai is really frustrating... 🙄

    I haven't found the time yet to look into the Qwen workflow, but I'll definitely take a closer look at it. The possibility of getting a character/style from a single image sounds good. It would be very interesting to have that with Flux 1. "Long" ago I tried something like IP-Adapter (can't remember exactly) with Flux, but in the end it did not work...

    LTX-2: Over the last days I tried out as many different concepts/ideas as possible. And yes, it's probably the same as always: some ideas work out of the box with brilliant or really unexpected results (especially with t2v), but others don't work at all, no matter what you try. Animating my graphics, for example, seems incredibly difficult. On the other hand, someone on Reddit used my workflow to make his own dog talk some funny stuff via v2v 😂

    Thank you for your hint with "..." to get a pause in speech. I tested "-", which didn't work. And yes, I have also noticed that LTX-2 is sometimes very sensitive to prompting. T2V and "...he speaks in plain English..." came across as really funny.

    I am currently trying to organise my ComfyUI. I just installed Comfyui-Lora-Manager and Prompt-Manager. I hope they work and aren't too buggy in the end. What I am also really missing is a useful workflow manager...

    boinobin730 · Feb 15, 2026

    @arkinson Yes, I heard about the banning of certain images on Civitai. Is it because there is no prompt?

    LTXV2, I think, is still a bit early in LoRA and workflow development. Wan has Wan Animate, which allows you to mimic the actions of a character and replace them with a character of your own choosing - great for TikTok-dance-type stuff. I have looked, but I cannot find this for LTXV2 yet.

    It's great that people use your tools and create some funny stuff. Do you have a link to the Reddit talking dog? I haven't tried V2V yet. Do you think it's possible to get an I2V output from LTX2 and then feed that in as input for a V2V? I wonder if you get some crazy-town-type stuff happening. I need to test more.

    arkinson (Author) · Feb 15, 2026

    @boinobin730 I have had this banning several times. Some of the pictured women might have looked too young. In other cases it seems to be the prompt: one wrong word and you are out. Btw, that's the reason I mostly use fake prompts instead. But in most cases it is not understandable.

    V2V: I found the Reddit video again: here. V2V works very well. One of my quick tests was this one. The start video was Wan22, then V2V. It doesn't matter where the start video comes from, as long as the quality is good - and the resolution should not be under the resize resolution, of course.

    boinobin730 · Feb 15, 2026

    @arkinson Ohh, I remember seeing this video when I was on holiday. It is very funny; I had no idea it was from your workflow. Good job. Crazy how good we can get on limited local resources. Your kiss video is great. I wonder how long we can keep generating from small snippets of video, or if the video fidelity starts to break down over time?

    I haven't had any banning yet; I guess women with big chests aren't considered children. If I did images of women with flat chests, perhaps the ban hammer might be enforced. I know they haven't got an AI bot for sound yet, as my swearing-girl Wan video is still up.

    arkinson (Author) · Feb 16, 2026

    @boinobin730 The probability of your stuff being banned increases exponentially with the amount of published material 😅

    V2V: If you have a look at the "kissing" video, you can clearly see that the faces quickly get distorted after heavy movements.

    Completely off-topic: I got deeper into ComfyUI Lora-Manager. This is one of the most professional tools for ComfyUI I have ever found. It organizes and automates everything around LoRAs and checkpoints, up to completely automatic metadata generation for Civitai. If you have more than "two" LoRAs, this is definitely a solution 🙂

    boinobin730 · Feb 16, 2026

    @arkinson I have been playing around with LTXV2 more, and I can see what you mean about the distortion, especially with fast movement such as a dance. I generated a few I2V videos and prompted "awkward dance", and the last frame of the face is quite terrible. I will add it to the gallery for you to see. I then tried to clean it up by putting the last frame into a Qwen workflow so that I could use it again for another go, but it's not really a satisfactory output. I'm pretty sure it's because of the fast action.

    Throw me a link to the LoRA manager? I might be using it already; I have a LoRA manager, but I am not sure if I am getting the full potential out of it.

    I was wondering if I could show you a workflow and ask your opinion on how they got it to work. It is a Wan Animate workflow that can go over a minute. It's good, but I am not sure how they managed to get it to work for so long, and the output is fair to average. If you have time to look - it does work on 12 GB VRAM: https://civitai.com/models/2018097/wan-animate-v2-unlimited-duration

    boinobin730 · Feb 16, 2026

    @arkinson My videos won't go to the gallery - must be because of copyrighted songs. https://civitai.com/posts/26655378 Is this the maximum output for movement we can expect on our 12 GB VRAM cards? I am going to try some movement-type LoRAs and see if it improves. I think it's because of the speed of movement of the character, especially the hands. It's too fast.

    boinobin730 · Feb 16, 2026

    @arkinson Yeah, further to my own understanding, LTXV2 still has limitations even if you have a large, VRAM-heavy card. This was a Reddit post I didn't read fully. The output is fair even though it's from an RTX 6000 with 96 GB. https://www.reddit.com/r/StableDiffusion/comments/1q9cy02/ltx2_i2v_quality_is_much_better_at_higher/

    I think I just have to be patient, not get too greedy, and curb my over-enthusiastic expectations.

    arkinson (Author) · Feb 17, 2026

    @boinobin730 Just use ComfyUI Manager and search for "comfyui-lora-manager".

    I2V: I can open your video, but it is without sound. At first view it doesn't look too bad, but you are right - hands and face get out of control 🙄

    We had a similar discussion about sound issues on the workflow page. I believe motion issues for i2v or v2v actually depend mostly on the start image/video, the prompting, the video length, and whether you try to force the model in a certain direction. Testing camera control LoRAs might help, but I'm afraid that if you find a solution for one use case/concept, it will not work for another, and vice versa...

    Uhh - I just saw we are posting in parallel. Just a few words on the Wan workflow: I had a short look at the YouTube video. Unfortunately, it is not understandable what they are doing there. And Wan is out 😆😂

    arkinson (Author) · Feb 17, 2026

    @boinobin730 I have just read your linked article. Very interesting. And yes, of course, we are completely at the low-end limit. But I must say I am quite happy with the quality we get out, and hey - two weeks ago it was unbelievable to generate something useful longer than 8 seconds 🙄 The hint about the landscape format might be important. I have not tried it yet.

    And yes, "Follow LTX-2 prompting guidelines closely" often seems to do the magic. But I struggle a lot with it myself...

    arkinson (Author) · Feb 17, 2026

    @boinobin730 Btw, using T2V seems to be the easiest part, and I am mostly pretty happy with the outputs - even if they do not follow the prompt 😂

    boinobin730 · Feb 17, 2026

    @arkinson All good. I like looking at all the new stuff and the new models and workflows. I was trying to run the video output through an upscaler, but at the moment it's garbage in, garbage out. I haven't seen you post at these hours of the day/night; it must be early for you.

    boinobin730 · Feb 18, 2026

    @arkinson So, I put aside my quest for better I2V generations and just installed this: https://civitai.com/models/2400306/ltx-2-easy-prompt-by-lora-daddy You might like it. Basically, it pimps your prompts, so if you are bad at prompt writing, it will breathe a bit of life into the prompt and supposedly give you a better output. It's definitely interesting and provides a lot of entertainment in terms of what sort of generation you can get from the LTXV2 model. I stuck it into your workflow and it works really well. I will post some examples in the gallery for reference. I just tested it more: it is giving me more variation in the output, mostly for the better.

    arkinson (Author) · Feb 18, 2026

    @boinobin730 I don't know if you saw my new post on the LTX-2 model page. Please let us talk about LTX-2 there (and anything else, of course).

    arkinson (Author) · Jan 20, 2026 (pinned)

    Troubleshooting ComfyUI issues. @boinobin730 and all others here.

    Yesterday, my ComfyUI Easy Install suddenly stopped working too (various error messages, including “swap file too small”, program crashes, etc.). Unfortunately, I can't remember if/what I had changed before. But I got the system and the video+audio workflow up and running again with the following steps:

    1. GPU driver update.

    2. Comfyui-Easy-Install update via bat file.

    3. Update all custom nodes via manager.

    4. Consistently uninstall all custom nodes that cause conflicts.

    5. Manually set the Windows swap file on a fast SSD: min = 64000 MB, max = 128000 MB.

    6. Start comfyui via run_nvidia_gpu.bat and not via run_nvidia_gpu_SageAttention.bat (as described by boinobin730).

    Perhaps not all steps are necessary, but this worked for me.
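    If you want to verify that step 5 took effect, a quick check with psutil (the 64 GB threshold simply mirrors the recommendation above):

        import psutil

        swap = psutil.swap_memory()
        ram = psutil.virtual_memory()
        print(f"RAM:  {ram.total / 2**30:.0f} GB")
        print(f"Swap: {swap.total / 2**30:.0f} GB")
        if swap.total < 64 * 2**30:
            print("Swap is below the 64 GB minimum recommended for this workflow.")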


    Details

    Downloads: 244
    Platform: CivitAI
    Platform Status: Available
    Created: 1/18/2026
    Updated: 4/30/2026
    Deleted: -

    Files

    wan22VideoVoiceMotionControlAll_v12.zip