
    My approach: I generate everything in "Fast" mode first to quickly iterate and find the best results, then I selectively upscale only the videos worth keeping by reusing the same seed.

    Generation Modalities

    • 📝➡️🎥 Text-to-Video: Create completely new videos from scratch using text prompts.

    • 🖼️➡️🎥 Image-to-Video: Animate static reference images using text prompts.

    • 🖼️🖼️➡️🎥 First-Last-Frame-to-Video: Generate a coherent video sequence that bridges a defined starting and ending image.

    • 🎥➡️🎥 Video-to-Video: Generate synchronized audio and speech driven by the video's visuals. Offers options to pass the source video through untouched or regenerate it entirely.

    • 🎥 Vid2Vid (Masked Face): Face-targeted refinement that uses Segmentation V2 (RMBG) to automatically mask the facial region. Preserves the original video while regenerating only the face area for improved lip-sync quality. (Note: best results occur when the source video already shows speaking-like movements.)

    Audio Input Settings

    • 🔇 No Audio Input: No external audio file is used; the AI generates new audio based on your text prompt.

    • 🔊 Audio Input: Upload an existing voice or music file to drive the animation and lip-sync.

    🚀 VRAM Optimization & Long Videos

    This workflow uses ComfyUI_LTX-2_VRAM_Memory_Management (https://github.com/RandomInternetPreson/ComfyUI_LTX-2_VRAM_Memory_Management)

    ℹ️ Copy the folders ComfyUI_LTX2_SeqParallel and comfyui_tensor_parallel_v3 into your ComfyUI\custom_nodes directory.
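    The install step above can be sketched as a small helper. The two folder names come from the note above; the repo and ComfyUI paths are placeholders for your own setup, and the helper itself is not part of the workflow:

```python
# Hypothetical install helper: copies the two node folders into ComfyUI's
# custom_nodes directory. Adjust repo_dir/comfyui_dir to your own paths.
import shutil
from pathlib import Path

NODE_FOLDERS = ("ComfyUI_LTX2_SeqParallel", "comfyui_tensor_parallel_v3")

def install_nodes(repo_dir: str, comfyui_dir: str) -> list:
    """Copy each node folder from the cloned repo into custom_nodes."""
    installed = []
    for folder in NODE_FOLDERS:
        src = Path(repo_dir) / folder
        dst = Path(comfyui_dir) / "custom_nodes" / folder
        # dirs_exist_ok lets you rerun the install over an existing copy
        shutil.copytree(src, dst, dirs_exist_ok=True)
        installed.append(str(dst))
    return installed
```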

    ⚙️ Chunking Settings for Longer Videos

    Adjust ffn_chunks based on your video length (at 24 fps):

    • 10 seconds (240 frames): ffn_chunks=1-2

    • 15 seconds (360 frames): ffn_chunks=2-4

    • 20 seconds (480 frames): ffn_chunks=4-6

    • 25 seconds (600 frames): ffn_chunks=8-10

    • 33 seconds (800 frames): ffn_chunks=12-16
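    As a quick reference, the table above can be turned into a lookup. This helper is purely illustrative; the function name and the choice of the upper value from each range are my own, not part of the workflow:

```python
# Illustrative only: suggest a starting ffn_chunks value from the clip
# length, using the upper end of each range in the table above (24 fps).
def suggest_ffn_chunks(seconds: float, fps: int = 24) -> int:
    frames = seconds * fps
    if frames <= 240:       # up to 10 s
        return 2
    if frames <= 360:       # up to 15 s
        return 4
    if frames <= 480:       # up to 20 s
        return 6
    if frames <= 600:       # up to 25 s
        return 10
    return 16               # ~33 s (800 frames) and longer
```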

    If you encounter OOM (Out of Memory) errors:

    • Increase the ffn_chunks value

    • Reduce the resolution slightly

    Description

    NEW: 🎥 Vid2Vid (Masked Face)

    Face-targeted video refinement that applies a mask specifically to the facial region using automatic segmentation. This mode preserves the entire original video while regenerating only the face area for enhanced lip-sync precision and facial animation quality. Perfect for when you want to improve lip movements and facial expressions without altering the rest of your video content. (Note: achieving natural speech animation can still be challenging; best results occur when the source video already shows speaking-like movements.)

    CHANGE: 🎥 Vid2Vid (Vid Bypass)

    You can now downscale the video input (e.g., to 0.6 megapixels) for sound generation, while the original video is always passed through at its native resolution.
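    For the downscale step, a resolution near 0.6 megapixels can be computed like this. The snippet is a sketch under my own assumptions: snapping dimensions to a multiple of 32 is a common video-model constraint, not something this workflow specifies, and the function name is hypothetical:

```python
# Illustrative sketch: pick a resolution near a megapixel budget while
# keeping the aspect ratio. Rounding to a multiple of 32 is an assumption.
def downscale_to_megapixels(width: int, height: int,
                            target_mp: float = 0.6,
                            multiple: int = 32):
    scale = (target_mp * 1_000_000 / (width * height)) ** 0.5
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```

    For example, a 1920x1080 input maps to 1024x576 (about 0.59 megapixels) with these assumptions.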

    Workflows
    LTXV2

    Details

    Downloads
    935
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/18/2026
    Updated
    2/1/2026
    Deleted
    -

    Files

    ltx2AllInOneComfyui_ltx2DistilledAIOV21.zip

    ltx2AllInOneComfyui_ltx2DistilledAIOV22.zip