
    My approach: I generate everything in "Fast" mode first to quickly iterate and find the best results, then I selectively upscale only the videos worth keeping by reusing the same seed.

    Generation Modalities

    • 📝➡️🎥 Text-to-Video: Create completely new videos from scratch using text prompts.

    • 🖼️➡️🎥 Image-to-Video: Animate static reference images using text prompts.

    • 🖼️🖼️➡️🎥 First-Last-Frame-to-Video: Generate a coherent video sequence that bridges a defined starting and ending image.

    • 🎥➡️🎥 Video-to-Video: Generate synchronized audio and speech driven by the video's visuals. Offers options to pass the source video through untouched or regenerate it entirely.

    • 🎥 Vid2Vid (Masked Face): Face-targeted refinement that uses Segmentation V2 (RMBG) to automatically mask the facial region. Preserves the original video while regenerating only the face area for improved lip-sync quality. (Note: best results occur when the source video already shows speaking-like movements.)

    Audio Input Settings

    • 🔇 No Audio Input: No external audio file is used; the AI generates new audio based on your text prompt.

    • 🔊 Audio Input: Upload an existing voice or music file to drive the animation and lip-sync.

    🚀 VRAM Optimization & Long Videos

    This workflow uses ComfyUI_LTX-2_VRAM_Memory_Management (https://github.com/RandomInternetPreson/ComfyUI_LTX-2_VRAM_Memory_Management)

    ℹ️ Copy the folders ComfyUI_LTX2_SeqParallel and comfyui_tensor_parallel_v3 into your ComfyUI\custom_nodes directory.
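    The install step above can be sketched as a small helper. The two folder names come from the note above; the repo and ComfyUI paths are placeholders for your own setup, and the helper itself is not part of the workflow:

```python
# Hypothetical install helper: copies the two node folders into ComfyUI's
# custom_nodes directory. Adjust repo_dir/comfyui_dir to your own paths.
import shutil
from pathlib import Path

NODE_FOLDERS = ("ComfyUI_LTX2_SeqParallel", "comfyui_tensor_parallel_v3")

def install_nodes(repo_dir: str, comfyui_dir: str) -> list:
    """Copy each node folder from the cloned repo into custom_nodes."""
    installed = []
    for folder in NODE_FOLDERS:
        src = Path(repo_dir) / folder
        dst = Path(comfyui_dir) / "custom_nodes" / folder
        # dirs_exist_ok lets you rerun the install over an existing copy
        shutil.copytree(src, dst, dirs_exist_ok=True)
        installed.append(str(dst))
    return installed
```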

    ⚙️ Chunking Settings for Longer Videos

    Adjust ffn_chunks based on your video length (at 24 fps):

    • 10 seconds (240 frames): ffn_chunks=1-2

    • 15 seconds (360 frames): ffn_chunks=2-4

    • 20 seconds (480 frames): ffn_chunks=4-6

    • 25 seconds (600 frames): ffn_chunks=8-10

    • 33 seconds (800 frames): ffn_chunks=12-16
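    As a quick reference, the table above can be turned into a lookup. This helper is purely illustrative; the function name and the choice of the upper value from each range are my own, not part of the workflow:

```python
# Illustrative only: suggest a starting ffn_chunks value from the clip
# length, using the upper end of each range in the table above (24 fps).
def suggest_ffn_chunks(seconds: float, fps: int = 24) -> int:
    frames = seconds * fps
    if frames <= 240:       # up to 10 s
        return 2
    if frames <= 360:       # up to 15 s
        return 4
    if frames <= 480:       # up to 20 s
        return 6
    if frames <= 600:       # up to 25 s
        return 10
    return 16               # ~33 s (800 frames) and longer
```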

    If you encounter OOM (Out of Memory) errors:

    • Increase the ffn_chunks value

    • Reduce the resolution slightly

    Description

    NEW: 🎥 Vid2Vid (Masked Face)

    Face-targeted video refinement that applies a mask specifically to the facial region using automatic segmentation. This mode preserves the entire original video while regenerating only the face area for enhanced lip-sync precision and facial animation quality. Perfect for when you want to improve lip movements and facial expressions without altering the rest of your video content. (Note: achieving natural speech animation can still be challenging; best results occur when the source video already shows speaking-like movements.)

    CHANGE: 🎥 Vid2Vid (Vid Bypass)

    You can now downscale the video input (e.g., to 0.6 megapixels) for sound generation, while the original video is always passed through at its native resolution.
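    For the downscale step, a resolution near 0.6 megapixels can be computed like this. The snippet is a sketch under my own assumptions: snapping dimensions to a multiple of 32 is a common video-model constraint, not something this workflow specifies, and the function name is hypothetical:

```python
# Illustrative sketch: pick a resolution near a megapixel budget while
# keeping the aspect ratio. Rounding to a multiple of 32 is an assumption.
def downscale_to_megapixels(width: int, height: int,
                            target_mp: float = 0.6,
                            multiple: int = 32):
    scale = (target_mp * 1_000_000 / (width * height)) ** 0.5
    w = max(multiple, round(width * scale / multiple) * multiple)
    h = max(multiple, round(height * scale / multiple) * multiple)
    return w, h
```

    For example, a 1920x1080 input maps to 1024x576 (about 0.59 megapixels) with these assumptions.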

    Workflows
    LTXV2

    Details

    Downloads
    935
    Platform
    CivitAI
    Platform Status
    Available
    Created
    1/18/2026
    Updated
    2/1/2026
    Deleted
    -

    Files

    ltx2AllInOneComfyui_ltx2DistilledAIOV21.zip

    ltx2AllInOneComfyui_ltx2DistilledAIOV22.zip