CivArchive
    LTX 2.3 Lip-Sync Workflow – 3 min for a 10 s video, walk-and-talk supported – v1.0

     

    Click here to try online first:

     

    Workflow: Lip-Sync Speaking/Singing – LTX2.3 Image-to-Digital Human – Auto Expansion – Module Optimization – No Subtitles  

    Experience link: https://www.runninghub.ai/post/2038618856104665090/?inviteCode=rh-v1401

     

    Workflow: Text-to-Lip-Sync Video – Speaking/Singing – LTX2.3 Text-to-Digital Human – No Subtitles – Module Optimization  

    Experience link: https://www.runninghub.ai/post/2038618886479814658/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – Text-to-Video  

    Experience link: https://www.runninghub.ai/post/2031218445026594817/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – Image-to-Video – Modular Tuned Edition  

    Experience link: https://www.runninghub.ai/post/2031218459471777794/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – First/Middle/Last Frame Three-Image-to-Video  

    Experience link: https://www.runninghub.ai/post/2035325465820405761/?inviteCode=rh-v1401

     

    Name: LTX 2.3 Image-to-Lip-Sync Meme Workflow (Modular / Ultra-Fast / Action-Supported)


     

    Introduction:

    Built on the open-source LTX 2.3 model and optimized for image-to-lip-sync videos. It lets any image (person/animal, medium close-up) sing or speak accurately along with the uploaded audio, while actions (walking, waving, jumping, etc.) are controlled via prompts.

     


     

    Core Advantages:

    - Extremely fast: a 10-second 1280-resolution video takes only 3–6 minutes, and the workflow's second run is even faster

    - 5-way batching: tested running five workflows simultaneously, producing a dozen-plus finished videos per day

    - Modular grouping: upload → dimension setting → audio → latent creation → upscale; clear at a glance and easy to modify

    - With a fixed shot it is almost impossible to tell the generated clip from the original footage; ideal for memes, entertainment, and virtual streamers

    - Supports MP3 audio (if an error occurs, re-export the file once from CapCut)

    - Avoid prompts such as "look down" or "turn around"; they break character consistency
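    The last two tips can be folded into a small preflight check before queueing a run. This helper is purely illustrative (the function name and warning format are not part of the workflow):

```python
# Prompt phrases the notes above say to avoid (they break character consistency).
RISKY_PHRASES = ("look down", "turn around")


def preflight(audio_path: str, prompt: str) -> list[str]:
    """Return warnings for a planned run. Illustrative sketch only."""
    warnings = []
    # The workflow is documented as supporting MP3 audio.
    if not audio_path.lower().endswith(".mp3"):
        warnings.append(f"audio is not MP3: {audio_path}")
    # Flag prompt phrases known to hurt character consistency.
    for phrase in RISKY_PHRASES:
        if phrase in prompt.lower():
            warnings.append(f"prompt contains risky phrase: '{phrase}'")
    return warnings
```

    Running it on a clean input returns an empty list; a WAV file with a "turn around" instruction would return two warnings.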

     


     

    Workflow Structure:

    1. Upload image (medium/close-up, clear lip movements)

    2. Set dimensions (longest side 1280)

    3. Upload audio (10-15 seconds recommended)

    4. The Latent module references both the image and the audio, scaling them in the same pass

    5. Final upscale and output
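    The dimension-setting step (longest side 1280) can be sketched as below. The even-number rounding is an assumption for illustration; the actual ComfyUI resize node may snap dimensions to other multiples.

```python
def fit_longest_side(width: int, height: int, target: int = 1280) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `target`,
    keeping aspect ratio and rounding both sides to even numbers
    (assumed; not taken from the workflow itself)."""
    scale = target / max(width, height)
    new_w = round(width * scale / 2) * 2
    new_h = round(height * scale / 2) * 2
    return new_w, new_h
```

    For example, a 1920×1080 source maps to 1280×720, and a 1080×1920 portrait source maps to 720×1280.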

     


     

    Results Showcase:

    This workflow was used to create the "round-headed elderly cat meme singing" video (see example). Speaking lip-sync is equally strong; paired with Qwen voice design, it can be used for digital humans.

     


     

    Note:

    LTX 2.3 is the open-source model whose texture and color control come closest to cinema grade.

     


     


    Workflows: LTXV 2.3

    Details
    Downloads: 21
    Platform: CivitAI
    Platform Status: Available
    Created: 4/7/2026
    Updated: 4/7/2026
    Deleted: -

    Files
    ltx23LipSyncWorkflow3min_v10.zip

    Mirrors