CivArchive
    LTX 2.3 Lip-Sync Workflow – 3 min for a 10 s video, walk-and-talk supported – v1.0

     

    Click here to try online first:

     

    Workflow: Lip-Sync Speaking/Singing – LTX2.3 Image-to-Digital Human – Auto Expansion – Module Optimization – No Subtitles  

    Experience link: https://www.runninghub.ai/post/2038618856104665090/?inviteCode=rh-v1401

     

    Workflow: Text-to-Lip-Sync Video – Speaking/Singing – LTX2.3 Text-to-Digital Human – No Subtitles – Module Optimization  

    Experience link: https://www.runninghub.ai/post/2038618886479814658/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – Text-to-Video  

    Experience link: https://www.runninghub.ai/post/2031218445026594817/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – Image-to-Video – Modular Tuned Edition  

    Experience link: https://www.runninghub.ai/post/2031218459471777794/?inviteCode=rh-v1401

     

    Workflow: LTX2.3 – Fully Automated Prompt – First/Middle/Last Frame Three-Image-to-Video  

    Experience link: https://www.runninghub.ai/post/2035325465820405761/?inviteCode=rh-v1401

     

    Name: LTX 2.3 Image-to-Lip-Sync Meme Workflow (Modular / Ultra-Fast / Action-Supported)


     

    Introduction:

    Built on the open-source LTX 2.3 model and optimized for image-to-lip-sync videos. It lets any image (person/animal, medium close-up) sing or speak accurately along with the uploaded audio, while actions (walking, waving, jumping, etc.) are controlled via prompts.

     


     

    Core Advantages:

    - Extremely fast: a 10-second 1280-resolution video takes only 3–6 minutes, and the workflow's second run is even faster

    - 5-way batching: tested running five workflows simultaneously, producing a dozen-plus finished videos per day

    - Modular grouping: upload → dimension setting → audio → latent creation → upscale; clear at a glance and easy to modify

    - With a fixed shot it is almost impossible to tell the generated clip from the original footage; ideal for memes, entertainment, and virtual streamers

    - Supports MP3 audio (if an error occurs, re-export the file once from CapCut)

    - Avoid prompts such as "look down" or "turn around"; they break character consistency
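    The last two tips can be folded into a small preflight check before queueing a run. This helper is purely illustrative (the function name and warning format are not part of the workflow):

```python
# Prompt phrases the notes above say to avoid (they break character consistency).
RISKY_PHRASES = ("look down", "turn around")


def preflight(audio_path: str, prompt: str) -> list[str]:
    """Return warnings for a planned run. Illustrative sketch only."""
    warnings = []
    # The workflow is documented as supporting MP3 audio.
    if not audio_path.lower().endswith(".mp3"):
        warnings.append(f"audio is not MP3: {audio_path}")
    # Flag prompt phrases known to hurt character consistency.
    for phrase in RISKY_PHRASES:
        if phrase in prompt.lower():
            warnings.append(f"prompt contains risky phrase: '{phrase}'")
    return warnings
```

    Running it on a clean input returns an empty list; a WAV file with a "turn around" instruction would return two warnings.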

     


     

    Workflow Structure:

    1. Upload image (medium/close-up, clear lip movements)

    2. Set dimensions (longest side 1280)

    3. Upload audio (10-15 seconds recommended)

    4. The Latent module references both the image and the audio, scaling them in the same pass

    5. Final upscale and output
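    The dimension-setting step (longest side 1280) can be sketched as below. The even-number rounding is an assumption for illustration; the actual ComfyUI resize node may snap dimensions to other multiples.

```python
def fit_longest_side(width: int, height: int, target: int = 1280) -> tuple[int, int]:
    """Scale (width, height) so the longest side equals `target`,
    keeping aspect ratio and rounding both sides to even numbers
    (assumed; not taken from the workflow itself)."""
    scale = target / max(width, height)
    new_w = round(width * scale / 2) * 2
    new_h = round(height * scale / 2) * 2
    return new_w, new_h
```

    For example, a 1920×1080 source maps to 1280×720, and a 1080×1920 portrait source maps to 720×1280.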

     


     

    Results Showcase:

    This workflow was used to create the "round-headed elderly cat meme singing" video (see example). Speaking lip-sync is equally strong; paired with Qwen voice design, it can be used for digital humans.

     


     

    Note:

    LTX 2.3 is the open-source model whose texture and color control come closest to cinema grade.

     


     


    Workflows: LTXV 2.3

    Details
    Downloads: 21
    Platform: CivitAI
    Platform Status: Available
    Created: 4/7/2026
    Updated: 4/7/2026
    Deleted: -

    Files
    ltx23LipSyncWorkflow3min_v10.zip

    Mirrors