Try this workflow online, or download it for free:
Workflow: AA – Digital Human Singing/Talking – InfiniteTalk Lip-Sync Image-to-Video
Try it here: https://www.runninghub.ai/post/2006334086171860994/?inviteCode=rh-v1401
# [AI Digital Human] Infinite Talk Lip-Sync Workflow – Easy & Open Source
## Overview
This workflow brings your static images to life using the open-source Infinite Talk model. It generates audio-driven lip-sync videos from a single portrait photo and any audio clip (speech or singing). Perfect for creating digital humans, talking avatars, or even song covers – all with minimal effort.
## Key Features
- 🆓 Completely free & open-source – powered by [Infinite Talk](https://github.com/your-link-here)
- 🎤 Audio-driven lip sync – just provide an image and audio, and the model matches the mouth movements to the audio
- ⚡ Easy to use – simple prompt (e.g., "person talking to camera") and default 25fps setting
- 🔧 Flexible duration – the total frame count is calculated automatically from the audio length, plus a safety margin (an extra second, i.e., 25 frames at 25 fps) so the full audio is spoken (see the sketch after this list)
- 🖼️ High-res output – built-in upscaling option to enhance 540p results to crisp, clear videos
- 🔁 Multi‑scene support – combine with other workflows (e.g., Z image, Qianwen 2511 editing, Qianwen TTS) to create multi‑shot digital human videos with consistent characters
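
As a rough illustration of the duration logic above, here is a minimal sketch of the frame-count math (a hypothetical helper, not part of the workflow itself):

```python
import math

def total_frames(audio_seconds: float, fps: int = 25, margin_seconds: float = 1.0) -> int:
    """Frames needed to cover the audio, plus a safety margin
    so the last syllables are never cut off."""
    return math.ceil(audio_seconds * fps) + int(margin_seconds * fps)

# e.g., a 12.4 s clip at the default 25 fps:
print(total_frames(12.4))  # 335 frames (310 for the audio + 25 margin)
```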
## How It Works
1. Upload your portrait image and audio file (WAV/MP3).
2. Write a simple prompt – keep it straightforward (e.g., "person talking to camera"). Complex camera movements are not supported.
3. Set the frame rate – the default 25 fps is recommended; the workflow automatically adds 25 extra frames so the audio is not truncated.
4. Run the workflow – either in the web UI or programmatically (see the sketch after this list). The core sampling node processes four inputs:
- Model loader (adjust block swap if VRAM is low)
   - Text embeddings (encode the prompt and the target video dimensions)
- CLIP text (positive/negative prompts – use generic negatives)
- Audio embeddings (the uploaded audio, optionally truncated)
5. Output a lip-synced video. If you need higher resolution, feed the result into the included upscaler.
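
If you run the workflow locally in ComfyUI, step 4 can also be queued programmatically over ComfyUI's HTTP API. A minimal sketch, assuming a default local instance on port 8188 and a workflow exported via "Save (API Format)"; the JSON filename is a placeholder:

```python
import json
import urllib.request

# Load the workflow exported in API format (filename is illustrative).
with open("infinitetalk_workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

# Queue the workflow on a local ComfyUI instance (default port 8188).
payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))  # returns a prompt_id you can poll for results
```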
## Important Notes & Limitations
- ⏱️ Single‑clip duration – avoid generating videos longer than 1 minute in one go, otherwise color shift/artifacts may appear.
- 🎵 Audio length – for best results, keep the audio under 60 seconds when running on cloud platforms (e.g., RunPod); on a local machine you can run longer sessions (see the trimming sketch after this list).
- 🧠 VRAM / memory – if you're running locally, increase your virtual memory and adjust the "block swap" in the model loader if you run out of VRAM.
- 🖼️ Source image quality – for best final clarity, start with a high‑resolution, clean portrait. The upscaler works better with good originals.
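
If your source audio runs long, one simple way to stay under the 60-second guideline is to trim it up front with ffmpeg. A sketch, assuming ffmpeg is on your PATH; the filenames are placeholders:

```python
import subprocess

# Keep only the first 60 seconds; -c copy avoids re-encoding the audio.
subprocess.run(
    ["ffmpeg", "-y", "-i", "voice_full.wav", "-t", "60", "-c", "copy", "voice_60s.wav"],
    check=True,
)
```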
## Advanced Usage: Multi‑Scene Digital Human
To create a more engaging video with multiple angles or scenes of the same character:
1. Generate a character image (e.g., with Z image).
2. Use a storyboard workflow to get scene descriptions.
3. Edit the original image with Qianwen 2511 (or similar) to create consistent character variations.
4. Clone a voice using Qianwen TTS (keep it ~1 minute for best naturalness).
5. Run each image through this lip‑sync workflow.
6. Upscale and combine the clips (a concatenation sketch follows below).
The result is a professional‑looking lip‑sync video with seamless transitions and ultra‑clear visuals.
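
For step 6, one straightforward way to stitch the finished clips together is ffmpeg's concat demuxer. A sketch, assuming the clips share the same codec, resolution, and frame rate; the filenames are placeholders:

```python
import subprocess

clips = ["scene_01.mp4", "scene_02.mp4", "scene_03.mp4"]  # placeholder names

# The concat demuxer reads its inputs from a text file.
with open("clips.txt", "w", encoding="utf-8") as f:
    for clip in clips:
        f.write(f"file '{clip}'\n")

# -c copy avoids re-encoding; clips must share codec, resolution, and fps.
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", "clips.txt",
     "-c", "copy", "digital_human_final.mp4"],
    check=True,
)
```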
## Tags
digital human, lip sync, infinite talk, audio-driven, talking avatar, comfyui, workflow, open source, video generation, face animation
---
Get started now – as the saying goes, anyone with hands can do it!