I use these workflows to create multilingual e-commerce promotional videos featuring digital humans. The entire production process runs online, is free, and is easy to follow. For the order in which to run the workflows and the details of each one, please refer to the video below:
Brief Tutorial:
1. Use Zimage to generate model images that match the appearance of people in the target foreign region, as well as scene images that fit the product and setting.
2. Use Qwen image edit 2511 to merge the model, product, and scene images to generate the base image for the digital human.
3. Use Qwen TTS to generate advertising voiceovers in foreign languages.
4. Use the infinite talk digital human workflow to generate lip-sync videos of the digital human.
5. Upscale the resulting digital human videos to obtain 4K ultra-high-definition lip-sync videos.
The storyboard design workflow uses Gemini Flash to generate scene prompts and image editing prompts.
The image processing workflow upscales images to 4K and adds detail.
Tutorial: How to Get Digital Human Avatars for Free
Log in to Runninghub and get 1,000 RH credits. Enter the invite code rh-v1401 to get another 1,000 RH credits.
RH standard mode consumes 12 credits per minute of run time. Generating one image with Zimage takes about a minute, so it costs only about 12 RH credits, and a 30-second voice clip also costs around 12 credits. Using Qwen2511 for image editing takes about 3 minutes per image and costs around 40 credits. Running a 30-second digital human video takes about 30 minutes and consumes about 360 credits. Upscaling a 30-second video in segments takes about 35 minutes and consumes about 420 credits.
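The per-job costs above all follow from the 12 credits per minute rate. A minimal sketch of that arithmetic, assuming billing is simply run time multiplied by the rate (the function and dictionary names are hypothetical, and the 1-minute run times are inferred from the quoted 12-credit costs rather than stated):

```python
CREDITS_PER_MINUTE = 12  # RH standard-mode rate quoted in the text


def estimate_credits(run_minutes: float) -> int:
    """Estimate the credits for one job from its run time in minutes.

    Assumes cost = run time x rate; the platform's real rounding may differ.
    """
    return round(run_minutes * CREDITS_PER_MINUTE)


# Run times in minutes. The 3, 30, and 35 minute figures are quoted in the
# text; the 1-minute figures are inferred from the quoted 12-credit cost.
RUN_MINUTES = {
    "zimage_image": 1,
    "voice_clip_30s": 1,
    "qwen2511_edit": 3,
    "digital_human_video_30s": 30,
    "upscale_video_30s": 35,
}

estimates = {job: estimate_credits(m) for job, m in RUN_MINUTES.items()}
for job, credits in estimates.items():
    print(f"{job}: ~{credits} credits")
```

Note that a 3-minute Qwen2511 edit works out to 36 credits under this assumption, which the text rounds to "around 40".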
When generating digital human videos, write clear prompts: simple character movements and dynamic background descriptions usually work on the first try. Upscaling also generally works on the first try. For long videos over 20 seconds, however, there is a very small chance that a few seconds may be cut off, so it's recommended to split the video into 15-second segments for upscaling.
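The 15-second segmenting advice can be planned ahead of time. This is only a sketch of how the cut points might be computed; the function name is hypothetical, and the actual cutting would be done in whatever video tool or workflow node you use:

```python
from typing import List, Tuple


def split_into_segments(total_seconds: float,
                        max_segment: float = 15.0) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds, each segment at most max_segment long."""
    if max_segment <= 0:
        raise ValueError("max_segment must be positive")
    total = float(total_seconds)
    segments = []
    start = 0.0
    while start < total:
        # The last segment is shortened so it ends exactly at the video's end.
        end = min(start + max_segment, total)
        segments.append((start, end))
        start = end
    return segments


for start, end in split_into_segments(50):
    print(f"upscale segment: {start:.0f}s to {end:.0f}s")
```

For a 50-second video this yields three 15-second segments and one final 5-second segment.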
Cost for creating a four-scene English digital human shopping video (final length: 50 seconds):
Zimage image generation: 1 model image, 4 scene images; approximately 60 credits.
Qwen2511 image editing: 6 times; approximately 216 credits.
Voice generation: 4 segments, about 4 minutes of run time in total; approximately 50 credits.
Digital human videos: 4 segments totaling 50 seconds; approximately 600 credits.
Upscaling 4 video segments: approximately 720 credits.
Total: around 1,650 credits, so the 2,000 free credits are more than enough! If you're just practicing with digital humans, you can skip upscaling the videos, which lets you experiment many more times.
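The budget above can be checked by summing the line items. A short sketch using only the figures quoted in the list (the variable names are hypothetical):

```python
FREE_CREDITS = 2000  # 1,000 for logging in + 1,000 from the invite code

# Approximate costs quoted for the four-scene, 50-second English video.
budget = {
    "Zimage image generation (5 images)": 60,
    "Qwen2511 image editing (6 edits)": 216,
    "Voice generation (4 segments)": 50,
    "Digital human videos (50 s total)": 600,
    "Upscaling (4 video segments)": 720,
}

total = sum(budget.values())
remaining = FREE_CREDITS - total
total_without_upscaling = total - budget["Upscaling (4 video segments)"]

print(f"Total: {total} credits")
print(f"Remaining free credits: {remaining}")
print(f"Total if upscaling is skipped: {total_without_upscaling} credits")
```

The items sum to 1,646 credits, which matches the "around 1,650" figure and leaves roughly 350 free credits. Skipping upscaling brings one full video down to 926 credits, so two practice videos fit within the free allowance.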