CivArchive
    HuMo for Wan - HuMo 14B fp8 e4m3fn
    NSFW

    HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning


    ✨ Key Features

    HuMo is a unified, human-centric video generation framework designed to produce high-quality, fine-grained, and controllable human videos from multimodal inputs, including text, images, and audio. It supports strong text prompt following, consistent subject preservation, and synchronized audio-driven motion.

    • VideoGen from Text-Image - Customize character appearance, clothing, makeup, props, and scenes using text prompts combined with reference images.

    • VideoGen from Text-Audio - Generate audio-synchronized videos solely from text and audio inputs, removing the need for image references and enabling greater creative freedom.

    • VideoGen from Text-Image-Audio - Achieve the highest level of customization and control by combining text, image, and audio guidance.
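
    The three modes above differ only in which inputs accompany the text prompt. As a rough illustration, the sketch below maps a set of inputs to a mode name. The `HumoInputs` class, the `select_mode` function, and the mode strings are hypothetical names invented for this example; they are not part of HuMo's actual code or command-line interface.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HumoInputs:
    """Hypothetical container for one generation request (not HuMo's API)."""
    text: str
    image_path: Optional[str] = None  # reference image, if any
    audio_path: Optional[str] = None  # driving audio clip, if any


def select_mode(inputs: HumoInputs) -> str:
    """Return which of the three listed modes the given inputs correspond to."""
    # All three modes listed above include a text prompt.
    if not inputs.text.strip():
        raise ValueError("a text prompt is required in every mode")

    has_image = inputs.image_path is not None
    has_audio = inputs.audio_path is not None

    if has_image and has_audio:
        return "text-image-audio"
    if has_image:
        return "text-image"
    if has_audio:
        return "text-audio"
    # Text alone matches none of the three modes described above.
    raise ValueError("provide a reference image, an audio clip, or both")
```

    For example, `select_mode(HumoInputs("a singer on stage", audio_path="song.wav"))` falls into the text-audio case, since no reference image is given.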

    Examples and models from the following sources are reuploaded here for your convenience:
    https://huggingface.co/bytedance-research/HuMo
    https://github.com/Phantom-video/HuMo


    The model supports both 480P and 720P resolutions; 720P inference produces much better quality.

    Description

    Checkpoint
    Wan Video 14B t2v

    Details

    Downloads: 7
    Platform: SeaArt
    Platform Status: Available
    Created: 9/13/2025
    Updated: 9/13/2025
    Deleted: -
