CivArchive
    LTX-2.3 Whisper & Soft-Spoken Audio LoRA - v1.0

    Base model: LTX-2.3 · Type: Audio-style LoRA · Rank: 32

    ---

    ## What this does

    LTX-2.3 can generate dialogue, multi-speaker scenes, and full dynamic range audio including screaming — but it cannot whisper. This LoRA adds two quiet vocal registers to the model:

    - Whispering — devoiced, breathy, close-mic delivery

    - Soft-spoken — voiced but low-volume, intimate, relaxed

    The LoRA targets only the three attention modules that write to the audio branch (audio_attn1, audio_attn2, video_to_audio_attn). No video-branch weights are modified, so video output is unchanged: no visual fighting, no style drift.
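One way to sanity-check the "audio modules only" claim is to scan the tensor names in the LoRA file and flag anything outside the three target modules. The sketch below is a minimal illustration: the module names come from this page, but the key layout (`transformer_blocks.N.<module>...`) and the sample keys are assumptions, not the file's confirmed structure.

```python
# Module names are taken from the model card above.
AUDIO_TARGETS = {"audio_attn1", "audio_attn2", "video_to_audio_attn"}


def non_audio_keys(keys):
    """Return every key that contains none of the audio target modules.

    Matching is done on whole dot-separated components, so a video module
    named "attn1" is not mistaken for "audio_attn1".
    """
    return [k for k in keys if AUDIO_TARGETS.isdisjoint(k.split("."))]


# Hypothetical key names, for illustration only.
sample_keys = [
    "transformer_blocks.0.audio_attn1.to_q.lora_A.weight",
    "transformer_blocks.0.video_to_audio_attn.to_k.lora_B.weight",
    "transformer_blocks.0.attn1.to_q.lora_A.weight",
]

print(non_audio_keys(sample_keys))
```

For a real file, the key list could be read with `safe_open(path, framework="pt").keys()` from the `safetensors` package; an empty result from `non_audio_keys` would be consistent with the claim that only the audio branch is touched.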

    ---

    ## Usage

    Load at strength 1.0. The register is controlled entirely by the manner keyword in your prompt — no special strength tuning needed.

    ### Trigger words (none, use natural language)

    | Register | Female | Male |
    |---|---|---|
    | Whispering | (woman, whispering) | (man, whispering quietly) |
    | Soft-spoken | (woman, speaking softly) | (man, speaking softly) |

    > Note: Male whisper may require the extra word "quietly" to tip the model over. (man, whispering) alone produces soft-spoken delivery, not a true whisper.

    ### Prompt format

    Follow the LTX-2.3 dialogue caption style:

    ```

    a [scene description], ([gender], [manner]): "[what they say]", intimate ASMR

    ```

    Examples:

    ```

    a woman sitting close to a microphone in warm dim lighting, (woman, whispering): "close your eyes and listen"

    a man at a desk late at night, (man, speaking softly): "I've been thinking about this all day"

    a woman doing a skincare routine, (woman, whispering quietly): "this is my favourite step"

    ```
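The prompt format above is regular enough to assemble programmatically. The following is a small sketch, not part of the LoRA itself: `build_prompt` and its parameter names are hypothetical, and the manner keywords are the ones listed under trigger words, including the extra "quietly" for male whisper.

```python
# Manner keywords as listed in the trigger-word table above.
MANNERS = {
    ("woman", "whisper"): "whispering",
    ("man", "whisper"): "whispering quietly",  # plain "whispering" tends to come out soft-spoken
    ("woman", "soft"): "speaking softly",
    ("man", "soft"): "speaking softly",
}


def build_prompt(scene, gender, register, line, asmr_suffix=True):
    """Assemble a caption: scene, (gender, manner): "line"[, intimate ASMR]."""
    manner = MANNERS[(gender, register)]
    prompt = f'{scene}, ({gender}, {manner}): "{line}"'
    if asmr_suffix:
        prompt += ", intimate ASMR"
    return prompt


print(build_prompt(
    "a woman sitting close to a microphone in warm dim lighting",
    "woman",
    "whisper",
    "close your eyes and listen",
))
```

The `asmr_suffix` flag exists because the format template ends in "intimate ASMR" while the examples omit it; which variant works better is left to experimentation.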

    ### Without manner keywords

    Using the LoRA without any manner keyword defaults to soft-spoken — a subtle volume-softening effect on whatever the base model would have generated. Useful as a gentle "quieter audio" modifier.

    ---

    ## What it can't do

    - No intra-clip register mixing. You can't have one character whisper and another speak normally in the same clip. The register applies to the whole generation. For mixed-register dialogue, generate each part separately and cut them together.

    - No magic above the vocoder ceiling. The audio chain passes through a mel spectrogram bottleneck, so the high-frequency (HF) energy of a breathy whisper gets partially smoothed. Expect intimate and quiet, not studio-crisp ASMR.

    - Video is untouched by design. If you want the visuals to also feel ASMR (soft lighting, close-up framing), describe that in the scene prompt — the LoRA won't help or hurt.
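For the mixed-register workaround in the first point (generate each part separately, then cut them together), the joining step could be done with ffmpeg's concat demuxer. This is a sketch under assumptions: the file names are placeholders, `concat_plan` is a hypothetical helper, and stream copy (`-c copy`) only works when all clips share the same codec and encoding parameters, which is likely but not guaranteed for clips from the same pipeline.

```python
def concat_plan(clip_paths, list_file="clips.txt", output="dialogue.mp4"):
    """Build the concat list text and the ffmpeg command that would join the clips.

    Nothing is executed here. Paths containing single quotes would need
    escaping, which this sketch does not handle.
    """
    listing = "".join(f"file '{path}'\n" for path in clip_paths)
    command = [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow arbitrary paths in the list file
        "-i", list_file,
        "-c", "copy",     # join without re-encoding
        output,
    ]
    return listing, command


# Placeholder file names: one whispered clip, one normally spoken clip.
listing, command = concat_plan(["whisper_part.mp4", "normal_part.mp4"])
print(listing)
print(" ".join(command))
```

To actually run it, write `listing` to `clips.txt` and pass `command` to `subprocess.run`.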

    ---

    ## Training details

    | Setting | Value |
    |---|---|
    | Base model | LTX-2.3 dev |
    | Steps | 2000 |
    | Rank / Alpha | 32 / 32 |
    | Target modules | audio_attn1, audio_attn2, video_to_audio_attn |
    | Training resolution | 192×192, 97 frames (~4s @ 24fps) |
    | Dataset | 74 clips, 8 voices (4F / 4M), 2 registers each |

    Clips were 4-second segments sourced from ASMR content across 8 speakers: 4 female (2 soft-spoken, 2 whisper) and 4 male (2 soft-spoken, 2 whisper). Captions were built from Whisper ASR transcriptions in the format (gender, manner): "transcript", intimate ASMR.
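Two of the numbers in the training table can be checked with quick arithmetic: the clip length implied by 97 frames at 24 fps, and the effective LoRA scale implied by rank 32 / alpha 32. The scale formula below (strength × alpha / rank) is the common LoRA convention and is an assumption here; the page does not say how the trainer or loader applies alpha.

```python
FRAMES = 97
FPS = 24
RANK = 32
ALPHA = 32


def clip_seconds(frames=FRAMES, fps=FPS):
    """Duration of one training clip in seconds."""
    return frames / fps


def lora_scale(strength=1.0, alpha=ALPHA, rank=RANK):
    """Multiplier on the low-rank update, assuming the usual alpha / rank scaling."""
    return strength * alpha / rank


print(round(clip_seconds(), 2))  # about 4 seconds, matching the "~4s" in the table
print(lora_scale())              # alpha equals rank, so strength passes through unchanged
```

With alpha equal to rank, the loader strength is the only scaling knob, which fits the recommendation to load at strength 1.0.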

    Description

    78 audio clips spanning 8 voices, both male and female, supporting whispering and softly spoken audio.

    Comments (6)

    kronos1959777
    Jun 14, 2026
    CivitAI

    Does this work combined with character loras?

    plz12345
    Author
    Jun 14, 2026

    I don't see why it wouldn't. It's purely audio-trained, so it didn't touch the video layer at all. However, it may fight with a character LoRA if that was trained with both video and audio.

    OneBullet
    Jun 14, 2026

    You can also use "LTX2 Lora Loader Advanced" from Kj-Nodes. It lets you disable certain blocks (video, audio, other), so you can prevent the LoRA from affecting, e.g., video generation.

    plz12345
    Author
    Jun 14, 2026

    @OneBullet I don't actually use Comfy (MacOS here)

    bennyboy_77
    Jun 14, 2026
    CivitAI

    Thanks so much. This is a much needed lora. I've only just started testing but, so far, it's working great. It seems like you can use a low strength e.g. 0.3 to create a soft neutral voice or crank it all the way up to 1.0 or above to go for the full hypnosis voice!

    plz12345
    Author
    Jun 14, 2026

    Yeah, I've heard it's very situation-aware, as well. Like if the subject is further away from the camera, the strength should be adjusted. Pretty wild that LTX didn't just support this without this kind of LoRA, but it was a neat process creating it.

    LORA
    LTXV 2.3

    Details

    Downloads: 621
    Platform: CivitAI
    Platform Status: Available
    Created: 6/13/2026
    Updated: 6/16/2026
    Deleted: -

    Files

    a_gentle_whisper.safetensors

    Mirrors

    HuggingFace (1 mirror)
    CivitAI (1 mirror)