LTX-2.3 Whisper & Soft-Spoken Audio LoRA

LTX-2.3 Whisper & Soft-Spoken Audio LoRA - v1.0

Base model: LTX-2.3 · Type: Audio-style LoRA · Rank: 32

---

## What this does

LTX-2.3 can generate dialogue, multi-speaker scenes, and full-dynamic-range audio, including screaming, but it cannot whisper. This LoRA adds two quiet vocal registers to the model:

- Whispering — devoiced, breathy, close-mic delivery

- Soft-spoken — voiced but low-volume, intimate, relaxed

The LoRA targets only the three attention modules that write to the audio branch (`audio_attn1`, `audio_attn2`, `video_to_audio_attn`). The video branch's weights are left untouched by design: no visual fighting, no style drift.
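To check that a LoRA file really touches only these three modules, you can scan its tensor names. The sketch below is a hypothetical helper: it assumes each key is a dot-separated module path that contains the module name as one component (the exact key layout of LTX-2.3 LoRA files is an assumption here), and in practice the key list would be read from the file itself, for example with the `safetensors` package.

```python
# The three audio-branch modules this LoRA is trained on.
AUDIO_TARGETS = ("audio_attn1", "audio_attn2", "video_to_audio_attn")


def non_audio_keys(keys):
    """Return the LoRA tensor names that do not sit under an audio-branch target.

    Assumes (hypothetically) that each key is a dot-separated module path,
    e.g. "transformer_blocks.0.audio_attn1.to_q.lora_A.weight".
    Matching is done on whole path components, so a name such as
    "audio_to_video_attn" is NOT treated as an audio target.
    """
    targets = set(AUDIO_TARGETS)
    return [key for key in keys if not targets.intersection(key.split("."))]


example_keys = [
    "transformer_blocks.0.audio_attn1.to_q.lora_A.weight",
    "transformer_blocks.0.video_to_audio_attn.to_k.lora_B.weight",
    "transformer_blocks.0.attn1.to_q.lora_A.weight",  # hypothetical video-branch key
]
print(non_audio_keys(example_keys))
# → ['transformer_blocks.0.attn1.to_q.lora_A.weight']
```

An empty result means every tensor in the file targets one of the audio-branch modules.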

---

## Usage

Load at strength 1.0. The register is controlled entirely by the manner keyword in your prompt — no special strength tuning needed.

### Trigger words (none; use natural language)

| Register | Female | Male |
|---|---|---|
| Whispering | `(woman, whispering)` | `(man, whispering quietly)` |
| Soft-spoken | `(woman, speaking softly)` | `(man, speaking softly)` |

> Note: Male whisper may require the extra word `quietly` to tip the model over. `(man, whispering)` alone produces soft-spoken delivery, not a true whisper.
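The table and the male-whisper note can be captured in a small lookup. This is an illustrative sketch only: the register names `"whisper"` / `"soft"` and the function `speaker_tag` are hypothetical and not part of any LTX tooling.

```python
# Manner keywords from the trigger-word table. Male whisper uses the extra
# word "quietly", since "(man, whispering)" alone tends to come out soft-spoken.
MANNER = {
    ("woman", "whisper"): "whispering",
    ("man", "whisper"): "whispering quietly",
    ("woman", "soft"): "speaking softly",
    ("man", "soft"): "speaking softly",
}


def speaker_tag(gender: str, register: str) -> str:
    """Return the parenthesised (gender, manner) tag for one speaker."""
    try:
        manner = MANNER[(gender, register)]
    except KeyError:
        raise ValueError(f"unsupported combination: {gender!r}, {register!r}") from None
    return f"({gender}, {manner})"


print(speaker_tag("man", "whisper"))  # → (man, whispering quietly)
```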

### Prompt format

Follow the LTX-2.3 dialogue caption style:

```
a [scene description], ([gender], [manner]): "[what they say]", intimate ASMR
```

Examples:

```
a woman sitting close to a microphone in warm dim lighting, (woman, whispering): "close your eyes and listen"

a man at a desk late at night, (man, speaking softly): "I've been thinking about this all day"

a woman doing a skincare routine, (woman, whispering quietly): "this is my favourite step"
```
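If you generate many prompts from a script, a tiny helper keeps the caption style consistent. This is a hypothetical sketch (the name `build_prompt` is illustrative); passing `suffix=""` reproduces the examples above, which omit the trailing `intimate ASMR`.

```python
def build_prompt(scene: str, gender: str, manner: str, line: str,
                 suffix: str = "intimate ASMR") -> str:
    """Assemble a prompt in the LTX-2.3 dialogue caption style:
    a [scene description], ([gender], [manner]): "[what they say]", intimate ASMR
    """
    # A double quote inside the spoken line would break the quoted dialogue.
    if '"' in line:
        raise ValueError("spoken line must not contain double quotes")
    prompt = f'a {scene}, ({gender}, {manner}): "{line}"'
    if suffix:
        prompt += f", {suffix}"
    return prompt


print(build_prompt("woman sitting close to a microphone in warm dim lighting",
                   "woman", "whispering", "close your eyes and listen"))
# → a woman sitting close to a microphone in warm dim lighting, (woman, whispering): "close your eyes and listen", intimate ASMR
```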

### Without manner keywords

Using the LoRA without any manner keyword defaults to soft-spoken — a subtle volume-softening effect on whatever the base model would have generated. Useful as a gentle "quieter audio" modifier.

---

## What it can't do

- No intra-clip register mixing. You can't have one character whisper and another speak normally in the same clip. The register applies to the whole generation. For mixed-register dialogue, generate each part separately and cut them together.

- No magic above the vocoder ceiling. The audio chain passes through a mel-spectrogram bottleneck, so the high-frequency (HF) energy of a breathy whisper gets partially smoothed out. Expect intimate and quiet, not studio-crisp ASMR.

- Video is untouched by design. If you want the visuals to also feel ASMR (soft lighting, close-up framing), describe that in the scene prompt — the LoRA won't help or hurt.
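Why whisper detail suffers in a mel bottleneck: mel bands get wider as frequency rises, so fine high-frequency texture is averaged into a few coarse bands that the vocoder then has to reconstruct. The sketch below uses the common HTK mel formula with hypothetical settings (64 bands up to 8 kHz); LTX-2.3's actual mel configuration is not stated on this page. It also simplifies the filterbank to equal-width mel bands and ignores the overlapping triangular filters real implementations use.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel formula
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m: float) -> float:
    # Inverse of hz_to_mel
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(n_mels: int, f_max: float, f_min: float = 0.0) -> list[float]:
    """Edges (Hz) of n_mels bands equally spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / n_mels
    return [mel_to_hz(lo + i * step) for i in range(n_mels + 1)]


# Hypothetical settings: 64 mel bands covering 0-8 kHz.
edges = mel_band_edges(n_mels=64, f_max=8000.0)
widths = [b - a for a, b in zip(edges, edges[1:])]
print(f"lowest band:  {widths[0]:.0f} Hz wide")
print(f"highest band: {widths[-1]:.0f} Hz wide")
```

With these settings the top band is roughly ten times wider than the bottom one, which is where breathy HF detail gets blurred.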

---

## Training details

| Setting | Value |
|---|---|
| Base model | LTX-2.3 dev |
| Steps | 2000 |
| Rank / Alpha | 32 / 32 |
| Target modules | `audio_attn1`, `audio_attn2`, `video_to_audio_attn` |
| Training resolution | 192×192, 97 frames (~4 s @ 24 fps) |
| Dataset | 74 clips, 8 voices (4F / 4M), 2 registers per gender |
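The *Rank / Alpha* row and the load strength combine in the usual LoRA way. Assuming the loader follows the common convention of scaling the low-rank update by both α/r and a user strength s (typical for LoRA loaders, though not confirmed here for every LTX-2.3 frontend), each targeted weight matrix becomes:

```latex
W' = W + s \cdot \frac{\alpha}{r}\, B A,
\qquad B \in \mathbb{R}^{d \times r},\; A \in \mathbb{R}^{r \times k}

\text{With } r = \alpha = 32:\quad \frac{\alpha}{r} = 1
\;\Rightarrow\; W' = W + s\, B A
```

So at strength 1.0 the full learned update is applied, and at 0.3 only 30% of it is.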

Clips were 4-second segments sourced from ASMR content across 8 speakers: 4 female (2 soft-spoken, 2 whisper) and 4 male (2 soft-spoken, 2 whisper). Captions were transcribed with Whisper ASR and written in the `(gender, manner): "transcript", intimate ASMR` format.
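97 frames at 24 fps is 97 / 24 ≈ 4.04 s, which is where the "4-second segments" come from. A minimal sketch of cutting source audio to that length, assuming uncompressed PCM WAV input and using only Python's standard `wave` module. `split_wav` is a hypothetical helper, not the author's pipeline, and it leaves out the Whisper transcription and captioning step.

```python
import wave

FRAMES = 97  # video frames per training clip (from the table)
FPS = 24     # training frame rate


def split_wav(path: str, out_prefix: str) -> list[str]:
    """Cut a PCM WAV into back-to-back segments lasting FRAMES / FPS seconds.

    Any remainder shorter than one full segment is discarded.
    Returns the paths of the written segment files.
    """
    outputs = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        # Audio samples (per channel) that cover exactly FRAMES video frames.
        seg_len = round(params.framerate * FRAMES / FPS)
        for i in range(params.nframes // seg_len):
            data = src.readframes(seg_len)
            out_path = f"{out_prefix}_{i:03d}.wav"
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # the writer corrects the header frame count
                dst.writeframes(data)
            outputs.append(out_path)
    return outputs
```

At 48 kHz, for example, each segment is 48000 × 97 / 24 = 194,000 samples long.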

## Description

78 audio clips spanning 8 voices, both male and female, covering whispered and soft-spoken audio.

## Comments

**kronos1959777** · Jun 14, 2026

Does this work combined with character loras?

**plz12345** (author) · Jun 14, 2026

I don't see why it wouldn't. It's purely audio-trained, so it didn't touch the video layer at all. However, it may fight with a character LoRA if that was trained with both video and audio.

**OneBullet** · Jun 14, 2026

You can also use "LTX2 Lora Loader Advanced" from KJNodes. It lets you disable certain blocks (video, audio, other), so you can prevent the LoRA from affecting, e.g., video generation.

**plz12345** (author) · Jun 14, 2026

@OneBullet I don't actually use Comfy (macOS here).

**bennyboy_77** · Jun 14, 2026

Thanks so much. This is a much-needed LoRA. I've only just started testing, but so far it's working great. It seems like you can use a low strength, e.g. 0.3, to create a soft neutral voice, or crank it all the way up to 1.0 or above to go for the full hypnosis voice!

**plz12345** (author) · Jun 14, 2026

Yeah, I've heard it's very situation-aware, as well. Like if the subject is further away from the camera, the strength should be adjusted. Pretty wild that LTX didn't just support this without this kind of LoRA, but it was a neat process creating it.

by plz12345
