CivArchive
    Text-to-Speech with Voice Clone in LTX 2.3 - v1.1

    Use LTX 2.3 as a Text-to-Speech (TTS) model.

    Note: As pointed out in the comments below, there is an outstanding bug in ComfyUI that may prevent you from using the new voice-cloning node in this workflow. See the discussion on GitHub for more information.

    This workflow is designed to generate audio-only output from a prompt and a speech sample. The generated audio clones the sample voice and applies it to the specified dialogue and prompt. If you want to make more than one video with a consistent character voice, you can't rely on LTX's random voice assignment; this workflow gives you a consistent voice that you can use for whatever spoken script you want.

    Voice cloning is made possible by the ID-LoRA models and the new ComfyUI node "LTXV Reference Audio". In my experience, video generated with this LoRA isn't very reliable or high-quality, so I got better results by first creating an audio file and then applying that pre-made audio track to a new LTX 2.3 video generated from a starting image. Those results haven't been 100% perfect either, but the success rate was higher than with any other method I tried.

    IMPORTANT: If you haven't updated your ComfyUI installation since about March 25, 2026, you will need to update it to get the new native ComfyUI node for ID-LoRA.

    Please see the ID-LoRA GitHub page for important guidance on prompt formatting and usage. The default values in this workflow have worked well for me, but all the nodes are clearly exposed and labeled, so you can tweak and experiment to get your own favorite results.

    GitHub: https://github.com/ID-LoRA/ID-LoRA

    One more piece of advice for generating audio tracks with LTX 2.3: the length of the generated audio clip is very important, probably more important than any other setting in the workflow. If the duration is too short, LTX will rush through parts of the script with almost no pause between sentences, and the result doesn't sound as natural as LTX is capable of. If the duration is too long, LTX will stretch out pauses and sometimes repeat sections of the script. My recommendation is to read your script out loud at a normal conversational pace and time yourself. Use that time as the duration for your clip, then adjust it longer or shorter as needed.
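    If you want a rough starting number before timing yourself, the sketch below estimates clip length from the script's word count. It is only a sketch under stated assumptions: the 150 words-per-minute speaking rate, the pause added per sentence, and the helper names are hypothetical placeholders, and the optional frame conversion assumes a 25 fps clip whose length must be of the form 8n + 1 frames. Check the frame rate and length settings your own workflow actually uses.

```python
import math
import re

# ASSUMPTION: ~150 words per minute is a typical conversational pace;
# replace it with the rate you measure when reading your own script aloud.
DEFAULT_WPM = 150.0


def estimate_clip_seconds(script: str, wpm: float = DEFAULT_WPM,
                          pause_per_sentence: float = 0.4) -> float:
    """Rough spoken duration of `script`, in seconds (hypothetical helper)."""
    words = len(script.split())
    # Treat each run of ., ! or ? as one sentence end ("..." counts once).
    sentence_ends = len(re.findall(r"[.!?]+", script))
    return words * 60.0 / wpm + sentence_ends * pause_per_sentence


def seconds_to_frames(seconds: float, fps: float = 25.0) -> int:
    """Round a duration up to the nearest frame count of the form 8n + 1.

    ASSUMPTION: 25 fps and the 8n + 1 length rule are placeholders; use
    whatever frame rate and length constraint your workflow expects.
    """
    raw = seconds * fps
    n = max(math.ceil((raw - 1) / 8), 0)
    return 8 * n + 1


if __name__ == "__main__":
    script = "Hello there. How are you today? I hope you're well."
    secs = estimate_clip_seconds(script)
    print(f"{secs:.1f} s -> {seconds_to_frames(secs)} frames")
```

    With the sample script this prints an estimate of 5.2 seconds (137 frames under the assumed settings). Treat it only as a starting point and still adjust the duration by ear.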

    Finally, don't be afraid to generate multiple audio clips if you have a long script with several breaks or pauses and LTX can't seem to get the pauses right. It's much easier to combine audio files into one track than to combine video, and there are lots of online tools to help with that. When you assemble your own final audio track, you can insert pauses as long or as short as you want.
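    A minimal sketch of that assembly step, using only Python's standard-library `wave` module. It assumes the clips are uncompressed PCM WAV files that all share the same channel count, sample width, and sample rate; if your clips were saved in another format (such as FLAC), convert them to WAV first. The function name and its arguments are hypothetical and not part of the workflow.

```python
import wave


def join_wavs_with_pauses(paths, pauses_sec, out_path):
    """Concatenate WAV clips, inserting silence between consecutive clips.

    pauses_sec[i] is the silence (in seconds) placed after clip i, so it
    needs len(paths) - 1 entries. No resampling or format conversion is
    done: every clip must have the same channels, sample width and rate.
    """
    if not paths:
        raise ValueError("need at least one clip")
    if len(pauses_sec) != len(paths) - 1:
        raise ValueError("need exactly one pause per gap between clips")

    params = None
    chunks = []
    for i, path in enumerate(paths):
        with wave.open(str(path), "rb") as clip:
            fmt = (clip.getnchannels(), clip.getsampwidth(), clip.getframerate())
            if params is None:
                params = fmt
            elif fmt != params:
                raise ValueError(f"{path}: format {fmt} differs from {params}")
            chunks.append(clip.readframes(clip.getnframes()))
        if i < len(pauses_sec):
            nch, width, rate = params
            n_silent = round(pauses_sec[i] * rate)
            # 8-bit WAV is unsigned (silence = 0x80); wider samples are signed (silence = 0).
            fill = b"\x80" if width == 1 else b"\x00"
            chunks.append(fill * (n_silent * nch * width))

    nch, width, rate = params
    with wave.open(str(out_path), "wb") as out:
        out.setnchannels(nch)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(b"".join(chunks))
```

    For example, `join_wavs_with_pauses(["line1.wav", "line2.wav", "line3.wav"], [0.8, 1.5], "final.wav")` would place a 0.8 s pause after the first clip and a 1.5 s pause after the second. The file names are placeholders.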

    Description

    Bugfix version: corrects a mistake in the workflow where the cloned voice model was not connected to the downstream nodes as it should have been.


    Comments (7)

    katdarnell98993
    Apr 6, 2026
    CivitAI

    I've followed all the instructions and the final output voices are not cloned from the reference audio. I'm not sure what else I need to do or tweak. Any suggestions?

    darkroast175696
    Author
    Apr 6, 2026

    Are you using the 1.1 version that I posted last night? If you are and you still don't get the cloned voice, you may be experiencing the bug in ComfyUI's new cloning node that is discussed in the GitHub issue thread I mentioned in the model description below. At least one other person in that thread said he was able to get the node to show up in his workflow, but the cloning didn't work. If that's the case for you, all I can suggest is waiting for a new ComfyUI update and seeing whether that fixes it.

    katdarnell98993
    Apr 8, 2026

    @darkroast175696 I have tried the fix on GitHub and the node is now showing. Still no cloning, though :(

    darkroast175696
    Author
    Apr 8, 2026

    @katdarnell98993 and you're using version 1.1 of the workflow, right? Because version 1.0 had a bug.

    katdarnell98993
    Apr 9, 2026

    @darkroast175696 Yes, 1.1. It generates and processes everything all the way to completion, but unfortunately the output has no similarity to the input voice.

    darkroast175696
    Author
    Apr 9, 2026

    @katdarnell98993 I double-checked my version 1.1 to make sure I uploaded the correct workflow file, and it's correct. There is another comment thread on this page, started by "InsidiousOne", where someone posted a link to a discussion about the bug; in that discussion, someone fixed their installation by replacing more files than just the one Python script. You could read the comment there and see if that process helps in your case.

    darkroast175696
    Author
    Apr 15, 2026

    The GitHub thread I mentioned earlier has a new message indicating the bug may now be fixed, in case anyone wants to give it a try. You'll need to update ComfyUI to the newest version to test it.
    https://github.com/Comfy-Org/ComfyUI/issues/13194#issuecomment-4249039663

    Workflows
    LTXV 2.3

    Details

    Downloads: 624
    Platform: CivitAI
    Platform Status: Available
    Created: 4/6/2026
    Updated: 5/27/2026
    Deleted: -

    Files

    textToSpeechWithVoice_v11.zip

    Mirrors

    HuggingFace (1 mirror)