Recently updated 26/03/2026 - added a full female body pre-set and an Instagram video type pre-set.
These focus on everything from clothing to movement, ethnicity, breast size, actions, etc. (the Instagram pre-set is closer to clothed than naked)
The idea is simple - caption a video so well that you can give that same caption to LTX2.3 to recreate the video. Surely that makes the best LoRA?
If you find this tool really useful, and only IF IT WORKS PERFECTLY FOR YOU, consider buying me a coffee <3
It goes a long way towards fuelling my efforts.
Install - extract the zip into a folder, run the install bat, then run the start bat. The model will download on the first run (load model).
- Small update: added image support, not just video.





By default, all caption tools only scan 1 frame of a video (the first, middle, or last frame).
This one can scan up to 10 equally spaced frames (this uses a lot of VRAM and is slower). The frames are picked AFTER the video is segmented into the desired length, not before, so the caption is accurate to that exact clip.
Caption from a video (2 separate frames of the 5-second video)
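
To illustrate the "segment first, then sample" order described above, here is a minimal Python sketch. It is not the tool's actual code: the function names `segment_bounds` and `sample_frame_indices` are hypothetical, and it assumes that "equally spaced" includes the first and last frame of each clip, which the description does not state.

```python
import math


def segment_bounds(duration_s, segment_s):
    """Split a video of duration_s seconds into consecutive clips of
    segment_s seconds each. The final clip may be shorter."""
    if duration_s <= 0 or segment_s <= 0:
        raise ValueError("duration_s and segment_s must be positive")
    count = math.ceil(duration_s / segment_s)
    return [
        (i * segment_s, min((i + 1) * segment_s, duration_s))
        for i in range(count)
    ]


def sample_frame_indices(first_frame, last_frame, n_frames):
    """Pick up to n_frames evenly spaced frame indices inside ONE clip.

    Assumption: both endpoints of the clip are included when more than
    one frame is requested; a single frame falls back to the middle."""
    if n_frames < 1 or last_frame < first_frame:
        raise ValueError("need n_frames >= 1 and last_frame >= first_frame")
    # Never ask for more frames than the clip actually contains.
    n = min(n_frames, last_frame - first_frame + 1)
    if n == 1:
        return [(first_frame + last_frame) // 2]
    step = (last_frame - first_frame) / (n - 1)
    return [first_frame + round(i * step) for i in range(n)]


if __name__ == "__main__":
    fps = 24  # assumed frame rate, for illustration only
    for start_s, end_s in segment_bounds(60, 5):
        first = int(start_s * fps)
        last = int(end_s * fps) - 1
        print((start_s, end_s), sample_frame_indices(first, last, 2))
```

Because the indices are computed per clip, every sampled frame belongs to the clip being captioned, which is the point of sampling after segmentation.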

Comments (8)
This is what's already in your easy Caption workflow, right?
Is this just for training your own LoRA, or does this force the text encoder to output exactly what you typed?
This works like a charm, very useful, thank you Daddy!
Location of the models?
And is it possible to create a custom location?
Thanks for the useful tool. One issue though: it does not unload the model from memory unless it's closed completely. The unload model button has no effect.
I think something is wrong.
As the input file, I have one MP4 file that is 1 minute long.
I set segment length to 5 seconds,
resize to 768,
and max clip to unlimited.
Instead of splitting the video into 5-second segments, the program cuts it into:
55s,
50s,
45s,
40s,
and so on…
so the last clip is only 5 seconds.
I have an RTX 4090, 96GB RAM, and Windows 11.
Works great!
Awesome job on this!
Here's a tip if you're running two GPUs and this starts up on the wrong one: just edit start.bat and add in set CUDA_VISIBLE_DEVICES=0 [replace 0 with the number of the board you want to use]