CivArchive
    LTX 2.3 - I2V T2V Video Reasoning lora VBVR - v1.0 T2V
    NSFW

    This was the first VBVR lora trained for LTX 2.3

    Commercial use is now allowed! Just please credit me :3

    V4 was trained on Sulphur 2 Base with 7,000 videos, and the rank was increased to 128 to give it more capacity for prompt following. With the distilled 1.1 LoRA at a strength of 0.5 in wan2gp, you can crank steps all the way to 50, which can improve the quality of gens. This is not required, just as it was not required in V3 or below, but it's a nice option if you want to maximize quality. I usually do 10-12 steps. The example videos were generated using 10eros, since it's the version of Sulphur optimized for i2v.

    Tips are appreciated if you enjoy the work; these take a lot of time to create! https://ko-fi.com/misticrain69

    Wall of text with a bunch of info below

    V3 has been optimized for motion. Expect noticeably livelier, more dynamic movement compared to previous versions. I generated the example videos with the int8/fp8 distilled model at 8 steps in wan2gp. For on-site generation, the dev model has been reported to be broken, so if you're using on-site gen, try the distilled one instead! I highly recommend LTX 2.3 Sulphur 2 base if you are using ComfyUI; if you're using wan2gp, use the Sulphur 2 base rank 768 LoRA: https://huggingface.co/SulphurAI/Sulphur-2-base/tree/main

    What changed in V3: Attention-only layers. The feedforward layers have been stripped, leaving only the attention weights. It seems like the prompt following and reasoning behavior most likely live in the attention layers, while the feedforward layers were potentially interfering with natural motion, likely by over-learning features like textures and style from the training data.

    Better motion, smaller file size.
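
    A minimal sketch of the attention-only idea described above, assuming the LoRA is a flat mapping from key names to tensors and that attention keys contain a substring such as `attn`. The marker and the example key names are hypothetical; the real LTX 2.3 LoRA key layout may differ, and this is not necessarily how V3 was produced.

```python
def keep_attention_only(state_dict, attn_markers=("attn",)):
    """Return a copy of a LoRA state dict with only attention weights kept.

    attn_markers are substrings assumed to identify attention layers
    (hypothetical; inspect your own file's keys before relying on them).
    """
    return {
        key: value
        for key, value in state_dict.items()
        if any(marker in key for marker in attn_markers)
    }


# Toy example: strings stand in for tensors, key names are made up.
example = {
    "blocks.0.attn1.to_q.lora_A.weight": "tensor",
    "blocks.0.attn1.to_q.lora_B.weight": "tensor",
    "blocks.0.ff.net.0.proj.lora_A.weight": "tensor",
    "blocks.0.ff.net.0.proj.lora_B.weight": "tensor",
}
filtered = keep_attention_only(example)
print(sorted(filtered))
```

    Dropping the feedforward entries is also why the file gets smaller: fewer tensors are saved.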

    A LoRA that improves prompt following, temporal consistency, and motion "precision" for LTX 2.3. Reduces the floaty, drifty motion that LTX tends to add to scenes. Things that should move, move with purpose. Things that shouldn't move, move less. Also works on non-NSFW, non-Furry, realistic, animated etc. It responds well to detailed prompts.

    In ComfyUI or wan2gp, lowering image strength to 0.85 can give you more motion in general.

    Feedback and A-B comparisons welcome. V2 and V1 were trained on 4800 videos.

    Prompting tips, in non-NSFW terms so they're less confusing; just adapt them to NSFW:

    Be specific and literal. Describe what happens, in what order, step by step.

    Instead of "a ball bouncing around" → "A red ball moves to the right, bounces off the wall, and returns to the center"

    Instead of "fluid pouring" → "Water flows from the left container through the connecting tube into the right container until both levels are equal"

    Describe the starting state, the action, and the end state

    The LoRA follows prompts more literally than base LTX — precise prompts will give much better results
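
    The tips above can be sketched as a small helper that assembles a prompt from a starting state, ordered actions, and an end state. `build_prompt` is a hypothetical name used only for illustration; it is not part of ComfyUI, wan2gp, or LTX.

```python
def build_prompt(start_state, actions, end_state):
    """Join a start state, ordered actions, and an end state into one prompt."""
    parts = [start_state, *actions, end_state]
    # Normalize each part so the result is one sentence per step, in order.
    sentences = [part.strip().rstrip(".") for part in parts]
    return ". ".join(sentences) + "."


prompt = build_prompt(
    "A red ball rests at the center of the room",
    ["The ball moves to the right", "It bounces off the wall"],
    "The ball returns to the center and stops",
)
print(prompt)
```

    Keeping each step as its own sentence makes the order explicit, which matters because the LoRA follows prompts more literally than base LTX.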

    How was it made?

    v0.1 and v0.2 were trained on 360 videos from the VBVR (Very Big Video Reasoning) dataset: synthetic task videos where every motion is precise and intentional. No concept bleed, no style change, just tighter control.

    Based on the paper "A Very Big Video Reasoning Suite", which demonstrated this approach on Wan 2.2. I noticed that LoRA helped prompt following and temporal consistency a ton with Wan, so I am training this version for LTX.

    What does it actually do?

    Prompt following is more faithful — the model does more of what you asked instead of improvising

    Motion is more deliberate and less erratic

    Reduces random drift and wobble in scenes

    Temporal consistency improved — actions follow logical sequences

    What it doesn't do:

    Doesn't change visual style

    Doesn't add or remove capabilities LTX doesn't already have

    Not a motion LoRA; it stacks with motion LoRAs

    Training details for v0.1 and v0.2 (if you give a shit)

    Rank 32

    360 VBVR synthetic videos at 512x512, 81 frames <------ A lot less than 1 million, but still a shitload to train on; this is very slow to train locally.

    LR 1e-4, adamw8bit
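
    For a sense of what rank means for adapter size (rank 32 here, rank 128 in V4): a LoRA pair on a linear layer adds `rank * (d_in + d_out)` parameters, so size grows linearly with rank. The sketch below uses a layer width of 4096 purely as an assumption for illustration; it is not the actual LTX 2.3 layer size.

```python
def lora_param_count(d_in, d_out, rank):
    """Parameters added by one LoRA pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)


D = 4096  # hypothetical layer width, chosen only for illustration
for rank in (32, 128):
    print(rank, lora_param_count(D, D, rank))
```

    Going from rank 32 to rank 128 multiplies the per-layer adapter parameters by 4, whatever the layer width is.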

    Training details for V1

    Training videos were increased to 4800

    Resolution is the same but frames were increased to 121

    Every other setting the same as v0.1 and v0.2

    More training data from the VBVR dataset was added to v1
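
    A quick check of the frame counts, assuming a clip of N frames spans (N - 1) / fps seconds and assuming the 16 fps source and 24 fps target mentioned in the comments further down. Under those assumptions, both 81 frames at 16 fps and 121 frames at 24 fps cover 5 seconds.

```python
def clip_seconds(num_frames, fps):
    """Duration covered by num_frames, counting the first frame as t = 0."""
    return (num_frames - 1) / fps


def frames_after_resample(num_frames, source_fps, target_fps):
    """Frame count after resampling to target_fps over the same duration."""
    return round(clip_seconds(num_frames, source_fps) * target_fps) + 1


print(clip_seconds(81, 16))               # 5.0
print(clip_seconds(121, 24))              # 5.0
print(frames_after_resample(81, 16, 24))  # 121
```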

    Below is the data composition of the new dataset I trained on, if you're curious

    Tier 1 — Physics and Motion (3,400 samples)

    Core generators at 300 each: G-11 (object reappearance) has a shape move off-screen in a direction and return along the same path — teaches trajectory and object persistence. G-25 (separate object spinning) is a shape that rotates in place then translates horizontally to a target position — multi-step motion sequencing. G-33 (visual jenga) is a stack of objects that get removed one by one from top to bottom — sequential extraction with implicit physics ordering. O-29 (ballcolor) is ball tracking tasks with color — motion following plus identity preservation. O-52 (traffic light) is discrete state transitions, lights switching on/off between green and gray — teaches the model that state changes are crisp, not gradual. O-75 (communicating vessels) is fluid equalizing between connected tubes based on pressure — continuous physics simulation over time. O-87 (fluid diffusion) is ink spreading in water — another continuous physical transformation but with expansion rather than equalization.

    New additions at 250 each: G-35 (hit target after bounce) is a ball with an initial direction that bounces off walls following reflection laws to hit a target — pure trajectory prediction with physics constraints. O-30 (bookshelf) is book rearrangement on shelves — the specific task VBVR highlighted where their model beat Sora 2.

    Multi-step transforms at 160 each: O-7 (shape color change) is a single transformation — shape changes from one color to another. O-8 (shape rotation) is a shape rotating by a specific angle. O-13 (outline then move) is two sequential steps: change a shape's outline style, then move it to a new position. O-14 (scale then outline) is also two steps: scale a shape up or down, then change its outline. These four together teach the model that instructions are ordered and each step completes before the next begins.

    Tier 2 — Spatial and Reasoning (1,420 samples)

    Proven generators at 100 each: G-13 (grid number sequence) is filling in number patterns on a grid. G-17 (grid avoid red block) is pathfinding on a grid while avoiding obstacles. G-31 (directed graph navigation) is finding the shortest path through a directed graph. G-41 (grid highest cost) is evaluating spatial values on a grid to find the optimal path. O-24 (domino chain) is a sequential cascade where dominoes fall until they hit a gap — teaches causal chains and stopping conditions. O-34 (dot to dot) is connecting numbered dots in sequence — ordered drawing. O-47 (sliding puzzle) is tile rearrangement under constraints, like a 15-puzzle. O-83 (planar warp) is warping a grid to align with a target quadrilateral — geometric transformation.

    New reasoning diversity at 130 each: O-1 (color mixing) is RGB additive mixing where two light sources combine and the result fills a target zone — rule-based continuous process. O-33 (counting objects) is exactly what it sounds like — count things correctly. G-3 (stable sort) is arranging objects by a rule while preserving relative order. G-37 (symmetry random) is completing a pattern by mirroring across an axis. O-21 (construction blueprint) is fitting a correct puzzle piece into a gap in a structure. G-44 (BFS) is breadth-first search traversal of a graph — systematic layer-by-layer exploration.

    The overall dataset is weighted roughly 70/30 toward physical motion and transformation tasks over abstract spatial reasoning. All of these are taken from the VBVR dataset; I am not the creator of the dataset. I'm pretty new to LoRA training, so if you have tips, let me know.
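
    A small tally of the per-generator counts as they are listed above. Taken at face value, the listed groups sum to 3,240 and 1,580 rather than the 3,400 and 1,420 in the tier headings, though the grand total of 4,820 matches either way. The text doesn't say which figures are the intended ones, so treat this only as a cross-check.

```python
# Per-group sample counts, copied from the generator lists above.
tier1 = {
    "core (7 generators at 300)": 7 * 300,
    "new additions (2 at 250)": 2 * 250,
    "multi-step transforms (4 at 160)": 4 * 160,
}
tier2 = {
    "proven generators (8 at 100)": 8 * 100,
    "new reasoning diversity (6 at 130)": 6 * 130,
}

tier1_total = sum(tier1.values())
tier2_total = sum(tier2.values())
total = tier1_total + tier2_total

print(tier1_total, tier2_total, total)
print(round(100 * tier1_total / total), round(100 * tier2_total / total))
```

    Either split is in the neighborhood of the roughly 70/30 weighting described above.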

    REMEMBER it's not X, it's Y.

    Disclaimer & Terms of Use

    This model is provided "AS IS", without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose, and noninfringement.

    IN NO EVENT SHALL MisticRain69 BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL OR ANY OUTPUT GENERATED WITH IT.

    18+ only. You, the user, are solely responsible for anything you generate with this model and for ensuring your use complies with all applicable laws and the CivitAI Terms of Service.

    You may not:

    produce anything illegal, non-consensual, defamatory, or used for harassment, deception, or fraud;

    create any content depicting, or appearing to depict, minors (real or fictional) in a sexual context — no exceptions, ever;

    use this model to depict any real, identifiable person — celebrity, public figure, or private individual — in any context, SFW or NSFW. No likeness use, period.

    If you post outputs to the gallery that violate the CivitAI TOS, you'll be blocked and reported. Don't do this shit. Don't be fucking sus.

    Description

    Trained on 4800 VBVR videos

    Comments (13)

    makiaeveli
    Mar 28, 2026

    It increases understanding for sure, but I also see it's trained at a different FPS than the default, making it choppy at 24fps.

    huj0ps1t6
    Mar 29, 2026

    Haven't tested it yet, but indeed OP is doing 512x512 at 81 frames, so he is likely using 5s 16FPS clips like we do for WAN, but LTX wants 24FPS.

    EDIT: as a last resort if obtaining a proper 121 frames dataset is not possible - do interpolation to extend the frames. I think the LTX-2 branch of Musubi does interpolation on its own if you set:

    source_fps = 16.0

    target_fps = 24.0

    But I could be wrong. Also, it seems target_fps defaults to 25.0, so maybe that's what should be used? Not sure again, but I've been using 24.0 and haven't had problems with smoothness so far.

    MisticRain69
    Author
    Mar 29, 2026

    @huj0ps1t6 The source videos are 16fps interpolated to 24fps to match LTX's default training fps. I haven't seen choppiness in my testing; could you share an example? It might be a prompting issue, since the LoRA makes LTX follow prompts more literally. V1 was trained on 4,800 videos at full 5-second length on RunPod.

    huj0ps1t6
    Mar 29, 2026

    @MisticRain69 Give me an hour and I'll debug this. Like I said, I haven't tested yet, and if you are already doing 24FPS then my suggestion is moot. For what it's worth, I don't see a problem with any of your previews either; they look pretty smooth to me.

    huj0ps1t6
    Mar 29, 2026

    Took me a while, but I was able to do some small testing. I don't feel like there is a noticeable hit in smoothness for I2V, but there is some for T2V for sure. You didn't mention it, though I guess it's heavily implied: was this LoRA trained exclusively on I2V?

    makiaeveli
    Mar 29, 2026

    Yeah, I was using T2V, and lower weights around 0.7 improved it. Of course, I can always interpolate the result; it's just a little faster not to :)

    MisticRain69
    Author
    Mar 30, 2026

    @makiaeveli I noticed it's mainly T2V where it will look like it's 16fps; it doesn't happen much with I2V for me. I think the LoRA could benefit from training for another epoch, but at this scale, if I trained it locally it would take over a week to do 1 epoch, and on RunPod it is very expensive, which really sucks.

    Silanda
    Mar 30, 2026

    Hmm. While it possibly improves realism in some respects, it took me a little while to notice a potential issue with the videos it's producing for me: the characters don't tend to blink much, if at all.

    Prompting for blinking helps but that can also exaggerate it.

    ForeverNecessary737716
    Mar 30, 2026

    what's the difference between versions?

    samsungsmartphonebb963
    Mar 30, 2026 · 1 reaction

    This LoRA appears to have been trained at 24fps. There is a drawback in that even when producing 32fps video, it looks like 24fps. However, this issue is resolved by lowering the LoRA's strength to 0.4 and entering the following on the very first line of the positive prompt:

    smooth motion, 32fps quality, high framerate,

    horseflybite500
    Apr 1, 2026

    This helps with the framerate problem but it hampers the LORA's effectiveness for obvious reasons. It seems better to just not use it at this strength.

    horseflybite500
    Apr 1, 2026 · 2 reactions

    I've experimented with this a bit. In some cases it does give the prompt more authority. In one case LTX 2.3 just refused to have someone hold their arms over their head regardless of what prompt or seed I used, and putting this LORA at 1.0 completely fixed it and it also understood what content I was trying to achieve better. Unfortunately, it also introduces stuttering, even at 1.0. One of LTX's strengths is its smooth animation and this compromises it pretty heavily.

    I did find a way to mitigate it at 1.0 strength by only introducing it into the first stage of the generation, but not the second (and third in my case because I use 2 stages for the upscale). Since the upscaling portion is simply refining the existing content, it seems to have mostly done its job by then anyways.

    elevendr
    Apr 2, 2026 · 4 reactions

    Incredible, I've been meaning to inject reasoning into LTX 2.3 to get as close to SeeDance 2.0 as possible. Thanks for this LoRA.