For each model, see the version details for more information.
All the models here are experimental. They often won't work perfectly out of the box: the prompt may need improvement, the input may need alteration, or the parameters may need adjusting. Don't expect them to work right away for every task. They will stop being experimental only when we find a better way to do this.
Description
This is not a direct continuation of v1.0. It's a fresh training run with a different recipe, done in two stages:
- Stage 1 — pretraining on image edit pairs. Training a video model on stills is admittedly not ideal, but it was a way to expand the editing vocabulary well beyond what a small video-only dataset could teach (broader Add / Remove / Replace / Change coverage than v1.0 had).
- Stage 2 — video fine-tune with first-frame conditioning enabled. This brought the temporal prior back.
In theory, v0.1 can do everything v1.0 did. In practice, temporal consistency may be weaker on some edits because most of Stage 1 happened on still images. Whether it actually beats v1.0 on a given task is something we genuinely don't know yet.
This version is still experimental.