Aozora-XL Vpred
Introduction
Aozora-XL is an experimental fine-tune of NoobAI-XL (NAI-XL) V-Pred 1.0. This v0.1 alpha release serves as a proof-of-concept for a custom training method designed to improve the stability and predictability of v-prediction models.
The goal is to create a model that is easier to control and less prone to artifacts. As an alpha, this version is a first step, with future releases planned to be trained for longer on more extensive datasets.
Training on a Shoestring: The Philosophy Behind Aozora-XL
This project began with a simple, yet daunting, question: can true, full-model (or nearly full-model) SDXL fine-tuning be achieved on a consumer-grade 12GB GPU? The answer, after a year of personal research, development, and countless hours of trial and error, is a resounding "kinda": at least 90% of the model fits without the text encoders, and roughly 80% fits with them. This model is the first proof-of-concept from that journey.
This achievement is built on a philosophy of prioritizing accessibility and depth over raw speed. Here's what that means in practice:
Deeper Integration Than LoRA or Dreambooth
By fine-tuning a significant portion of the model's core architecture, the result is not just a style applied on top, but a deeper, more inherent understanding of the new data. This allows for greater coherence and flexibility than many alternative methods. The cost of this depth is, of course, training time. Speed was a deliberate sacrifice made to push the boundaries of what's possible on accessible hardware.
A Tool for the Determined
The specialized script that made this possible will be released on my GitHub for the community. Be advised: this is not a one-click tool. It is a highly tuned piece of code born from the specific constraints of my own NVIDIA RTX 3060. Adapting it to your own system will require a solid understanding of Python, PyTorch, and the patience to debug and experiment. It is a starting point for those who share the same passion for pushing limits.
A Year-Long Labor of Love
This project represents more than just code; it's the result of a year-long personal obsession. It's countless late nights, frustrating setbacks, and the eventual triumph of making something work against the odds. The hope is that this work inspires other creators and proves that with enough dedication, the barrier to entry for advanced AI development can be lowered for everyone.
What Makes It Different From Other V-Pred Models?
This model was trained with a bespoke method that uses several advanced techniques to ensure stable results and remarkable efficiency:
Custom Adaptive Min-SNR Gamma Weighting: Stabilizes the training process specifically for v-prediction, resulting in a more robust and reliable output.
Selective Layer Training: Focuses training on the most impactful layers (~80%) of the UNet and Text Encoders, preserving base model integrity while allowing for efficient learning.
Custom Learning Rate Curves: Tailored schedulers that slowly raise the learning rate over time along a fixed negative step-wave curve, ensuring optimal convergence without destabilizing the model.
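The exact adaptive weighting used in training has not been published, so as a point of reference, here is a minimal sketch of the standard Min-SNR gamma weighting adapted to the v-prediction objective; the gamma default of 5.0 and both function names are assumptions, not the model's actual code:

```python
def snr_from_alpha_bar(alpha_bar: float) -> float:
    """SNR(t) = alpha_bar / (1 - alpha_bar) for a DDPM-style noise schedule."""
    return alpha_bar / (1.0 - alpha_bar)

def min_snr_vpred_weight(snr: float, gamma: float = 5.0) -> float:
    """Baseline (non-adaptive) Min-SNR loss weight for v-prediction.

    For the v-prediction target the per-timestep weight is
    min(SNR, gamma) / (SNR + 1), which caps the loss contribution of
    low-noise (high-SNR) timesteps so they cannot dominate training.
    gamma=5.0 is the common literature default, assumed here.
    """
    return min(snr, gamma) / (snr + 1.0)
```

At high SNR the weight approaches gamma / (SNR + 1), damping the easy, nearly-clean timesteps; at low SNR it approaches SNR / (SNR + 1), which is why this family of weightings tends to stabilize v-prediction training.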
Recommended Usage
Positive Prompt: very awa, masterpiece, best quality,
Negative Prompt: Trained to not require a negative prompt, but you can use (worst quality, low quality) if needed.
Sampler: DPM++ 3M SDE GPU or Euler
Steps: 25-35
CFG Scale: 3-5 (The model is designed to work well at low CFG).
Resolution: Recommended 1024x1024, or similar standard XL sizes (e.g., 832x1216). Was trained up to 1152x1152.
HighresFix is recommended: upscaler RealESRGAN with a denoise of 0.35
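For diffusers users, the settings above can be sketched roughly as follows. The checkpoint file name and the example prompt tags are hypothetical, and the scheduler swap assumes a standard SDXL single-file checkpoint whose scheduler must be told the model is v-prediction:

```python
# Mirrors the recommended settings from this card.
SETTINGS = {
    "prompt": "very awa, masterpiece, best quality, 1girl, simple background",
    "num_inference_steps": 30,   # card recommends 25-35
    "guidance_scale": 4.0,       # card recommends CFG 3-5
    "width": 1024,
    "height": 1024,
}

def generate(checkpoint_path: str = "aozora-xl-v0.1.safetensors"):
    # Imports kept local so SETTINGS is usable without diffusers installed.
    import torch
    from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

    pipe = StableDiffusionXLPipeline.from_single_file(
        checkpoint_path, torch_dtype=torch.float16
    ).to("cuda")
    # The base model is v-prediction, so configure the sampler accordingly.
    pipe.scheduler = EulerDiscreteScheduler.from_config(
        pipe.scheduler.config, prediction_type="v_prediction"
    )
    return pipe(**SETTINGS).images[0]
```

Euler is used here because it maps cleanly onto a diffusers scheduler; DPM++ 3M SDE GPU is a WebUI sampler label without an exact one-line diffusers equivalent.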
Current pros and cons (v0.1 Alpha)
Con: Some of the base model's color accuracy has been lost due to the dataset
Con: The model was trained on images with no backgrounds, so it assumes that is the norm; please tag for backgrounds if wanted
Con: Tends to whitewash characters
Con: Doesn't understand occlusion most of the time
Con: Doesn't understand depth
Con: Very tag-specific
Pro: Stable for a v-pred model
Pro: Hands and feet are mostly fixed besides some outliers
Pro: Scene composition is fairly unique per seed
Pro: Due to the learning rate and curve, the model isn't too overfit and performs similarly to base
Pro: Trained up to the latest ZZZ characters, which make up 50% of the dataset
Pro: Handles multiple characters in the same shot
Pro: Trained on ?????? with tags like (????? lips, ????????, urethra, vaginal opening)
Training Details (v0.1 Alpha)
Base Model: NoobAI-XL (NAI-XL) V-Pred 1.0
Dataset: A small, curated set of ~18,500 images focusing on modern anime styles and nudity.
Batch Size: 1
Gradient Accumulation Steps: ??
Total Epochs: 10
Hardware: 1x NVIDIA RTX 3060 (12GB)
VRAM Usage During Training: ~11.8 GB
Training Time: I lost track, it was a long time (weeks).
Optimizer: Adafactor
Learning Rate (Unet): 3e-6
Learning Rate (Text Encoder): 1.5e-6
Unet params trained: 2204.60M
Text encoder params trained: 101.19M
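Since the training script is not yet released, the optimizer setup can only be sketched from the table above. This is a plausible configuration using the `Adafactor` implementation from Hugging Face `transformers` with the stated fixed learning rates; the function name and the choice of flags are assumptions:

```python
# Learning rates taken directly from the training details above.
UNET_LR = 3e-6
TEXT_ENCODER_LR = 1.5e-6

def build_optimizer(unet, text_encoder):
    # Import kept local so the constants above stand alone.
    from transformers.optimization import Adafactor

    # relative_step and scale_parameter are disabled so Adafactor honors
    # the fixed per-group learning rates (and any external LR schedule)
    # instead of computing its own time-dependent step size.
    return Adafactor(
        [
            {"params": unet.parameters(), "lr": UNET_LR},
            {"params": text_encoder.parameters(), "lr": TEXT_ENCODER_LR},
        ],
        lr=UNET_LR,
        scale_parameter=False,
        relative_step=False,
        warmup_init=False,
    )
```

Adafactor's factored second-moment state is a common choice for squeezing near-full SDXL fine-tuning into 12GB of VRAM, which is consistent with the hardware listed above.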
Future Plans & Disclaimer
This is an alpha release. Quality and character knowledge may vary. Future versions will involve longer training times and expanded datasets, which may lead to significant improvements and changes in style. Your feedback on this proof-of-concept is greatly appreciated!
Please note that v0.1 may not differ much from its foundations, as it was only trained for a short time; seed-to-seed differences may be around 20% of the full range seen in other tunes. The v1 of the model will be trained for longer and will show significant adaptations.
License
This model inherits the license from its base model, NoobAI-XL. Users are responsible for reviewing and complying with the terms of the base model's license.