CivArchive
    ACE-Step 1.5 Psytrance LoRA Ver 2.0 - v2.0

    ## Technical Guide: Training ACE-Step 1.5 LoRA for Psytrance on an RTX 3090

    ## 1. Dataset Preprocessing & Surgical Slicing

    Standard audio slicing methods destroy the rhythm and phase alignment of dense audio

    material like Psytrance. Follow these strict preprocessing rules:

    * Zero-Crossing Slicing: Cuts must occur exactly at zero-crossing points (where the waveform changes sign and the instantaneous sample amplitude is 0) to avoid digital clicks and pops at chunk boundaries.

    * Zero Fade Rule: Never apply fade-ins or fade-outs to a chunk. The diffusion model will interpret a fade as a musical instruction and learn to fade the track out every 30 seconds.

    * 1-Second Overlap: Let consecutive chunks overlap by 1 second of shared audio, with no gain ramp applied, to maintain continuity across sample boundaries.

    * Fixed Chunk Length: Slice the source material into exact 30.0-second segments. This

    captures complete musical phrases while fitting comfortably into the 24 GB VRAM limit.

    * Format Constraints: Export all slices at 44.1 kHz, 16-bit or 24-bit PCM WAV. Avoid MP3

    compression to prevent codec artifacts from muddying the high frequencies.
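The zero-crossing rule above can be sketched in plain Python. This is a minimal illustration, not the author's tooling: `nearest_zero_crossing`, `slice_points`, and the `search` window are hypothetical names, the input is assumed to be a mono sequence of float samples, and each cut is snapped to the sign change closest to the nominal 30.0-second boundary, so chunk lengths can deviate from 30.0 s by a few samples.

```python
def nearest_zero_crossing(samples, target, search=2048):
    """Return the index closest to `target` where the signal changes sign.

    A cut at index i places the chunk boundary between samples i-1 and i.
    Falls back to `target` if no sign change lies within `search` samples.
    """
    lo = max(1, target - search)
    hi = min(len(samples) - 1, target + search)
    best = None
    for i in range(lo, hi + 1):
        a, b = samples[i - 1], samples[i]
        # A crossing exists if the previous sample is exactly zero
        # or the two neighbouring samples have different signs.
        if a == 0.0 or (a < 0.0) != (b < 0.0):
            if best is None or abs(i - target) < abs(best - target):
                best = i
    return target if best is None else best


def slice_points(samples, sample_rate=44100, chunk_sec=30.0, search=2048):
    """Return (start, end) index pairs of roughly `chunk_sec` seconds each.

    A trailing remainder shorter than one chunk is dropped.
    """
    chunk = int(round(chunk_sec * sample_rate))
    bounds = [0]
    pos = chunk
    while pos < len(samples):
        bounds.append(nearest_zero_crossing(samples, pos, search))
        pos += chunk
    return list(zip(bounds[:-1], bounds[1:]))
```

Each returned pair can be used to slice the sample sequence directly (`samples[start:end]`); no gain ramp is applied, which matches the no-fade rule.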

    ## 2. Text-Caption Tagging Strategy

    Audio diffusion models require metadata to isolate tempo and key. Each audio slice requires

    a matching .txt file with identical naming.

    * BPM & Key Isolation: Explicitly tag the precise BPM and musical key (e.g., 142 bpm, G#

    minor). This prevents the model from blending different tempos and scales into a dissonant

    mix.

    * Sub-Genre Descriptor: Start every caption with a unified anchor tag (e.g., psytrance track).

    * Structural Elements: Document specific sonic elements present in that chunk (e.g., rolling

    triplet bassline, punchy energetic kickdrum, sharp acid synth leads, rhythmic percussion,

    crisp hi-hats).

    * Quality Tokens: Append production quality tags at the end of the text file (e.g., studio

    master quality, clean professional mix).
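The tagging order described above (anchor tag first, then BPM and key, then sonic elements, then quality tokens) could be automated along these lines. This is a sketch under assumptions: `build_caption` and `write_caption` are hypothetical helpers, and the comma-separated tag format is inferred from the example prompt in this guide rather than from a documented ACE-Step requirement.

```python
from pathlib import Path


def build_caption(bpm, key, elements, anchor="psytrance track",
                  quality=("studio master quality", "clean professional mix")):
    """Join tags in the order: anchor, BPM, key, sonic elements, quality."""
    tags = [anchor, f"{bpm} bpm", key, *elements, *quality]
    return ", ".join(tags)


def write_caption(audio_path, caption):
    """Write the caption next to the audio file: same stem, .txt suffix."""
    txt_path = Path(audio_path).with_suffix(".txt")
    txt_path.write_text(caption + "\n", encoding="utf-8")
    return txt_path
```

For a slice named `slice_001.wav`, `write_caption` produces `slice_001.txt` in the same folder, which satisfies the identical-naming rule.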

    ## 3. Training Hyperparameters & VRAM Optimization (RTX 3090)

    To maximize the 24 GB VRAM of an RTX 3090 without triggering CUDA out of memory

    errors, use these exact network dimensions and pipeline settings:

    ## Network Architecture (LoRA)

    * LoRA Rank ($r$): 64 (Provides sufficient capacity to map distinct keys and tempos into

    separate internal slots).

    * LoRA Alpha: 32 (With rank 64, the conventional LoRA scaling factor alpha / r is 0.5, which keeps the magnitude of the weight update moderate).

    * LoRA Dropout: 0.05 (Prevents overfitting while retaining rapid pattern recognition).

    * Target Modules: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]
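To make the rank and alpha choice concrete, the short sketch below computes the scaling factor and the number of trainable parameters LoRA adds to a single linear layer. It assumes the trainer uses the conventional `alpha / rank` scaling; the function names and the 1024-wide layer are hypothetical and not taken from the ACE-Step code base.

```python
def lora_scaling(rank, alpha):
    """Conventional LoRA scale applied to the low-rank update: alpha / rank."""
    return alpha / rank


def lora_param_count(d_in, d_out, rank):
    """Trainable parameters added to one linear layer.

    Matrix A has shape (rank, d_in) and matrix B has shape (d_out, rank).
    """
    return rank * (d_in + d_out)


print(lora_scaling(rank=64, alpha=32))          # 0.5
# Hypothetical 1024 -> 1024 projection, for illustration only.
print(lora_param_count(1024, 1024, rank=64))    # 131072
```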

    ## Optimization & Precision

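    * Mixed Precision: bf16 (Supported natively on Ampere GPUs such as the RTX 3090; its wider exponent range avoids the overflow problems of fp16).

    * Optimizer: bitsandbytes 8-bit AdamW (Stores the optimizer states in 8 bits instead of 32, shrinking the memory they occupy to roughly a quarter).

    * Gradient Checkpointing: True (Recomputes activations during the backward pass, trading extra compute for a large reduction in activation memory).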

    * Hardware Allocation: Set num_workers=4, pin_memory=True, and

    persistent_workers=True.

    ## Training Schedule

    * Batch Configuration: Set train_batch_size: 2 and gradient_accumulation_steps: 2 (Creates

    an effective total batch size of 4, ensuring smooth gradient updates for complex audio

    signals).

    * Learning Rate: 0.00007 ($7\cdot10^{-5}$) with a cosine scheduler and 100 warmup steps.

    A lower learning rate preserves sharp transient structures like tight kick drums.

    * Seed: Set to -1 (Random Seed) across later epochs to shuffle data blocks and improve

    generalization.
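The batch and learning-rate settings above can be checked numerically with a small sketch. It assumes a linear warmup from zero followed by cosine decay to zero, which is a common form of a "cosine scheduler with warmup" but may differ from what the training script implements; `total_steps` is a hypothetical value supplied by the caller.

```python
import math


def effective_batch_size(train_batch_size, gradient_accumulation_steps):
    """Samples contributing to each optimizer update."""
    return train_batch_size * gradient_accumulation_steps


def lr_at_step(step, total_steps, base_lr=7e-5, warmup_steps=100):
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


print(effective_batch_size(2, 2))  # 4
```

With these assumptions the learning rate peaks at 7e-5 on step 100 and then decreases smoothly for the rest of the run.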

    ## 4. Training Phases & Loss Graph Analysis

    The training graph shows a smooth convergence curve for a dense audio dataset under a randomized training seed:

    [Figure: loss (y-axis, 0.35 to 0.55) versus training step (x-axis, 3100 to 3700); the curve descends and flattens into a region annotated "Plateau / Saturated Fine-Tuning".]

    * Phase 1 (Epoch 0 - 30): Macro-Structure Acquisition: The initial loss drops rapidly from

    $\sim0.60$ down to $\sim0.45$. The model identifies coarse structural features, including

    noise floors, fundamental frequencies, and the main percussive grid.

    * Phase 2 (Epoch 30 - 35): Mid-Frequency Stabilization: The curve forms a gentle slope

    between step 3100 and 3400. The random data seed (-1) introduces acoustic variety, forcing

    the optimizer to consolidate structural patterns across different BPM/Key signatures

    simultaneously.

    * Phase 3 (Step 3400 - 3800): Micro-Optimization & Transients: The smoothed loss forms a plateau between $0.36$ and $0.38$. The variance of the raw loss values narrows significantly, with occasional micro-troughs near $0.31$. This indicates that the model has saturated its learning capacity for the dataset and is only refining micro-details such as phase alignment and transient sharpness. Pushing the loss below $0.30$ is strongly discouraged, as it leads to audible degradation from overfitting.
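The phases above refer to a smoothed loss curve. One common way to obtain such a curve is an exponential moving average, sketched below together with a simple classifier built from the thresholds quoted in this section. Both functions are hypothetical helpers: the guide does not state which smoothing method or weight produced its graph, and the 0.38 and 0.30 cutoffs are read directly from the text.

```python
def ema_smooth(values, weight=0.9):
    """Exponential moving average, seeded with the first value."""
    smoothed = []
    last = None
    for v in values:
        last = v if last is None else weight * last + (1.0 - weight) * v
        smoothed.append(last)
    return smoothed


def classify_loss(smoothed_loss, plateau_high=0.38, floor=0.30):
    """Map a smoothed loss value to the regime described in the guide."""
    if smoothed_loss < floor:
        return "overfitting risk"
    if smoothed_loss <= plateau_high:
        return "plateau"
    return "converging"
```

Applying `classify_loss` to the last element of `ema_smooth(raw_losses)` after each epoch gives a rough signal for when to stop and pick a checkpoint.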

    ## 5. Inference & Audio Generation Configuration

    Once training concludes at Epoch 40, halt the script and configure the Inference tab using

    these precise generation parameters:

    * Inference Backend: Set to PyTorch (Do not use vLLM or Triton on native Windows

    environments due to library compatibility issues).

    * Base Model Path: Point to checkpoints/acestep-v15-xl-sft.

    * LoRA Model Path: Load the target checkpoint (e.g., epoch_35 or epoch_40).

    * LoRA Scale: 0.85 to 1.0 (Start at 0.85 to maintain flexibility; increase to 1.0 if the synthetic

    output lacks the driving weight of the original data).

    * Inference Steps: 50 (Provides clean diffusion generation without blurring the fast

    transients).

    * CFG Scale: 4.5 to 5.5 (Higher values force strict adherence to the prompt tags, lower

    values add acoustic variation).

    * Audio Length: Exact 30.0 seconds (Must match the training slice length; generating beyond

    this window causes structural collapse).

    * Target Generation Prompt: Feed the explicit tokens used during tagging to extract the

    clean, isolated style:

    A high-energy psychedelic trance track, 142 BPM, fast driving rolling bassline, punchy

    energetic kickdrum, sharp acid synth leads, rhythmic percussion, crisp hi-hats, studio master

    quality, clean professional mix
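A small check like the following can flag generation settings that drift outside the ranges recommended above. It is only a sketch: `RECOMMENDED` and `out_of_range` are hypothetical, and the key names are borrowed from the generation metadata recorded further down this page, on the assumption that `guidance_scale` corresponds to the CFG Scale and `duration` to the Audio Length.

```python
# Recommended (low, high) ranges taken from the bullet list in this section.
RECOMMENDED = {
    "inference_steps": (50, 50),
    "guidance_scale": (4.5, 5.5),
    "lora_scale": (0.85, 1.0),
    "duration": (30.0, 30.0),
}


def out_of_range(settings, recommended=RECOMMENDED):
    """Return the names of settings that are missing or outside their range."""
    bad = []
    for name, (low, high) in recommended.items():
        value = settings.get(name)
        if value is None or not (low <= value <= high):
            bad.append(name)
    return bad


print(out_of_range({"inference_steps": 50, "guidance_scale": 5.0,
                    "lora_scale": 0.85, "duration": 30}))  # []
```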

    Description

    pretrained_model_path: "checkpoints/acestep-v15-xl-sft"
    output_dir: "output/psytrance_lora_xl_sft"

    # --- DATASET ---
    dataset:
      path: "PSY_MASTER_DATASET"
      sample_rate: 44100
      slice_duration: 30.0

    # --- HYPERPARAMETERS ---
    train_batch_size: 2
    gradient_accumulation_steps: 1
    epochs: 30
    learning_rate: 0.00007
    lr_scheduler: "cosine"
    lr_warmup_steps: 100

    # --- LORA PARAMS ---
    lora_rank: 64
    lora_alpha: 32
    target_modules: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]

    # --- OPTIMIZATIONS ---
    mixed_precision: "bf16"
    gradient_checkpointing: true
    optimizer: "adamw_8bit"

    # --- SAVING ---
    save_every_epochs: 5
    checkpointing_steps: 4000

    This is a high-quality Psytrance Audio LoRA trained on ACE-Step 1.5 SFT-XL. It is optimized for generating punchy rolling triplet basslines, crisp percussion, and sharp acid synth leads.

    Training Details:

    - Base Model: ACE-Step 1.5 XL SFT

    - Dataset: 44.1 kHz Studio Masters (30-second seamless zero-crossing slices)

    - Hyperparameters: Rank 64, Alpha 32, Learning Rate 7e-5

    Recommended Inference Settings:

    - Inference Steps: 50

    - CFG Scale: 4.5 - 5.5

    - LoRA Scale: 0.85 - 1.0

    - Audio Length: Exactly 30.0 seconds per chunk

    ___________________________________________________________

    {

    "caption": "psytrance style, , atmospheric pads,rolling bassline, acid squelch, metallic leads, high quality, ",

    "global_caption": "",

    "lyrics": "[Intro - Atmospheric Pads & Ambient Synth]\n\n",

    "instrumental": false,

    "vocal_language": "unknown",

    "bpm": 135,

    "keyscale": "",

    "timesignature": "4",

    "duration": 30,

    "enable_normalization": true,

    "normalization_db": -1,

    "fade_in_duration": 0.0,

    "fade_out_duration": 0.0,

    "latent_shift": 0,

    "latent_rescale": 1,

    "inference_steps": 120,

    "seed": 1297183202,

    "guidance_scale": 8.1,

    "use_adg": false,

    "cfg_interval_start": 0,

    "cfg_interval_end": 1,

    "shift": 3,

    "infer_method": "ode",

    "sampler_mode": "heun",

    "velocity_norm_threshold": 0,

    "velocity_ema_factor": 0,

    "dcw_enabled": true,

    "dcw_mode": "double",

    "dcw_scaler": 0.02,

    "dcw_high_scaler": 0.06,

    "dcw_wavelet": "haar",

    "timesteps": null,

    "repainting_start": 0,

    "repainting_end": -1,

    "chunk_mask_mode": "auto",

    "repaint_latent_crossfade_frames": 10,

    "repaint_wav_crossfade_sec": 0.0,

    "repaint_mode": "balanced",

    "repaint_strength": 0.5,

    "retake_seed": null,

    "retake_variance": 0.0,

    "flow_edit_morph": false,

    "flow_edit_source_caption": "",

    "flow_edit_source_lyrics": "",

    "flow_edit_n_min": 0.0,

    "flow_edit_n_max": 1.0,

    "flow_edit_n_avg": 1,

    "audio_cover_strength": 0.93,

    "cover_noise_strength": 0,

    "thinking": true,

    "lm_temperature": 0.85,

    "lm_cfg_scale": 2,

    "lm_top_k": 0,

    "lm_top_p": 0.9,

    "lm_negative_prompt": "bitcrushed, aliasing, quantizing noise, digital clipping, glitchy, stutter, stuttering, dropouts, artifacting, mp3 artifacts, 64kbps, encoded,jazz, funk, pop, acoustic, lo-fi, orchestral,house, techno, dubstep, pop, vocal hooks, acoustic instruments, electric guitar, slow tempo, jazz chords, ambient drone, lo-fi hiss, distorted drums, orchestral elements, trap hi-hats,major scale, happy chords, uplifting melody, bright pop progression, standard minor chord changes, blues scale",

    "use_cot_metas": false,

    "use_cot_caption": false,

    "use_cot_lyrics": false,

    "use_cot_language": false,

    "use_constrained_decoding": true,

    "cot_bpm": null,

    "cot_keyscale": "",

    "cot_timesignature": "",

    "cot_duration": null,

    "cot_vocal_language": "unknown",

    "cot_caption": "",

    "cot_lyrics": "",

    "lora_loaded": true,

    "use_lora": true,

    "lora_scale": 1.0,

    "lora_weights_hash": "0c942944d792d52643a33c2e54b236c2ad6a4e26af46503870abbedb26178490",

    "audio_format": "mp3",

    "mp3_bitrate": "320k",

    "mp3_sample_rate": 44100,

    "repaint_source_latents_file": "e5c0775d-18f4-0022-de7f-a5635c521ed2.repaint_latents.npy",

    "session_artifact_file": "e5c0775d-18f4-0022-de7f-a5635c521ed2.session.npz",

    "session_artifact_kind": "generation_intermediates_v1"

    }


    LORA
    ACE Audio

    Details

    - Downloads: 163
    - Platform: CivitAI
    - Platform Status: Available
    - Created: 5/15/2026
    - Updated: 6/30/2026
    - Deleted: -

    Files

    - PsyTrance_Ver2.0.safetensors (Size: 320.06 MB)
    - adapter_config.json

    Mirrors: CivitAI (1 mirror)