Turns text prompts and lyrics into full songs by pairing a language planner with a diffusion transformer.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy
Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense: you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Enter your creative brief and lyrics in TextEncodeAceStepAudio1.5 (#94), and load any input files in the marked loader nodes if your graph has them.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
Expectations — First run may pull large weights; cloud runs may require a free RunComfy account.
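For batch scripting against a local install, the steps above can also be driven over ComfyUI's HTTP interface. The following is a minimal sketch, assuming a ComfyUI server is already running at the default address http://127.0.0.1:8188 and that the workflow was exported in API format. The function names and the file name in the usage note are hypothetical.

```python
import json
import urllib.request
import uuid


def build_prompt_request(workflow: dict, client_id: str) -> bytes:
    """Wrap an API-format workflow in the JSON body sent to POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_workflow(workflow: dict, server: str = "http://127.0.0.1:8188") -> dict:
    """Send the workflow to a running ComfyUI server and return its JSON reply.

    Assumption: the server address is the local default; change it if yours differs.
    """
    body = build_prompt_request(workflow, client_id=str(uuid.uuid4()))
    request = urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, load the exported file with `json.load` (for example a hypothetical `ace_step_workflow_api.json`) and pass the resulting dict to `queue_workflow`. Keeping the payload builder separate makes it easy to inspect the request before anything is sent.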
Overview
This workflow uses Ace Step 1.5 to turn text prompts or short inputs into full musical compositions. Its hybrid architecture combines a language planner with a diffusion transformer, which helps with song structure, melody clarity, and rhythmic precision. You can produce multi-instrumental tracks or vocal layers with minimal setup, and LoRA fine-tuning is supported for personalized tones and performance styles. It suits producers and AI musicians who want creative control and fast iteration.
Key nodes in the ComfyUI Ace Step 1.5 workflow
TextEncodeAceStepAudio1.5 (#94)
Transforms your creative brief and lyrics into conditioning that Ace Step 1.5 understands. For control, adjust language, musical key, and tempo to steer phrasing and harmony, and set section structure when you want more or fewer form changes. Use descriptive production notes like genre, mood, and mix cues to anchor style. Keep lyrics concise and metrical for cleaner vocal phrasing.
KSampler (#3)
Drives the diffusion process that turns planning into audio latents. Increase steps for more detail and stability, or reduce them for very fast previews. Try alternate sampler methods if you want different transient behavior, and keep the seed fixed so the comparisons are fair. Raise guidance strength for tighter adherence to your Ace Step 1.5 prompt, or lower it for freer improvisation.
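The fixed-seed comparison described above can be scripted. The following is a sketch only, assuming the workflow has been exported in ComfyUI's API format (a dict keyed by node id, each node carrying an `inputs` dict) and that the KSampler node exposes `seed` and `sampler_name` inputs. The helper name `sampler_variants` is hypothetical.

```python
import copy


def sampler_variants(workflow: dict, node_id: str, sampler_names: list, seed: int) -> list:
    """Return one workflow copy per sampler name, all sharing the same seed.

    The original workflow dict is left untouched.
    """
    variants = []
    for name in sampler_names:
        variant = copy.deepcopy(workflow)
        inputs = variant[node_id]["inputs"]
        inputs["seed"] = seed           # identical seed across every variant
        inputs["sampler_name"] = name   # the only setting that changes
        variants.append(variant)
    return variants
```

With the graph described here, `node_id` would be `"3"` for the KSampler. Each returned dict can be queued as its own run, so any audible difference comes from the sampler method and not from the random seed.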
EmptyAceStep1.5LatentAudio (#98)
Allocates the target song length as a latent tensor so every downstream stage works on the same duration. Set this to the number of seconds you want in the final render. Longer latents require more compute and may benefit from slightly higher quality settings in the sampler, such as a higher step count.
ModelSamplingAuraFlow (#78)
Attaches a sampling strategy compatible with Ace Step 1.5 that balances speed and musical coherence. Use it when you want responsive iterations that still keep global structure intact. If you experiment with different sampler families, use the same seed to evaluate how timing and transients change.
SaveAudioMP3 (#104)
Exports the decoded waveform to a compressed file. Select bitrate to trade off size and fidelity for your release or sharing destination. For archival or mixing, you can swap this for a WAV save node in the same position.
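To make the size and fidelity trade-off concrete, here is a rough estimate of file size at a constant bitrate. It is a back-of-the-envelope sketch, not part of the workflow: it ignores headers and metadata tags, and it does not apply to variable-bitrate encodes. The function name is hypothetical.

```python
def estimate_cbr_mp3_bytes(duration_seconds: float, bitrate_kbps: int) -> int:
    """Approximate size in bytes of a constant-bitrate MP3.

    bitrate_kbps is in kilobits per second (1 kbps = 1000 bits per second).
    """
    total_bits = duration_seconds * bitrate_kbps * 1000
    return int(total_bits // 8)  # 8 bits per byte
```

For a three-minute track, 128 kbps works out to about 2.88 MB and 320 kbps to about 7.2 MB, so the higher bitrate costs roughly 2.5 times the storage.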
ConditioningZeroOut (#47)
Provides a neutral negative conditioning, which is a safe default for lyrics‑driven music generation. Replace it with a custom negative prompt if you need explicit exclusions such as no vocals or fewer high‑frequency artifacts. Keep positive and negative instructions conceptually distinct to avoid conflicts.
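The role of the negative branch is easier to see in the standard classifier-free guidance formula, where the sampler blends a prediction made with the positive conditioning and one made with the negative conditioning. The sketch below illustrates that general formula on plain lists of numbers. It is not ComfyUI's internal code, and the function name is hypothetical.

```python
def cfg_combine(positive_pred: list, negative_pred: list, scale: float) -> list:
    """Classifier-free guidance: start at the negative prediction and move
    toward the positive one, scaled by the guidance strength."""
    return [n + scale * (p - n) for p, n in zip(positive_pred, negative_pred)]
```

At a scale of 1.0 the result equals the positive prediction, so the negative branch has no effect. Larger scales push the output further away from whatever the negative branch predicts, which is why a custom negative prompt matters more as guidance strength rises.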
Notes
See the RunComfy page for this workflow (Ace Step 1.5 in ComfyUI Workflow | Text-to-Music Diffusion) for the latest node requirements.
Description
Initial release — Ace-Step-1.5.