What does this workflow do?
This is a lightweight, fast ComfyUI workflow built for one purpose: generating audio for video using the NSFW fine-tuned model MMAudio. It was made with a focus on speed and minimal VRAM usage while still keeping the audio synchronized.
What makes this workflow different?
MMAudio relies on video frames to understand context, but higher resolutions normally slow everything down since every pixel must be processed. This workflow solves that with a down-sampling node that gives the audio model the same level of context at a ~3x speed increase, while leaving video quality unaffected!
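The idea behind the down-sampling step can be sketched in a few lines. This is a minimal illustration only, not the actual node's code; it assumes a frame is a 2-D grid of pixel values and simply keeps every Nth pixel, since the audio model only needs coarse visual context, not full resolution:

```python
def downsample(frame, factor=2):
    """Keep every `factor`-th pixel in both dimensions.

    A smaller frame carries the same timing and semantic context
    for the audio model at a fraction of the processing cost.
    """
    return [row[::factor] for row in frame[::factor]]

# A toy 4x4 "frame" of pixel values 0..15.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
small = downsample(frame, factor=2)
# small is a 2x2 frame: [[0, 2], [8, 10]]
```

The output video is never touched; only the copy of the frames fed to the audio model is shrunk.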
See for yourself:

Why use this?
If you want fast, efficient audio generation, especially on lower-end hardware like mine, then this workflow is for you!
Keep in mind that these workflows only synthesize audio; they do not upscale or generate video. Your videos must be generated before you use this workflow.
NOTICE FOR OLLAMA:
Any of my automatic video-caption workflows require Ollama, a tool for running local LLMs that translate on-screen action into SFX prompts. To use it, download the program itself here: https://ollama.com/download. The only Ollama model I use is qwen2.5vl:7b-instruct.
There's a well-written guide here that covers Ollama installation: https://civarchive.com/articles/6571/ollama-llama-31-install-guide-use-llama31-locally-and-in-comfyui-for-free
Or you can simply run the command: ollama pull qwen2.5vl:7b-instruct in Command Prompt or PowerShell to install the model.
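For the curious, here is a rough sketch of the kind of request a captioning node sends to a locally running Ollama server (default port 11434) for each video frame. The prompt text and frame bytes are placeholders of my own; the real workflow node builds its own request:

```python
import base64
import json

def build_caption_request(frame_bytes: bytes,
                          model: str = "qwen2.5vl:7b-instruct") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    The frame is base64-encoded, as Ollama expects for vision models.
    """
    payload = {
        "model": model,
        "prompt": "Describe the sound effects this scene would produce.",
        "images": [base64.b64encode(frame_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)

body = build_caption_request(b"\x89PNG... fake frame data")
# POST this body to http://localhost:11434/api/generate once Ollama is running.
```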
NOTES:
ALL RELEVANT MODELS WILL BE IN A NOTE IN THE WORKFLOW
I leave most of the nodes at full size so that beginners can read and understand what they do. The recommended version of these workflows uses the custom RTX Video Resolution node, which requires installing the custom node itself as well as NVIDIA's VFX Python packages. You'll find links for both below, as well as in a note in the workflow itself. If you do not have an NVIDIA graphics card, I have an alternate workflow that uses ComfyUI's built-in upscaler.
https://github.com/Comfy-Org/Nvidia_RTX_Nodes_ComfyUI
https://pypi.nvidia.com -> dependencies for NVIDIA's custom node. To install, open Command Prompt or PowerShell in your ComfyUI embedded Python folder and run the command: pip install -U --no-build-isolation nvidia-vfx --index-url https://pypi.nvidia.com
To verify the installation, run the command: pip show nvidia-vfx -- if it returns a version number, you have done everything correctly!
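If you'd rather check from Python itself (for instance, from inside ComfyUI's embedded interpreter), the same verification can be done with the standard library. This is just a convenience sketch equivalent in spirit to pip show:

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("nvidia-vfx") or "nvidia-vfx is not installed")
```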
MY HARDWARE:
RTX 5060 Ti, 8GB VRAM
16GB RAM
Intel Core i7-9700K @ 3.60 GHz
Manual Prompting: Avg Audio Gen Time -> ~30 seconds.
Auto SFX Prompting: Avg Audio Gen Time -> ~95 seconds.
"Keep it simple, stupid"
Use responsibly, and let me know if you have any feedback/ideas/complaints! I love to learn and would like the opportunity to refine whatever I make! :)
Description
Adjusted context for Ollama node for better prompt adherence.
Added better documentation and a download option for the SFW MMAudio model.
Removed NVIDIA Super Res, as it's not necessary for good speed.
Added better documentation for the Ollama model used.