What does this workflow do?
This is a lightweight, fast ComfyUI workflow built for one purpose: generating audio for video using the NSFW fine-tuned model MMAudio. It was made with a focus on speed and minimal VRAM usage while still keeping the audio synchronized.
What makes this workflow different?
MMAudio relies on video frames to understand context, but higher resolutions normally slow everything down since every pixel must be processed. This workflow solves that with a down-sampling node that gives the audio model the same level of context at a ~3x speed increase, while leaving video quality unaffected!
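The idea behind the down-sampling step can be sketched in a few lines. This is a minimal illustration only, not the actual node's code; it assumes a frame is a 2-D grid of pixel values and simply keeps every Nth pixel, since the audio model only needs coarse visual context, not full resolution:

```python
def downsample(frame, factor=2):
    """Keep every `factor`-th pixel in both dimensions.

    A smaller frame carries the same timing and semantic context
    for the audio model at a fraction of the processing cost.
    """
    return [row[::factor] for row in frame[::factor]]

# A toy 4x4 "frame" of pixel values 0..15.
frame = [[r * 4 + c for c in range(4)] for r in range(4)]
small = downsample(frame, factor=2)
# small is a 2x2 frame: [[0, 2], [8, 10]]
```

The output video is never touched; only the copy of the frames fed to the audio model is shrunk.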
See for yourself:

Why use this?
If you want fast, efficient audio generation, especially on lower-end hardware like mine, then this workflow is for you!
Keep in mind that these workflows only synthesize audio; they do not upscale or generate video. Your videos must be generated before you use this workflow.
NOTICE FOR OLLAMA:
Any of my automatic video-caption workflows require Ollama, a tool for running local LLMs that translate on-screen action into SFX prompts. To use it, download the program itself here: https://ollama.com/download. The only Ollama model I use is qwen2.5vl:7b-instruct.
There's a well-written guide here that covers Ollama installation: https://civarchive.com/articles/6571/ollama-llama-31-install-guide-use-llama31-locally-and-in-comfyui-for-free
Or you can simply run the command: ollama pull qwen2.5vl:7b-instruct in Command Prompt or PowerShell to install the model.
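For the curious, here is a rough sketch of the kind of request a captioning node sends to a locally running Ollama server (default port 11434) for each video frame. The prompt text and frame bytes are placeholders of my own; the real workflow node builds its own request:

```python
import base64
import json

def build_caption_request(frame_bytes: bytes,
                          model: str = "qwen2.5vl:7b-instruct") -> str:
    """Build the JSON body for Ollama's /api/generate endpoint.

    The frame is base64-encoded, as Ollama expects for vision models.
    """
    payload = {
        "model": model,
        "prompt": "Describe the sound effects this scene would produce.",
        "images": [base64.b64encode(frame_bytes).decode("ascii")],
        "stream": False,
    }
    return json.dumps(payload)

body = build_caption_request(b"\x89PNG... fake frame data")
# POST this body to http://localhost:11434/api/generate once Ollama is running.
```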
NOTES:
ALL RELEVANT MODELS WILL BE IN A NOTE IN THE WORKFLOW
I leave most of the nodes at full size so that beginners can read and understand what they do. The recommended version of these workflows uses the custom RTX Video Resolution node, which requires installing the custom node itself as well as NVIDIA's VFX Python packages. You'll find links for both below, as well as in a note in the workflow itself. If you do not have an NVIDIA graphics card, I have an alternate workflow that uses ComfyUI's built-in upscaler.
https://github.com/Comfy-Org/Nvidia_RTX_Nodes_ComfyUI
https://pypi.nvidia.com -> dependencies for NVIDIA's custom node. To install, open Command Prompt or PowerShell in your ComfyUI embedded Python folder and run the command: pip install -U --no-build-isolation nvidia-vfx --index-url https://pypi.nvidia.com
To verify the installation, run the command: pip show nvidia-vfx -- if it returns a version number, you have done everything correctly!
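If you'd rather check from Python itself (for instance, from inside ComfyUI's embedded interpreter), the same verification can be done with the standard library. This is just a convenience sketch equivalent in spirit to pip show:

```python
from importlib import metadata

def installed_version(dist_name: str):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None

print(installed_version("nvidia-vfx") or "nvidia-vfx is not installed")
```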
MY HARDWARE:
RTX 5060 Ti, 8GB VRAM
16GB RAM
Intel Core i7-9700K @ 3.60 GHz
Manual Prompting: Avg Audio Gen Time -> ~30 seconds.
Auto SFX Prompting: Avg Audio Gen Time -> ~95 seconds.
"Keep it simple, stupid"
Use responsibly, and let me know if you have any feedback/ideas/complaints! I love to learn and would like the opportunity to refine whatever I make! :)
Description
Adjusted context for Ollama node for better prompt adherence.
Added better documentation and a download option for the SFW MMAudio model.
Removed NVIDIA Super Res, as it's not necessary for good speed.
Added better documentation for the Ollama model used.