Starting with version 2.0, the workflow supports txt2img, img2img and inpainting, and uses the built-in LLM node from
https://github.com/AlexYez/comfyui-timesaver
instead of the external Ollama program. The TS_Qwen3_Node node can describe images, translate prompts and enhance prompts.
If you are on Windows and can't install the TS_Qwen3_Node dependencies (for example, because no compiler is installed), try downloading the .whl file from
https://github.com/boneylizard/llama-cpp-python-cu128-gemma3/releases
then close ComfyUI, open the python_embeded folder, type cmd in the address bar, and run the following command:
.\python.exe -I -m pip install "path to downloaded.whl file"
After the install finishes, start ComfyUI and install any missing custom nodes as usual.
Edit: If the .whl install fails, check your Python version and make sure the .whl was built for that version. If it still fails, open the .whl as an archive and extract all of its folders into the python_embeded\Lib\site-packages folder.
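The version check can be sketched in Python. Wheel file names follow the pattern name-version-pythontag-abitag-platformtag.whl, so the Python tag (for example cp312) can be compared with the interpreter in python_embeded. This is only a rough sketch: the helper names are hypothetical, the file name in the example is made up and is not an actual release asset, and only the CPython tag is compared (the platform and CUDA build are not checked).

```python
import sys


def wheel_python_tag(wheel_filename):
    """Return the Python tag (e.g. 'cp312') from a wheel file name or path."""
    # Drop any folder part, whether the path uses / or \ separators.
    name = wheel_filename.rsplit("/", 1)[-1].rsplit("\\", 1)[-1]
    if not name.endswith(".whl"):
        raise ValueError("not a .whl file: " + wheel_filename)
    parts = name[:-4].split("-")
    # name-version[-build]-python-abi-platform: the Python tag is third from the end.
    if len(parts) < 5:
        raise ValueError("unexpected wheel file name: " + wheel_filename)
    return parts[-3]


def wheel_matches_python(wheel_filename, version_info=None):
    """True if the wheel's Python tag names this CPython major.minor version."""
    major, minor = (version_info or sys.version_info)[:2]
    # A tag may list several interpreters separated by dots.
    tags = wheel_python_tag(wheel_filename).split(".")
    return f"cp{major}{minor}" in tags


if __name__ == "__main__":
    # Hypothetical file name, used only to show the call.
    example = "llama_cpp_python-0.0.0-cp312-cp312-win_amd64.whl"
    print("interpreter:", sys.version_info[:2])
    print("wheel tag:", wheel_python_tag(example))
    print("match:", wheel_matches_python(example))
```

Save it next to python.exe and run it with the embedded interpreter so that the version reported is the one ComfyUI actually uses.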
===Old versions ===============================
This workflow combines the power of LLM text models managed by Ollama with Flux image generation. It takes an image or text as input and improves the prompt or changes it according to instructions.
Note: To refresh the list of LLM models, reload the browser window by pressing F5.
Since 1.8 there is a blue switch in the Generate Image group to enable or disable context support.
Since 1.3 you need to switch blocks on and off and manually copy the prompt text between blocks.
Information:
First of all you need to download and install Ollama from
The current workflow uses 2 LLM models:
Img2Img uses llava for image tagging and Mistral for manipulations
Combined 1.3 uses llava and phi4
Txt2Img 1.2 uses only phi4
Txt2Img 1.1 uses only Mistral
Before running Comfy you need to download the models:
open a command prompt in the Ollama folder (the one with ollama.exe) and run
ollama pull llava:7b (if you have 8-12 GB of VRAM)
or
ollama pull llava:13b (for 16+ GB of VRAM)
Wait for the model to download, then for img2img and Txt2Img v.1.1 run
ollama pull mistral-small
For Txt2Img v.1.2 and Combined 1.3 run
ollama pull phi4
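The model choices above can be summarised in a short Python sketch that prints the pull commands for a given workflow and amount of VRAM. The workflow labels and function names are hypothetical. The source only gives 8-12 GB and 16+ GB, so using llava:7b for everything below 16 GB is an assumption.

```python
# Text model per workflow, taken from the list above.
# The dictionary keys are hypothetical labels, not names used by the workflow files.
TEXT_MODELS = {
    "img2img": "mistral-small",
    "txt2img-1.1": "mistral-small",
    "txt2img-1.2": "phi4",
    "combined-1.3": "phi4",
}

# Workflows that also need the llava vision model.
VISION_WORKFLOWS = {"img2img", "combined-1.3"}


def llava_tag(vram_gb):
    """Pick the llava size: 13b for 16 GB or more, otherwise 7b (assumption below 16 GB)."""
    return "llava:13b" if vram_gb >= 16 else "llava:7b"


def pull_commands(workflow, vram_gb):
    """Return the 'ollama pull' commands to run for one workflow."""
    models = []
    if workflow in VISION_WORKFLOWS:
        models.append(llava_tag(vram_gb))
    models.append(TEXT_MODELS[workflow])
    return ["ollama pull " + model for model in models]


if __name__ == "__main__":
    for command in pull_commands("combined-1.3", vram_gb=12):
        print(command)
```

The printed commands are meant to be pasted into the command prompt opened in the Ollama folder.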
After the download has finished, start ollama app.exe, wait for the tray icon, then start Comfy and install missing custom nodes.
If they are not already set, select llava in the Ollama Vision node and mistral in the Translate and Ollama Generate Advance nodes.
If you plan to give IMG2IMG instructions in another language, turn on and use the Translate node.
TXT2IMG accepts a prompt in any language.
====================
For the Redux IP Tools version you need to download 2 models:
Clip Vision -> models\clip_vision
Style model -> models\style_models
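To confirm that both models ended up in the right place, a quick check like the following may help. It is a minimal sketch: the function name is hypothetical, it assumes you pass the ComfyUI root folder yourself, and it only checks that each folder contains at least one file, not that the file is the correct model.

```python
from pathlib import Path

# The two target folders named above, relative to the ComfyUI root.
REQUIRED_DIRS = ("models/clip_vision", "models/style_models")


def empty_model_dirs(comfy_root):
    """Return the required folders that are missing or contain no files."""
    root = Path(comfy_root)
    problems = []
    for rel in REQUIRED_DIRS:
        folder = root / rel
        has_file = folder.is_dir() and any(p.is_file() for p in folder.iterdir())
        if not has_file:
            problems.append(rel)
    return problems


if __name__ == "__main__":
    # "." assumes the script is started from the ComfyUI root folder.
    for rel in empty_model_dirs("."):
        print("No model file found in", rel)
```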
Description
Ollama vision replaced with Miaoshouai_Tagger (it is currently better than any LLM model)
A few optimizations
Comments (6)
My ComfyUI-Manager doesn't take "ApplyFBCacheOnModel"-Node in account, so it's missing. What part am I missing? Thank you.
It seems to run perfectly without this node.
It is a WaveSpeed node that speeds up the generation process by 20-30%.
@TikFesku Thank you very much, all works perfect now!
Wasn't complicated to install, and works much better than I'd anticipated! But I'm curious about the Ollama Generate Advance in the Ollama Prompt Generator group; I can't find any explanations of what the parameters mean. For example I'd like to be able to shorten the prompt output like you usually can do with max characters, but I can't seem to figure it out. Anywho, great workflow, thanks a lot!
Unfortunately there is no way to limit the output using parameters. Num_predict is supposed to do this job, but it just cuts the normal output off in the middle of a phrase. But since it is an LLM, we can ask it to shorten the reply using the LLM instructions field or the control field of the node itself. For example, you can use a control phrase such as "Create English detailed image description using from 20 to 30 words without splitting into lists, any extra formatting, preamble, introduction, special characters and explanations.". In general, try not to use an exact number (ask for a range instead). Remember that Ollama always adds 12 to 22 words to the numbers you ask for, and that none of the current models can output fewer than 30-40 words when asked to describe something. So if you say "from 20 to 30", expect 32 to 52 words in the result.
@TikFesku I see, thanks for the explanation and again, great workflow!
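The word-count rule of thumb from the reply above (the result comes out roughly 12 to 22 words longer than the range you ask for) can be written as a small Python helper. This is only a sketch of that arithmetic: the function names are hypothetical and the offsets are the commenter's observation, not a documented Ollama behaviour.

```python
def expected_word_range(asked_min, asked_max, extra_min=12, extra_max=22):
    """Estimate the real output length for a 'from X to Y words' request."""
    return asked_min + extra_min, asked_max + extra_max


def range_to_ask_for(target_min, target_max, extra_min=12, extra_max=22):
    """Inverse: which range to put in the control phrase for a desired length."""
    # Per the reply above, descriptions rarely come out shorter than 30-40 words,
    # so very small targets may not be reachable whatever range is requested.
    return max(1, target_min - extra_min), max(1, target_max - extra_max)


if __name__ == "__main__":
    print(expected_word_range(20, 30))
    print(range_to_ask_for(40, 60))
```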


