Qwen-Image is a powerful image generation foundation model developed by Alibaba Cloud's Qwen team. It supports high-resolution image generation and is especially strong at rendering legible text (including Chinese) within images, making it suitable for a wide range of creative applications. As part of the Qwen family, it integrates naturally with the other Qwen models for multimodal workflows.
Comments (31)
Hopefully, the mods at CivitAI will give QWEN a tag for it soon, so we can filter it as a base model instead of searching it in the "other" models category.
It's supported now 👍🏻
Hi, do you recommend this for an RTX 4070? How many minutes would it take for, say, 896x1152?
My setup is an RTX 3060 (12GB VRAM), Resolution: 896x1328, CFG: 1.5, Steps: 14, Sampler: res_multistep. It takes about 5 minutes per image (the first run is longer).
aigirlfriend555 That's good, I'll try it out. Thanks for the valuable info.
aigirlfriend555 A quick question, if I may: I have a 3070 8GB plus 16GB of shared memory. Can I run it?
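For anyone who wants to try roughly the same settings outside ComfyUI, here is a minimal diffusers sketch. It assumes the official Qwen/Qwen-Image weights on Hugging Face and a diffusers build that ships the Qwen-Image pipeline; argument names such as true_cfg_scale may differ between versions, so treat this as a starting point rather than a drop-in script.

```python
# Rough diffusers equivalent of the settings above (896x1328, CFG ~1.5, 14 steps).
# Assumes the Qwen/Qwen-Image repo and a diffusers build with the Qwen-Image pipeline;
# argument names (e.g. true_cfg_scale) may differ between versions.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="a cozy reading nook by a rainy window, warm lamplight",
    negative_prompt=" ",
    width=896,
    height=1328,
    num_inference_steps=14,
    true_cfg_scale=1.5,
    generator=torch.Generator(device="cuda").manual_seed(42),
).images[0]
image.save("qwen_image_test.png")
```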
What settings are you using? It just draws a black screen for me.
You have to launch ComfyUI without the "--use-sage-attention" flag in the .bat file.
And without the "--fast" flag, if that hasn't been fixed yet.
cluster1500 thx dude
Base Model type "Qwen" is now supported by CivitAI. Please change it from "Others" so we can filter properly for all the amazing Qwen art 🥰
If I use a 4090, which of Q6_K, Q8_0, and fp8 is better and faster?
I believe it would be Q6 for speed, FP8 for quality, and Q8 for something in between.
For a 4090, fp8 will always be better than GGUF. The GGUF models are for people with low VRAM, since the model is loaded on the CPU rather than the GPU. But that makes image generation much slower, since the CPU is not as good at that kind of calculation. The .safetensors models (fp8, fp16), on the other hand, are loaded on the GPU.
@creatumundo399 Don't make claims about things you don't know. All GGUF models can be loaded on the GPU too.
@zids The GGUF format was created to run models on PCs with less VRAM. Yes, okay, you can load it onto the GPU. But why would you download a GGUF model when you have a 4090, when you can use fp8 or fp16, or even fp32, which run directly on the graphics card, faster and more efficiently? I may not know everything, but I clearly know better than you how to use these models.
fp8 will run blazing fast.
Not sure about Qwen, but in Flux the Q8_0 model is definitely better than fp8. But it's slower than fp8.
@ViktorIltimirov Yeah, because he is wrong on every metric: fp8 loses a LOT of quality compared with the original/base model, whereas a top-end quant at Q8 is in the vast majority of cases very close to, if not indistinguishable from, the original base model. But yes, slower than fp8, of course.
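A quick back-of-the-envelope comparison helps explain the trade-off being argued here. The sketch below assumes roughly 20B parameters for the Qwen-Image diffusion model (consistent with the ~19GB fp8 file mentioned later in the thread) and typical GGUF bits-per-weight figures; exact values depend on the quant recipe, so these are ballpark numbers only.

```python
# Ballpark weight-memory estimates for a ~20B-parameter diffusion model.
# Bits-per-weight values are approximate (GGUF quants store per-block scales),
# so real file sizes will differ somewhat.
PARAMS = 20e9  # assumed parameter count of the Qwen-Image DiT

formats = {
    "fp16/bf16": 16.0,
    "fp8":        8.0,
    "Q8_0":       8.5,   # ~8 bits per weight plus block scales
    "Q6_K":       6.6,   # ~6.56 bits per weight
}

for name, bits in formats.items():
    gib = PARAMS * bits / 8 / 1024**3
    print(f"{name:>9}: ~{gib:5.1f} GiB of weights")
```

On a 24GB card both the fp8 and Q8_0 files fit, so the practical choice is mostly speed (native fp8 kernels) versus fidelity (Q8_0 generally tracks the bf16 original more closely), which matches what the replies above describe.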
How do you fix the fact that Qwen ALWAYS generates almost the same image for a given prompt, even on different seeds?
Yeah, that's its huge con: the model seems to be trained for precision and it avoids creativity.
But since the model seems to be precise, in theory varying the keywords in the prompt compensates for the lack of variety from changing the seed.
But then again, with some prompts it seems to lack training so it may perhaps be a very limited model.
I think the best way to get around this is to use some loras.
Wildcards.
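To unpack the one-word answer above: wildcards let the seed also pick random prompt keywords, so different seeds produce genuinely different prompts. A minimal, extension-agnostic sketch follows (the {a|b|c} syntax and the example phrases are purely illustrative):

```python
# Minimal wildcard expansion: every "{a|b|c}" group in the prompt is replaced
# by one randomly chosen option, seeded so the choice follows the image seed.
import random
import re

def expand_wildcards(prompt: str, seed: int) -> str:
    rng = random.Random(seed)
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: rng.choice(m.group(1).split("|")).strip(),
        prompt,
    )

template = ("portrait of a {young|elderly|middle-aged} traveler, "
            "{golden hour|overcast|neon night} lighting, {35mm|85mm} lens")

for seed in (1, 2, 3):
    print(seed, "->", expand_wildcards(template, seed))
```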
What text encoder is this checkpoint looking for? I waited nearly 33 minutes for it to generate and it was only at 25%. I'm using an RTX 5070, so I know it shouldn't take that long; loading the checkpoint and removing the VAE node were the only changes I made to the Qwen workflow.
I wonder if you're using bf16? I can only say that the fp8 variant requires qwen_2.5_vl_7b_fp8_scaled. With a 16GB 4060 Ti it takes up to 7 minutes to generate an image; your GPU should be about 15% faster.
@lyagushka420 The 5070 only has 12GB VRAM, and Qwen-Image fp8 needs about 19GB (plus around 2GB for the CLIP/text encoder), so it will be a much worse experience than a 4060 Ti, which has a weaker GPU but much more VRAM.
I used the GGUF Q4_K_S and it takes around 180 seconds to generate a picture with similar texture but much worse Chinese text comprehension.
@lyagushka420 @uncinsane475 Turned out to be user error. I was trying to load it as a checkpoint, but it should have gone in diffusion_models. I ended up going back to the OG fp8 version and I'm able to run 1328x1328 at 50 steps in about 5-7 minutes.
The fp8 model can run with 12GB VRAM. An image can generate in less than a minute on an RTX 3080 when it's combined with the Lightning 8-step LoRA and CFG 1.
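Continuing the earlier diffusers sketch, a Lightning-style run would look roughly like this. The LoRA folder and filename are placeholders for whichever Qwen-Image Lightning 8-step LoRA you downloaded, and load_lora_weights / true_cfg_scale are assumed to behave as in current diffusers; verify against your installed version.

```python
# Sketch: 8-step generation with a Lightning/distillation LoRA at CFG 1.
# The LoRA directory and filename below are hypothetical placeholders;
# substitute the actual Lightning LoRA you downloaded.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("Qwen/Qwen-Image", torch_dtype=torch.bfloat16)
pipe.load_lora_weights(
    "path/to/lora_folder",                              # placeholder directory
    weight_name="qwen-image-lightning-8step.safetensors",  # placeholder filename
)
pipe.to("cuda")

image = pipe(
    prompt="studio photo of a ceramic teapot, soft shadows",
    width=1024,
    height=1024,
    num_inference_steps=8,   # the Lightning LoRA targets few-step sampling
    true_cfg_scale=1.0,      # CFG 1 = effectively no classifier-free guidance
    generator=torch.Generator(device="cuda").manual_seed(0),
).images[0]
image.save("qwen_lightning_test.png")
```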
Is it only for ComfyUI, or can it run on old-school A1111?
"Old"
How can I get output with no pubic hair? Every time I type the prompt it gives me pubic hair... Is there any prompt or LoRA that can remove it?