🥯 BAGEL Workflow for ComfyUI – All-in-One Image Generation, Editing & Visual Reasoning
This is a complete ComfyUI workflow powered by BAGEL (Blip-Aware Generator Enhanced with Logic), combining text-to-image, image editing (inpainting), and visual question answering (VQA) using BLIP2 and Vicuna. Ideal for advanced AI creators who want generation + reasoning in one streamlined pipeline.
🚀 Key Features:
📷 Text-to-Image Generation with language-aware detail
🛠️ Image Editing & Inpainting with precise control
💬 Visual Question Answering (VQA) via BLIP2 + Vicuna 7B/13B
🔄 Pre-built and optimized ComfyUI workflow — no manual setup needed
🔧 VRAM & Hardware Requirements:
❗ Minimum VRAM: 16GB (BLIP2 + Vicuna are memory-intensive)
💻 Recommended: 24GB+ (e.g., RTX 3090/4090 or A6000) for stable performance
⚠️ Not suitable for low-VRAM systems — Vicuna models are large and require significant resources
🧠 Optional: if you use quantized models, the exllama or exllamav2 loaders may perform better
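
To see roughly where these VRAM figures come from, here is a minimal back-of-the-envelope sketch. It assumes weight memory is simply parameter count times bits per weight, and it counts the Vicuna weights only. BLIP2, the image model, activations, and the KV cache all add to the total, so treat the numbers as a lower bound rather than a measurement. The function name weight_memory_gib is made up for this illustration.

```python
def weight_memory_gib(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GiB.

    Assumption: every parameter is stored at the same precision and
    there is no extra overhead (real loaders use somewhat more).
    """
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3


if __name__ == "__main__":
    # Vicuna sizes mentioned above, at common precisions
    # (16-bit = fp16, 8-bit and 4-bit = quantized).
    for name, size in [("Vicuna 7B", 7.0), ("Vicuna 13B", 13.0)]:
        for bits in (16, 8, 4):
            gib = weight_memory_gib(size, bits)
            print(f"{name} @ {bits}-bit: ~{gib:.1f} GiB (weights only)")
```

Under these assumptions, Vicuna 7B at 16-bit needs about 13 GiB for weights alone and Vicuna 13B about 24 GiB, which lines up with the 16GB minimum and 24GB+ recommendation. At 4-bit the same models drop to roughly 3 GiB and 6 GiB, which is why quantized models are worth considering on tighter hardware.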