About v1.1
ControlNet integration within regions can be tricky because ControlNet maps the reference image to the entire output image space, not to the mask space. This means that if the reference image contains a character in the center, the character will also be generated in the center of the output image, regardless of the mask.
To address this, you can use one of the following approaches:
Switch set_cond_area in the Regional Subgraph from the default setting to mask bounds (note: this may negatively affect interactions between regions),
Or use the newly added "Pad image inside the mask" subgraph, which repositions and pads the reference image so it fits within the mask area.
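The repositioning idea behind "Pad image inside the mask" can be sketched in plain numpy: find the mask's bounding box, resize the reference image to fit that box, and place it on an otherwise empty canvas. This is only an illustration of the logic, not the actual subgraph implementation; pad_into_mask is a hypothetical helper, and nearest-neighbour resizing stands in for proper interpolation.

```python
import numpy as np

def pad_into_mask(reference: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Place `reference` (H, W, C) inside the bounding box of `mask` (H, W),
    returning a canvas the size of the mask. Illustrative sketch only."""
    ys, xs = np.nonzero(mask)
    top, bottom = ys.min(), ys.max() + 1
    left, right = xs.min(), xs.max() + 1
    box_h, box_w = bottom - top, right - left

    # Nearest-neighbour resize of the reference into the bounding box.
    src_h, src_w = reference.shape[:2]
    row_idx = np.arange(box_h) * src_h // box_h
    col_idx = np.arange(box_w) * src_w // box_w
    resized = reference[row_idx][:, col_idx]

    # Empty canvas matching the mask, with the reference placed in the box.
    canvas = np.zeros((mask.shape[0], mask.shape[1], reference.shape[2]),
                      dtype=reference.dtype)
    canvas[top:bottom, left:right] = resized
    return canvas
```

With this, a centered character in the reference ends up inside the mask region rather than in the center of the output canvas.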
A "Person Mask" subgraph was also added; it extracts a person mask from the input image for more precise region mapping.
This subgraph uses a YOLO segmentation model, which should be placed in:
/models/ultralytics/segm (you can download the model here)
Known Issues with the Workflow
Full mask coverage is required
Masks must collectively cover the entire image: the combined masks from all regions should span the full image area. If any part of the image is not covered by at least one mask, that area will not be processed correctly during regional sampling and may appear empty, flat, or incorrectly rendered. If covering the entire image with masks is not possible, you can merge all masks into one, invert the result, and use it as an additional (background) region.
Gradient (soft) masks are not supported
The workflow does not work correctly with gradient (soft) masks. If a mask contains values below 100% (partial transparency), the sampler cannot properly denoise those areas, which results in visible noise, artifacts, or unstable textures. In some cases you can reduce the noise from non-binary masks by adding an extra KSampler with low CFG, step count, and denoise values, but this is not guaranteed to fully fix the issue.
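Both constraints (full coverage and hard masks) can be handled with a little mask arithmetic before the masks enter the workflow. The numpy sketch below uses hypothetical helper names: binarize snaps a soft mask to hard 0/1 values, and background_mask builds the extra background region by merging all region masks and inverting the union.

```python
import numpy as np

def binarize(mask: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Snap a soft (gradient) mask to hard 0/1 values."""
    return (mask >= threshold).astype(np.float32)

def background_mask(region_masks: list[np.ndarray]) -> np.ndarray:
    """Merge all region masks and invert the union, so every pixel
    not claimed by any region falls into one extra background region."""
    union = np.clip(np.sum(region_masks, axis=0), 0.0, 1.0)
    return 1.0 - binarize(union)
```

Feeding the returned background mask into one extra Regional Subgraph (prompted with the global or background description) guarantees that the masks cover the full image.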
In theory, the workflow should work with any base model regardless of architecture (so far it has been tested on SDXL models, Chroma, Flux, and ZImage).
The workflow uses the recent ComfyUI subgraph feature, which allows extending the setup from 2 regions to any number of regions by simply duplicating the Regional Subgraph (and corresponding Regional FaceDetailer if needed).
Regional Subgraph structure
Each Regional Subgraph represents one independent region of the image and contains everything required to define a single character or element.
The inputs of the Regional Subgraph are:
region mask
Defines where this region is applied in the image. This can come from any mask generation method (rect masks, segmentation, depth, manual paint, etc.).
regional positive conditioning
The prompt that describes the character or object inside this region.
regional negative conditioning
Required; this can be either the global negative prompt or a region-specific one.
model (branched)
This is a key part of the workflow. Instead of using the base model directly, the model is first branched and modified (for example by applying LoRAs) before being passed into the Regional Subgraph.
Model branching and region-specific LoRAs
Each region has its own model branch. This is done by taking the base model and applying region-specific LoRAs before feeding it into the Regional Subgraph.
Typical setup:
Base model with global LoRAs → shared for the whole image and passed directly into Regional Sample
For each region:
duplicate the model path
apply one or more LoRAs specific to that region
pass the modified model into the Regional Subgraph model input
This allows each region to have completely different:
characters
styles
LoRAs
visual identity
Importantly, this avoids the common problem where LoRAs are applied globally and interfere with each other.
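The branching idea boils down to this: never mutate the shared base weights, and give each region its own copy with its own LoRA deltas merged in. The toy numpy sketch below (hypothetical names, standing in for ComfyUI's model and LoRA loader nodes) shows why branched models keep region LoRAs from interfering with each other.

```python
import numpy as np

def apply_lora(weights: dict, lora: dict, strength: float = 1.0) -> dict:
    """Return a *new* weight dict with LoRA deltas merged in.
    The base dict is left untouched -- this copy is the 'branch'."""
    branched = {k: v.copy() for k, v in weights.items()}
    for key, delta in lora.items():
        branched[key] += strength * delta
    return branched

base = {"attn": np.zeros((2, 2))}
region_a = apply_lora(base, {"attn": np.ones((2, 2))})    # character A LoRA
region_b = apply_lora(base, {"attn": -np.ones((2, 2))})   # character B LoRA
# base stays pristine, and the two branches never see each other's deltas
```

Applying both LoRAs to the same model instead would sum their deltas everywhere, which is exactly the global-interference problem the branching avoids.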
Extending the workflow
To add more regions:
Copy an existing Regional Subgraph
Connect:
a new mask
a new prompt
a new model branch (with its own LoRAs)
Add the output to:
Combine Conditioning
CombineRegionalPrompts
This makes the system fully scalable without restructuring the graph.
Customization
Everything except the following can be replaced depending on user preference or task:
Regional Subgraphs and Regional Sampler (structure must remain for modularity)
Combine Conditioning
CombineRegionalPrompts
Users are free to replace:
Mask generation (rect masks, segmentation, ControlNet, etc.)
Prompt encoding pipeline
LoRA setup
FaceDetailer or any post-processing
This allows adapting the workflow to:
character scenes
object composition
style mixing
inpainting pipelines
animation setups
Important note: Base steps vs Regional sampling
One of the most important parameters in this workflow is base_only_steps, which controls how the generation process is split between:
Base sampling stage
Regional sampling stage
How it works
The total number of steps is defined by the sampler (for example 30 or 50 steps).
These steps are divided into two phases:
Base sampling (base_only_steps)
Only the global prompt is applied
No regional prompts or masks are used yet
This stage defines:
composition
pose
camera angle
general structure of the image
Regional sampling (remaining steps)
Regional prompts and masks are applied
Each region modifies its assigned part of the image
This stage refines:
character identity
local details
LoRA-specific features
Why this matters
The base stage essentially “locks in” the structure of the image.
If too many steps are spent in base sampling:
the image becomes stable and coherent
but regions have less ability to change it
LoRAs may appear weak or not recognizable
If too few steps are spent in base sampling:
regions have strong influence
characters become more accurate
but overall composition may become unstable or inconsistent
Recommended default
A good starting point is:
base_only_steps = 50% of total steps
This provides a balance between:
stable composition
effective regional control
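The split itself is simple arithmetic. A small sketch (split_steps is a hypothetical helper, using the 50% default described above) of how the total step budget divides into the two phases:

```python
def split_steps(total_steps: int, base_fraction: float = 0.5):
    """Divide the sampler's step budget into a base phase (global prompt only)
    and a regional phase (regional prompts + masks applied)."""
    base_only_steps = int(total_steps * base_fraction)
    base_phase = range(0, base_only_steps)
    regional_phase = range(base_only_steps, total_steps)
    return base_phase, regional_phase

base, regional = split_steps(30)   # 30 total steps, 50% default
# steps 0-14 run with the global prompt, steps 15-29 with regional prompts
```

Raising base_fraction trades regional influence for compositional stability, which is exactly the tuning described in the sections below.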
When to increase base_only_steps
Increase base steps if:
composition is broken or inconsistent
characters are not aligned properly
perspective or layout is unstable
This gives the base stage more time to establish a solid structure.
When to decrease base_only_steps
Decrease base steps if:
region-specific LoRAs are weak
characters are not recognizable
regional prompts have little effect
This allows the regional stage to have more influence over the final image. However, prefer increasing the total number of steps before decreasing base_only_steps.
Practical intuition
Base stage = “what the image looks like”
Regional stage = “who/what is inside each part”
Balancing these two is key to getting both:
strong composition
strong character identity