    [Qwen] Rebalance v1.0
    NSFW

    Example Workflow:

    https://civarchive.com/models/2065313/rebalance-v1-example-workflow

    Thanks to WANdalf for helping extract the LoRAs for Nunchaku usage.

    Model Overview

    Rebalance is a high-fidelity image generation model trained on a curated dataset comprising thousands of cosplay photographs and handpicked, high-quality real-world images. All training data was sourced exclusively from publicly accessible internet content, and the dataset explicitly excludes any NSFW material.

    The primary goal of Rebalance is to produce photorealistic outputs that overcome common AI artifacts—such as an oily, plastic, or overly flat appearance—delivering images with natural texture, depth, and visual authenticity.

    Training Strategy

    Training was conducted in multiple stages, broadly divided into two phases:

    1. Cosplay Photo Training
      Focused on refining facial expressions, pose dynamics, and overall human figure realism—particularly for female subjects.

    2. High-Quality Photograph Enhancement
      Aimed at elevating atmospheric depth, compositional balance, and aesthetic sophistication by leveraging professionally curated photographic references.

    Captioning & Metadata

    The model was trained using two complementary caption formats: plain text and structured JSON. Each data subset employed a tailored JSON schema to guide fine-grained control during generation.

    • For cosplay images, the JSON includes:

      {
        "caption": "...",
        "image_type": "...",
        "image_style": "...",
        "lighting_environment": "...",
        "tags_list": [...],
        "brightness": number,
        "brightness_name": "...",
        "hpsv3_score": score,
        "aesthetics": "...",
        "cosplayer": "anonymous_id"
      }

    Note: Cosplayer names are anonymized (using placeholder IDs) solely to help the model associate multiple images of the same subject during training—no real identities are preserved.

    • For high-quality photographs, the JSON structure emphasizes scene composition:

      {
        "subject": "...",
        "foreground": "...",
        "midground": "...",
        "background": "...",
        "composition": "...",
        "visual_guidance": "...",
        "color_tone": "...",
        "lighting_mood": "...",
        "caption": "..."
      }

    In addition to structured JSON, all images were also trained with plain-text captions and with randomized caption dropout (i.e., some training steps used no caption or partial metadata). This dual approach enhances both controllability and generalization.
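The randomized caption-dropout scheme described above can be sketched roughly as follows. This is a minimal illustration, not the actual T2ITrainer code; the function name and the dropout/plain-text rates are assumptions:

```python
import random

def select_caption(plain_caption, json_caption, drop_rate=0.1, plain_rate=0.5):
    """Pick the caption for one training step: occasionally drop the
    caption entirely (unconditional step), otherwise alternate between
    the plain-text caption and the structured JSON caption."""
    r = random.random()
    if r < drop_rate:
        return ""  # caption dropout: train this step unconditionally
    if r < drop_rate + plain_rate * (1 - drop_rate):
        return plain_caption  # plain-text caption
    return json_caption       # structured JSON caption
```

Mixing in unconditioned and plain-text steps like this is a common way to keep a model usable with simple prompts while still learning the structured format.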

    Inference Guidance

    • For maximum aesthetic precision and stylistic control, use the full JSON format during inference.

    • For broader generalization or simpler prompting, plain-text captions are recommended.
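For the JSON route, the prompt can be assembled programmatically and passed as the text prompt. A minimal sketch following the cosplay schema shown earlier; all field values here are hypothetical examples, not values from the training set:

```python
import json

# Hypothetical field values illustrating the cosplay JSON schema;
# replace them with a description of your target image.
prompt_fields = {
    "caption": "a cosplayer in ornate armor standing in a misty forest",
    "image_type": "photograph",
    "image_style": "cosplay",
    "lighting_environment": "soft overcast daylight",
    "tags_list": ["armor", "forest", "mist"],
    "brightness": 120,
    "brightness_name": "medium",
    "hpsv3_score": 9.5,
    "aesthetics": "high",
    "cosplayer": "anonymous_001",
}

# Serialize to a JSON string and use it as the prompt text.
prompt = json.dumps(prompt_fields, ensure_ascii=False)
print(prompt)
```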

    Technical Details

    All training was performed using lrzjason/T2ITrainer, a customized extension of the Hugging Face Diffusers DreamBooth training script. The framework supports advanced text-to-image architectures, including Qwen and Qwen-Edit (2509).

    Previous Work

    This project builds upon several prior tools developed to enhance controllability and efficiency in diffusion-based image generation and editing:

    • ComfyUI-QwenEditUtils: A collection of utility nodes for Qwen-based image editing in ComfyUI, enabling multi-reference image conditioning, flexible resizing, and precise prompt encoding for advanced editing workflows.
      🔗 https://github.com/lrzjason/Comfyui-QwenEditUtils

    • ComfyUI-LoraUtils: A suite of nodes for advanced LoRA manipulation in ComfyUI, supporting fine-grained control over LoRA loading, layer-wise modification (via regex and index ranges), and selective application to diffusion or CLIP models.
      🔗 https://github.com/lrzjason/Comfyui-LoraUtils

    • T2ITrainer: A lightweight, Diffusers-based training framework designed for efficient LoRA (and LoKr) training across multiple architectures—including Qwen Image, Qwen Edit, Flux, SD3.5, and Kolors—with support for single-image, paired, and multi-reference training paradigms.
      🔗 https://github.com/lrzjason/T2ITrainer

    These tools collectively establish a robust ecosystem for training, editing, and deploying personalized diffusion models with high precision and flexibility.

    Contact

    Feel free to reach out via any of the following channels:

    Description

    Initial version.


    Comments (18)

    1639992813 · Oct 22, 2025

    Thanks, xiaozhi! Finally it's here. Does the official workflow work as-is?

    xiaozhijason
    Author
    Oct 22, 2025

    The workflow is in progress. The main part is the prompt: for better results, use JSON-formatted prompts. I'll post a workflow template later.

    97Buckeye · Oct 22, 2025

    It looks like your fine-tune is entirely Asian-based. Is that a fair assumption?

    xiaozhijason
    Author
    Oct 22, 2025 · 2 reactions

    yes

    Paramind · Oct 22, 2025 · 4 reactions

    @xiaozhijason Are you going to increase diversity in upcoming versions? Would be very cool.

    xiaozhijason
    Author
    Oct 22, 2025 · 4 reactions

    @Paramind It is very hard to expand the dataset with both quality and diversity. For example, I can verify what an Asian aesthetic looks like, but I am not able to verify what a Mexican aesthetic looks like, and I don't have any channels to obtain those images. Of course, a VLM could be used to filter out non-aesthetic images, but that also requires resources and manpower to verify. In general, aesthetics are biased and largely subjective.

    ArtfulGenie69 · Oct 22, 2025 · 3 reactions

    Probably a big ask, but if you wanted, you could post the dataset here or on Hugging Face. Maybe others could propose additional images. A GitHub for image/video datasets would be amazing but doesn't exist yet; I guess Hugging Face is kind of like that.

    Askeladd_ · Oct 22, 2025

    Can you make an SVDQuant of this, so it works with Nunchaku?

    xueqing12211 · Oct 23, 2025

    yes, svdq nunchaku plz

    xiaozhijason
    Author
    Oct 23, 2025

    It is very hard to compile SVDQuant personally. Extracting a combined LoRA and using the base model provided by Nunchaku plus the extracted LoRA would probably be the path.

    anyMODE · Oct 23, 2025

    @xiaozhijason https://civitai.com/models/2066371 I've extracted it here, if you want to host it instead feel free to download it and post it on here, and I'll remove mine.

    xiaozhijason
    Author
    Oct 23, 2025

    @WANdalf Thanks for helping. I will host the LoRAs here and credit you in the description.

    anyMODE · Oct 23, 2025

    @xiaozhijason No problem, I see you've hosted them now so I'll remove my page.

    Paramind · Oct 23, 2025

    @WANdalf Is there a noticeable quality difference between r64 and r32? r64 should be higher precision right? I have never used these types of loras before.

    anyMODE · Oct 23, 2025

    @Paramind R64 is highest, R32 mid, R16 low; in practice R16 is almost indistinguishable from R64.

    Paramind · Oct 23, 2025

    @WANdalf Thank you, I will try R16 first then.

    kennedysworks · Oct 22, 2025 · 2 reactions

    This is amazing. I sent you a cup of coffee as a thank you. Could you share a tutorial on how to fine-tune checkpoints the way you do?

    xiaozhijason
    Author
    Oct 23, 2025

    Thanks, it will help with further development.

    Checkpoint
    Qwen

    Details

    Downloads
    4,168
    Platform
    CivitAI
    Platform Status
    Available
    Created
    10/22/2025
    Updated
    4/28/2026
    Deleted
    -

    Files

    QwenRebalanceV10_v10.safetensors

    Mirrors