Proteus v0.6
I'm excited to introduce Proteus v0.6, a complete rebuild of my AI image generation model. This is the first version of the rework, focusing entirely on enhancing photorealism. While it's not aiming to be state-of-the-art, I believe it's a good step forward in producing high-quality images. Please note that this is a preliminary version, and it's not the final, fully-featured checkpoint—more improvements and features will come in future updates.
Overview
Proteus v0.6 is a total rework from the ground up. In previous versions, combining different training methods and learning rates caused the model to become unstable during large-scale training. Learning from those experiences, I've retrained the model using only the photorealism aspects of the Proteus dataset.
For now, I'm calling this new training technique Multi-Perspective Fusion.
Multi-Perspective Fusion
This approach involves:
Training Multiple LoRAs and Full-Parameter Checkpoints: I trained several Low-Rank Adaptation (LoRA) modules and full-parameter checkpoints on the same dataset multiple times to capture different "perspectives" of the data.
Integrating into an Overarching Framework: These varied models are then combined within a larger framework to enhance overall performance.
I'm hoping this method will be interesting to data scientists exploring advanced training techniques.
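The post does not say how the separately trained models are combined, so the following is only a minimal sketch of one common option: folding several LoRA updates into a base weight matrix as a weighted sum. It is not a reconstruction of Multi-Perspective Fusion. The function names, the blend weights, and the tiny matrices are all hypothetical and exist only to make the arithmetic visible.

```python
# Hypothetical sketch: merge several LoRA updates into one base weight matrix.
# This is NOT the author's fusion method, which the post does not describe.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def merge_loras(base, loras, weights):
    """Return base + sum_i weights[i] * (B_i @ A_i).

    base:    d_out x d_in matrix
    loras:   list of (B, A) pairs; B is d_out x r, A is r x d_in
    weights: one blend weight per LoRA (assumed here, picked by hand)
    """
    merged = [row[:] for row in base]  # copy so the base stays untouched
    for (b, a), w in zip(loras, weights):
        delta = matmul(b, a)  # low-rank update for this "perspective"
        for i, row in enumerate(delta):
            for j, value in enumerate(row):
                merged[i][j] += w * value
    return merged

# Toy 2x2 base weight and two rank-1 LoRAs trained on the "same data".
base = [[1.0, 0.0], [0.0, 1.0]]
lora_a = ([[1.0], [0.0]], [[0.0, 2.0]])  # only changes row 0
lora_b = ([[0.0], [1.0]], [[4.0, 0.0]])  # only changes row 1
merged = merge_loras(base, [lora_a, lora_b], weights=[0.5, 0.25])
print(merged)
```

In a real model the same update would be applied per layer, and the blend weights would have to be tuned rather than chosen by hand.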
Key Improvements in v0.6
Total Rebuild: Constructed entirely from scratch to address previous issues.
Enhanced Photorealism: Focused on producing good-quality photorealistic images.
Stable Training Process: Refined training methods to prevent the model from falling apart during large-scale training.
Preliminary Version: This is the first version of the rework; expect more features and improvements in future releases.
Limitations
No Illustrations or Anime: Currently, the model can't generate illustrations or anime-style images because it's only been trained on photorealistic data.
Not State-of-the-Art: While the model performs well, I'm not claiming it's state-of-the-art—just that it's a good starting point.
Work in Progress: This is not the final, fully-featured checkpoint. More updates are planned.
Usage
Recommended Settings
Clip Skip: 1
CFG Scale: 7
Steps: 25 - 50
Sampler: DPM++ 2M SDE
Scheduler: Karras
Resolution: 1024x1024
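For anyone scripting generation instead of using a web UI, the settings above can be written down as plain data and translated into call arguments. This is a rough sketch that assumes the Hugging Face diffusers naming (`guidance_scale`, `num_inference_steps`, and a `DPMSolverMultistepScheduler` configured with `algorithm_type="sde-dpmsolver++"` and `use_karras_sigmas=True`). That mapping is an assumption on my part and is not something the model page states. The helper names are hypothetical.

```python
# Recommended settings from the list above, kept as plain data.
RECOMMENDED = {
    "clip_skip": 1,
    "cfg_scale": 7,
    "steps": (25, 50),          # inclusive range
    "sampler": "DPM++ 2M SDE",
    "scheduler": "Karras",
    "resolution": (1024, 1024),
}

def pipeline_kwargs(steps=30):
    """Translate the settings into diffusers-style call arguments (assumed mapping)."""
    low, high = RECOMMENDED["steps"]
    if not low <= steps <= high:
        raise ValueError(f"steps should be between {low} and {high}")
    width, height = RECOMMENDED["resolution"]
    return {
        "guidance_scale": float(RECOMMENDED["cfg_scale"]),
        "num_inference_steps": steps,
        "width": width,
        "height": height,
    }

def scheduler_kwargs():
    """Arguments assumed to select DPM++ 2M SDE with Karras sigmas."""
    return {"algorithm_type": "sde-dpmsolver++", "use_karras_sigmas": True}

print(pipeline_kwargs(30))
print(scheduler_kwargs())
```

Clip Skip 1 is left out of the call arguments on purpose: in the usual web UI convention it means no CLIP layers are skipped, which should match the default behaviour elsewhere.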
Versions before v0.6
Proteus's Background
Proteus builds on OpenDalleV1.1, with the main gains being better responsiveness to prompts and a broader creative range. It was fine-tuned on approximately 220,000 GPTV-captioned images from copyright-free stock sources (with some anime included), which were then normalized. DPO (Direct Preference Optimization) was also applied using 10,000 carefully selected pairs of high-quality, AI-generated images. Numerous LoRA (Low-Rank Adaptation) models were trained independently and then selectively incorporated into the principal model through dynamic application methods, which target particular segments of the model while avoiding interference with other areas during learning. As a result, Proteus shows marked improvements in intricate facial characteristics and lifelike skin textures while remaining proficient across other aesthetic domains, notably surrealism, anime, and cartoon-style images.
Description
This update enhances stylistic capabilities, similar to Midjourney's approach, rather than advancing prompt comprehension. Methods used do not infringe on any copyrighted material.
Comments (17)
Hmm, so it seems 0.4 breaks some LoRAs, like the popular cute_3d_render LoRA I've been using a lot with Proteus. Also, it's not listening to style choices now. I say "in the style of Pixar" and I get a brush-stroke impressionist style on a lot of stuff.
V0.4 is too artistic. A serious photo prompt now comes out as abstract art or something... I'm staying on v0.3.
I have to agree 0.4 is a mess, 0.3 works better (even if I only tried the lightning version of both).
0.4 is so bad I thought I was using the wrong sampler.
I don't know what I'm doing wrong, but I get warped and ugly results with weird colors even when following the recommended settings, and I have never failed to find working settings / VAEs with any other model before. Anyone else? Any solutions?
My advice is to keep the CFG at around 4, or no higher than 7 if you can access 'DynamicThresholding (CFG-Fix) Integrated' in Web UI Forge. I've got some amazing results from Proteus0.4.
The other key settings I have are: DPM++ 3M SDE Exponential, 60 Steps and the following negative prompt (that I use for 90% of my images):
badhandv4, By bad artist -neg, easynegative, FastNegativeV2, ng_deepnegative_v1_75t, jpeg artifacts, decompression (watermark:1.2)((3d, render, cg, painting, drawing, cartoon, anime, comic:0.6)), lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry, artist name (long body), bad anatomy , liquid body, malformed, mutated, bad proportions, uncoordinated body, unnatural body, disfigured, ugly, gross proportions ,mutation, disfigured, deformed, (mutation), child, b&w,malformed eyes, ((poorly drawn face)), strabismus, cross-eye, heterochromia, (deformed iris), (deformed pupils),child, paintings, sketches, (worst quality:2), (low quality:2), (normal quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), skin spots, acnes, skin blemishes, age spot, glans, mutated hands, (poorly drawn hands:1.5), blurry, (bad anatomy:1.21), extra limbs, lowers, bad hands, missing fingers, extra digit ,bad hands, missing fingers,signs, labels,neon signs,logos,company logos; cartoon, d, wrinkles, clothing logos,clothing brands,
I'm amazed more people aren't using this checkpoint. I've not come from using any of the previous versions, so perhaps those that have may need to disregard their previous experience and start fresh. i.e. treat it as a completely different checkpoint and experiment from there.
You did not include an image, making it difficult to identify the possible problem.
Great distinctive model! You are moving in the right direction and I hope you do well. I was pleasantly surprised by Proteus, it's a great work that stands out among many others. Best of luck in completing the final version! And thank you very much for your hard work.
Am I understanding correctly that this is the list of tags used to train the model? I find it strange: womancrushwednesday has 39000 tags, and 'woman' only about 15000.
https://huggingface.co/dataautogpt3/ProteusV0.3/raw/main/tokenizer/vocab.json
I don't think it's a count. I think they're the numeric IDs of each token.
vocab.json is the list of words that get their very own token ID, along with the ID number.
All other words will be represented by sticking together word fragments.
Try a token/prompt explorer plugin in your favourite program to see how it works.
The same exact tokens are mentioned on other models
@Fatbuns exactly. It's the base dictionary of the CLIP-L text encoder, not something specific to this model.
@phil866 so in ComfyUI what would that be?
@Janet I think you are misunderstanding the concept here.
What are you trying to accomplish in ComfyUI?
FYI, all the "turn the text prompt into tokens, and then into an embedding" stuff is handled by the CLIP model.
The "vocab.json" file is just an informational file. It doesn't get used directly by SD.
ALL the words in it have already been absorbed into the CLIP model.
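To make the point in this thread concrete, here is a small sketch with a made-up, five-entry stand-in for vocab.json. The entries and their IDs are invented for illustration, and the split of the long word into fragments is done by hand rather than by the tokenizer's real merge rules. The `</w>` suffix marks a piece that ends a word.

```python
import json

# Toy stand-in for vocab.json. Entries and IDs are invented for illustration;
# the real file is far larger.
toy_vocab_json = (
    '{"wo": 0, "man</w>": 1, "woman</w>": 2, "crush": 3, "wednesday</w>": 4}'
)
vocab = json.loads(toy_vocab_json)

def looks_like_ids(mapping):
    """True if the values are exactly 0..N-1, which counts would almost never be."""
    return sorted(mapping.values()) == list(range(len(mapping)))

def lookup(pieces):
    """Map word pieces to their token IDs."""
    return [mapping_id for mapping_id in (vocab[p] for p in pieces)]

print(looks_like_ids(vocab))
# A word with its own entry becomes a single ID.
print(lookup(["woman</w>"]))
# A word without its own entry is spelled from fragments (split by hand here).
print(lookup(["crush", "wednesday</w>"]))
```

The numbers next to each word are positions in the dictionary, so a larger number only means the entry sits later in the list, not that it was seen more often.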
lol troll post
Works as expected. Only images featuring women who were NOT crushed on wednesday should be tagged with the woman tag.
Love the effect of this model. I already use OpenDalle a lot, and this just adds more stylized output; it's very consistent and follows prompts very well.