NewBie image Exp0.1

🧱 Exp0.1 Base
NewBie image Exp0.1 is a 3.5B-parameter DiT model developed through research on the Lumina architecture.
Building on those insights, it adopts Next-DiT as its foundation and introduces a new NewBie architecture tailored for text-to-image generation.
NewBie image Exp0.1 is trained within this newly constructed system and represents the first experimental release of the NewBie text-to-image generation framework.
Text Encoders
We use Gemma3-4B-it as the primary text encoder, conditioning on its penultimate-layer token hidden states. We also extract pooled text features from Jina CLIP v2, project them, and fuse them into the time/AdaLN conditioning pathway. Together, Gemma3-4B-it and Jina CLIP v2 provide strong prompt understanding and improved instruction adherence.
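A minimal sketch of this dual-encoder conditioning, assuming the public Hugging Face checkpoints and standard transformers APIs; the projection layer, the 1152-dim timestep embedding, and the fusion step below are illustrative placeholders, not the exact NewBie implementation:

```python
import torch
from transformers import AutoModel, AutoModelForCausalLM, AutoTokenizer

prompt = "1girl, silver hair, snowy street at night"

# Gemma3-4B-it: per-token hidden states from the penultimate layer
# (these serve as the text context the DiT attends to).
tok = AutoTokenizer.from_pretrained("google/gemma-3-4b-it")
gemma = AutoModelForCausalLM.from_pretrained(
    "google/gemma-3-4b-it", torch_dtype=torch.bfloat16
)
inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = gemma(**inputs, output_hidden_states=True)
token_context = out.hidden_states[-2]  # (1, seq_len, hidden_dim)

# Jina CLIP v2: one pooled embedding per prompt.
jina = AutoModel.from_pretrained("jinaai/jina-clip-v2", trust_remote_code=True)
pooled = torch.from_numpy(jina.encode_text([prompt]))  # (1, 1024)

# Fuse the pooled vector into the time/AdaLN pathway. The dimensions and
# the zero timestep embedding are placeholder choices for illustration.
time_dim = 1152
project = torch.nn.Linear(pooled.shape[-1], time_dim)
t_emb = torch.zeros(1, time_dim)              # stand-in timestep embedding
adaln_cond = t_emb + project(pooled.float())  # drives AdaLN scale/shift
```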
VAE
We use the FLUX.1-dev 16-channel VAE to encode images into latents, delivering richer, smoother color rendering and finer texture detail, which helps preserve the visual quality of NewBie image Exp0.1.
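As a rough illustration, the latents can be produced with the diffusers AutoencoderKL; the repository path and the shift/scale normalization below follow the public FLUX.1-dev release, while the input file name is a placeholder:

```python
import torch
from diffusers import AutoencoderKL
from diffusers.utils import load_image
from torchvision.transforms.functional import to_tensor

# Load only the 16-channel VAE from the FLUX.1-dev repository.
vae = AutoencoderKL.from_pretrained(
    "black-forest-labs/FLUX.1-dev", subfolder="vae", torch_dtype=torch.float32
)

image = load_image("example.png").convert("RGB")  # hypothetical input file
x = to_tensor(image).unsqueeze(0) * 2.0 - 1.0     # (1, 3, H, W) in [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()  # (1, 16, H/8, W/8)
    # FLUX-style normalization before the latents are fed to the DiT.
    latents = (latents - vae.config.shift_factor) * vae.config.scaling_factor
```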
Prompt
XML structured prompt
Natural language prompt
Tag prompt
🖼️ Task type
NewBie image Exp0.1 is pretrained on a large corpus of high-quality anime data, enabling the model to generate remarkably detailed and visually striking anime-style images.
We reformatted the dataset text into an XML-structured format for our experiments (see the sketch below). Empirically, this improved attention binding and attribute/element disentanglement, and also led to faster convergence.
Besides XML, the model also supports natural-language and tag inputs.
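The exact tag schema is not documented in this excerpt, so the following is only a hypothetical sketch of what an XML-structured prompt might look like; every tag name here is a placeholder:

```python
# Hypothetical XML-structured prompt; tag names are placeholders,
# not the documented NewBie schema.
xml_prompt = """\
<prompt>
  <character>
    <appearance>silver hair, red eyes</appearance>
    <outfit>winter school uniform</outfit>
  </character>
  <background>snowy street at night, christmas lights</background>
  <style>anime, detailed illustration</style>
</prompt>
"""
```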
🧰 Model Zoo
NewBie image Exp0.1: Hugging Face | ModelScope
Gemma3-4B-it: Hugging Face | ModelScope
Jina CLIP v2: Hugging Face | ModelScope
FLUX.1-dev VAE: Hugging Face | ModelScope
💪 Training procedure

🔬 Participate
Core
Members
✨ Acknowledgments
Thanks to the Alpha-VLLM Org for open-sourcing the advanced Lumina family, which has been invaluable for our research.
Thanks to Google for open-sourcing the powerful Gemma3 LLM family.
Thanks to the Jina AI Org for open-sourcing the Jina family, enabling further research.
Thanks to Black Forest Labs for open-sourcing the FLUX VAE family; its powerful 16-channel VAE is one of the key components behind the improved image quality.
Thanks to Neta.art for fine-tuning and open-sourcing the Lumina-Image-2.0 base model. Neta-Lumina gave us the opportunity to study the performance of Next-DiT on anime-style data.
Thanks to DeepGHS/narugo1992/SumomoLee for providing high-quality anime datasets.
Thanks to Nyanko for the early help and support.
Thanks to woctordho for helping improve NewBie’s compatibility with community tools.
📖 Contribute
Neko, 衡鲍, XiaoLxl, xChenNing, Hapless, Lius
WindySea, 秋麒麟热茶, 古柯, Rnglg2, Ly, GHOSTLXH
Sarara, Seina, KKT机器人, NoirAlmondL, 天满, 暂时
Wenaka喵, ZhiHu, BounDless, DetaDT, 紫影のソナーニル
花火流光, R3DeK, 圣人A, 王王玉, 乾坤君Sennke, 砚青
Heathcliff01, 无音, MonitaChan, WhyPing, TangRenLan
HomemDesgraca, EPIC, ARKBIRD, Talan, 448, Hugs288
🧭 Community Guide
Getting Started Guide
LoRA Trainer
💬 Communication
📜 License
Model Weights: Newbie Non-Commercial Community License (Newbie-NC-1.0).
Applies to: model weights, parameters, configs, and derivatives (fine-tunes, LoRAs, merges, quantized variants, etc.).
Non-commercial use only; derivatives must be shared under the same license.
Code: Apache License 2.0.
Applies to: training/inference scripts and related source code in this project.
See Apache-2.0
⚠️ Disclaimer
This model may produce unexpected or harmful outputs. Users are solely responsible for any risks and potential consequences arising from its use.
Description
Christmas is just around the corner 🎄 Today we (NewBieAI-Lab) are excited to release NewBie image Exp0.1!
Also, I’d love to recommend a song I really enjoy: “52 Hearts.” It fits the season perfectly.
Wishing you all a wonderful Christmas and a relaxing holiday—have fun and enjoy! ✨
FAQ
Comments (47)
I'm getting the following error on Linux: TypeError: 'function' object is not iterable. I have followed all the instructions, but there seems to be a problem loading the Jina CLIP. Is there an example workflow of this model working on Linux?
After a few hours, the problem was solved by installing flash_attention compiled from source.
Extremely long and tedious installation; it ends up throwing several errors, especially on Linux. It looks promising, and I have a PC to use it on, but after four hours of banging my head against the wall trying to get it to work, it's not worth the effort. I'd like you to continue the good work, but please make the model easier to download and set up, especially on Linux.
wdym, just install their custom nodes
@qek It's not just about downloading the custom nodes; in fact, they don't even have them. First, you have to create a venv using Miniconda and install FlashAttention, and the tutorial doesn't provide an equivalent version for Linux, which is bad enough. However, I managed to figure that out. The real problem is that after several installation errors with both the venv and the modified Comfy, I finally got Comfy to run, which I thought would finally let me use the model. Unfortunately, it fails as soon as it tries to read Gemma3 and Jina CLIP (and I know how these models work because I've used rouwei gemma3 and it worked). Basically, I tried everything: changing the folder locations, changing their names, doing everything that's suggested, and it simply says the library is corrupted, and that's it.
@hexmachina nah https://github.com/NewBieAI-Lab/ComfyUI-Newbie-Nodes
@hexmachina Follow the FlashAttention repo instructions (all of them); it requires a few packages like ninja, packaging, etc. You can also download the models by leaving just the model name ("google/gemma-3-idk") and having your huggingface-cli logged in with your account. Additionally, I had to use their custom ComfyUI; I wasn't able to use my already-installed version.
@Konoko I don't have flash attention, but ran Newbie anyway
@qek Did you just install the custom nodes and extra models and it worked?
@hexmachina Yes, but I hate the fact it needs the text encoders in the Diffusers format
@qek I'm running it on Linux; it might be a bit different if you are using Windows. Anyway, in my case it was necessary.
@Konoko I read that comment, I know
Quick question, can I use an SDXL LoRA with this model?
It's a new architecture, so it can't.
Lol, no way
No
aiaight
@kuraig ?
@qek I mean alright
I hope the dev team can prioritize lowering the deployment barrier; having to set up a separate ComfyUI just to try this model will put most people off.
It's good, however I'm getting an OOM on every other generation even with 16 GB of VRAM and 80 GB of RAM.
Does anyone have a tutorial on how to run this that doesn't involve a separate Comfy fork instance, or on using it directly without an interface?
This is an amazing model that's really getting bogged down by the installation process. Hope they can figure out some easy way soon.
Anyway, here's how I got it to work: follow their tutorial selectively. First, skip all the beginning stuff and go straight to "Install necessary components" to get the proper flash attention. If you don't know which one to pick: once you start Comfy you should see something like "pytorch version: 2.6.0+cu124" (this is mine, yours may be different), then pick the flash-attention build that matches those numbers and doesn't have "linux" at the end (unless you are on Linux lol).
Also, if you are on portable Comfy, make sure to have this before the pip install:
python_embeded\python.exe -m pip install "Flash_ATTN_PATH" (this is where you saved that file from before)
Then skip to "Install the ComfyUI-Newbie-Nodes" and continue from there; it should be fairly straightforward from then on (for KJNodes, follow the installation on its GitHub page, not the tutorial).
Can I use it in ComfyUI with a GTX 970 or 4 GB of VRAM?
The best local checkpoint for abstract and surreal
I ran the workflow from the example images on a 4080, and even after lowering the resolution it takes roughly 10 minutes. Is that speed normal? At the same resolution, Illustrious-family models only take ten-odd seconds.
That's not normal. Check whether CPU offload is enabled, then check whether shared VRAM is being used.
@Creeper_MZ CPU offload is off. I checked while it was running and shared VRAM usage was only 0.3 GB, so it shouldn't be a shared-VRAM problem either.
The VAE may need to be switched to tiled decoding.
It mainly depends on which step is the bottleneck; handle that specifically. I'd recommend enabling cpu_offload.
Native ComfyUI support has been added, thanks for the PRs
I don't get it. Comfy refused to support some major new checkpoints, yet he supports this model. I'm not talking badly about this model; I haven't tried it and had never heard of it. I simply can't understand Comfy's decisions sometimes lol
@kevenggg868 I get it, comfyanonymous only adds models that are easy to implement and really worth it. Also, there are other devs; you can check the PRs to see a lot of progress on adding more models. It isn't just writing code and done; it has to be polished and optimized. For example, there's no reason to implement yet another 80B-parameter image generator with a bad license.