Uncensored Mistral 3 3B (Ernie) FP8-FP32

Uncensored Mistral 3 3B (Ernie) FP8-FP32 - v1.0

NSFW

Uncensored Mistral (Ernie)

Custom Mistral 3 3B model for use in Comfy UI with Ernie
An FP8 variant is available that is far faster on cards that support it. Ampere cards will autocast it to BF16; cards older than Ampere may autocast to FP16, or offload to the CPU and cast to FP32.
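The precision behaviour described above can be summarized as a small lookup. This is a hypothetical sketch, not Comfy UI's actual selection logic. It assumes that "cards that support it" means NVIDIA compute capability 8.9 or newer (Ada/Hopper and later) and that Ampere spans compute capability 8.0 to 8.7. The function name `pick_text_encoder_dtype` is made up for illustration.

```python
from __future__ import annotations


def pick_text_encoder_dtype(compute_capability: tuple[int, int],
                            fits_on_gpu: bool = True) -> str:
    """Hypothetical summary of the casting behaviour described on this page.

    Assumes FP8 support means compute capability >= 8.9 (Ada/Hopper and newer)
    and that Ampere spans 8.0 to 8.7. This is not Comfy UI's real selection code.
    """
    if not fits_on_gpu:
        # Offloaded to the CPU, which computes in FP32: use the FP32 file.
        return "fp32"
    if compute_capability >= (8, 9):
        return "fp8"   # FP8 variant can run natively
    if compute_capability >= (8, 0):
        return "bf16"  # Ampere: FP8 weights get autocast to BF16
    return "fp16"      # Older cards may autocast to FP16


print(pick_text_encoder_dtype((8, 6)))                     # Ampere -> bf16
print(pick_text_encoder_dtype((7, 5), fits_on_gpu=False))  # Turing, offloaded -> fp32
```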

If you see CPU offloading, use FP32, or see below. The reason is that Comfy UI will first cast to FP16, then the CPU calculates in FP32, and then the result is cast back to FP16.
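A minimal pure-Python illustration of why an FP16 hop hurts when the math is done in FP32 anyway. It uses the standard `struct` module's `e` (IEEE half) and `f` (IEEE single) formats to round a value. The helper names and sample numbers are illustrative assumptions, not taken from Comfy UI.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


weight = to_fp32(0.1234567)          # weight as stored in the FP32 file
via_fp16 = to_fp32(to_fp16(weight))  # cast to FP16 first, then computed in FP32
print(weight, via_fp16, abs(weight - via_fp16))

tiny = 1e-8
print(to_fp32(tiny), to_fp16(tiny))  # FP32 keeps it, FP16 flushes it to 0.0
```

The round trip moves the weight by up to half an FP16 step, and very small values such as `1e-8` underflow to zero in FP16, while FP32 represents both closely.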

If you know for a fact that your GPU can fit both models and do the calculations, you can use one or both of these launch flags to override Comfy UI's defaults:

--bf16-text-enc

--highvram
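Whether both models "fit" can be estimated roughly as parameter count times bytes per parameter. This sketch assumes a nominal 3e9 parameters for the text encoder and counts weights only (no activations or other buffers). The `comfy_flags` helper, the 2 GiB headroom and the 10 GiB diffusion-model size are hypothetical placeholders; substitute your own numbers.

```python
from __future__ import annotations

BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp16": 2, "fp8": 1}
GIB = 1024 ** 3


def weight_gib(num_params: float, dtype: str) -> float:
    """Approximate weight memory in GiB (weights only, no activations)."""
    return num_params * BYTES_PER_PARAM[dtype] / GIB


def comfy_flags(text_encoder_gib: float, diffusion_gib: float,
                vram_gib: float, headroom_gib: float = 2.0) -> list[str]:
    """Hypothetical helper: suggest the flags only if both models clearly fit.

    headroom_gib is a placeholder safety margin for activations and buffers.
    """
    if text_encoder_gib + diffusion_gib + headroom_gib <= vram_gib:
        return ["--bf16-text-enc", "--highvram"]
    return []


te_bf16 = weight_gib(3e9, "bf16")  # --bf16-text-enc keeps the encoder in BF16
print(round(te_bf16, 2), round(weight_gib(3e9, "fp32"), 2))
print(comfy_flags(te_bf16, diffusion_gib=10.0, vram_gib=24.0))
```

For example, 3e9 parameters come to about 5.59 GiB in BF16 versus about 11.18 GiB in FP32. If the check passes, the flags go on the Comfy UI command line, e.g. `python main.py --bf16-text-enc --highvram`.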

NOTE: Comfy uses a custom version. The default diffusers version can be found here.

Comments (20)

shoes22 · May 12, 2026

Could you upload separate files to HuggingFace? It seems I will not be able to run inference on it with vLLM or llama.cpp.
(EngineCore pid=26618) ValueError: There is no module or parameter named 'tekken_model' in Mistral3ForConditionalGeneration. The available parameters belonging to (Mistral3ForConditionalGeneration) are.

Felldude

Author

May 12, 2026

Comfy uses a version with custom naming and the tekken tokenizer appended, rather than a separate file. When I get the FP32 version complete, I will upload it to HF in diffusers format.
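Because the Comfy file carries the tekken tokenizer inside the same file, loaders that expect only model weights (as in the vLLM error above) trip over the extra keys. A hedged sketch of separating them by key prefix follows. The prefix `tekken_model` is taken from the error message; the assumption that every appended entry shares it is unverified, and the example keys are hypothetical.

```python
from __future__ import annotations


def split_tekken(state_dict: dict, prefix: str = "tekken_model") -> tuple[dict, dict]:
    """Split a combined state dict into (model weights, appended tokenizer entries).

    The prefix comes from the vLLM error message; that all tokenizer entries
    share it is an assumption.
    """
    model, tokenizer = {}, {}
    for key, value in state_dict.items():
        target = tokenizer if key.startswith(prefix) else model
        target[key] = value
    return model, tokenizer


# Hypothetical keys standing in for tensors loaded from the .safetensors file.
combined = {
    "language_model.model.layers.0.mlp.up_proj.weight": "tensor A",
    "tekken_model.vocab": "tokenizer blob",
}
weights, tekken = split_tekken(combined)
print(sorted(weights), sorted(tekken))
```

With real files, `safetensors.torch.load_file` and `save_file` could read the combined file and write the weights back out; mapping the remaining key names to what vLLM or diffusers expect is a separate step not shown here.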

shoes22 · May 12, 2026 · 1 reaction

@Felldude I see you are going to upload it, thanks!

210881175 · May 12, 2026 · 1 reaction

Not working

Felldude

Author

May 12, 2026

It will only work with Comfy; the diffusers version will be uploaded later.

Partisano · May 12, 2026 · 1 reaction

Works with Ernie Image (50 steps) and with the Ernie Image + turbo LoRA combo (10 steps). Outstanding job, cheers!

Felldude

Author

May 12, 2026

thanks

jimzlf · May 12, 2026 · 1 reaction

Hope to have an FP8 version 😂

Felldude

Author

May 12, 2026

I plan on releasing the trained FP32 weights in diffusers format, which would be ideal for someone to quantize.

jimzlf · May 12, 2026

@Felldude Sorry, but how can I quantize it? I don't have that kind of experience.

Felldude

Author

May 12, 2026

@jimzlf Using NVIDIA's built-in Transformer Engine (TE) requires a Blackwell-series or newer card (I use Ampere A-series cards) and Linux. GGUF has some tools that work on Windows, but I do not keep up with GGUF.

Felldude

Author

May 12, 2026

It is possible to scale without using TE, but you will be calculating in BF16.
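The comment does not say which scaling scheme is meant. A common one is per-tensor amax scaling: divide the tensor by a scale so its largest magnitude lands on the FP8 maximum, store it in FP8, and multiply back before the math. The sketch below emulates that in pure Python for E5M2 (largest finite value 57344) and rounds the dequantized values to BF16 to mirror "calculating in BF16". E5M2 rounding goes through FP16, whose top byte has the same layout, so rare ties can double-round. All function names are illustrative.

```python
from __future__ import annotations

import struct

E5M2_MAX = 57344.0  # largest finite FP8 E5M2 value


def round_to_e5m2(x: float) -> float:
    """Round x to FP8 E5M2 by keeping the top byte of its FP16 encoding.

    E5M2 shares FP16's sign/exponent layout with a 2-bit mantissa, so the
    low 8 bits are rounded half-to-even. Going through FP16 first can
    double-round in rare tie cases, which is acceptable for a sketch.
    """
    bits = struct.unpack("<H", struct.pack("<e", x))[0]
    upper, low = bits >> 8, bits & 0xFF
    if low > 0x80 or (low == 0x80 and (upper & 1)):
        upper += 1  # a carry into the exponent bits is the correct behaviour
    return struct.unpack("<e", struct.pack("<H", upper << 8))[0]


def to_bf16(x: float) -> float:
    """Round x to bfloat16 (top 16 bits of FP32), round-half-to-even."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


def quantize_per_tensor(values: list[float]) -> tuple[list[float], float]:
    """Per-tensor amax scaling: map the largest magnitude onto E5M2_MAX."""
    amax = max(abs(v) for v in values) or 1.0  # avoid dividing by zero
    scale = amax / E5M2_MAX
    return [round_to_e5m2(v / scale) for v in values], scale


def dequantize_to_bf16(q: list[float], scale: float) -> list[float]:
    """Undo the scale, then round to BF16 for the actual computation."""
    return [to_bf16(v * scale) for v in q]


weights = [0.5, -1.25, 3.0, 0.001]
q, scale = quantize_per_tensor(weights)
print(q, scale)
print(dequantize_to_bf16(q, scale))
```

With only two mantissa bits, each value lands within about 12.5% of the original; the scale is what keeps small weights like `0.001` inside E5M2's normal range instead of losing them.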

jimzlf · May 12, 2026

@Felldude I just wonder how I can quantize an FP8 version of the TE for ComfyUI myself 😂

Felldude

Author

May 12, 2026

@jimzlf https://github.com/NVIDIA/TransformerEngine

jimzlf · May 12, 2026

@Felldude oh god I thought I was talking about text_encoder……

Felldude

Author

May 12, 2026· 1 reaction

Just remember: if it gets offloaded to the CPU, FP32 is the best format to be in.

JouzaDouza7135 · May 21, 2026 · 2 reactions

@jimzlf https://github.com/silveroxides/convert_to_quant

Felldude

Author

May 25, 2026· 2 reactions

I looked into it, and while scaling is a much better approach, 95% of the time E5M2 is the chosen format. Instead of expensive scaling, I chose to make the model hybrid and leave the sensitive parts in FP32.
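A hybrid model along these lines can be planned per tensor: anything matching a "sensitive" pattern stays in FP32, and everything else is cast to E5M2 without a scale. Which layers were actually kept in FP32 is not stated, so the patterns below (embeddings, norms, `lm_head`) and the tensor names are assumptions in Llama/Mistral-style naming.

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Assumed "sensitive" tensors; the page does not say which parts stay in FP32.
KEEP_FP32 = ("*embed_tokens*", "*norm*", "lm_head*")


def precision_plan(tensor_names: list[str],
                   keep_fp32: tuple[str, ...] = KEEP_FP32) -> dict[str, str]:
    """Map each tensor name to 'fp32' (sensitive) or 'float8_e5m2' (plain cast)."""
    plan = {}
    for name in tensor_names:
        sensitive = any(fnmatchcase(name, pattern) for pattern in keep_fp32)
        plan[name] = "fp32" if sensitive else "float8_e5m2"
    return plan


names = [
    "model.embed_tokens.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "lm_head.weight",
]
for name, dtype in precision_plan(names).items():
    print(f"{dtype:12} {name}")
```

In PyTorch, the `float8_e5m2` entries would correspond to a cast such as `tensor.to(torch.float8_e5m2)`; skipping the scale trades some accuracy on the cast layers for a simpler, cheaper conversion.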

ALFARANKO · Jun 1, 2026

Is it just an uncensored Ministral version, or is it tuned specifically for Ernie?

Felldude

Author

Jun 1, 2026· 1 reaction

It is tuned for all three uses: first, as an LLM for question/answer and story generation; second, as captioning software; and third, for token prediction in DiT models like ERNIE.

Checkpoint

Ernie

by Felldude
