Uncensored Mistral (Ernie)
Custom Mistral 3 3B model for use in ComfyUI with Ernie
An FP8 variant is available that is far faster on cards that support FP8. (Ampere cards will autocast it to BF16.) Cards older than Ampere may autocast to FP16, or offload to the CPU and cast to FP32.
If you see CPU offloading, use FP32, or see below. The reason is that ComfyUI will first cast to FP16, then the CPU calculates in FP32, then the result is cast back to FP16.
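To make that FP16 detour concrete, here is a minimal pure-Python sketch (standard library only, no PyTorch) that emulates the dtypes by rounding through `struct`. `to_fp16`, `to_fp32`, `offload_via_fp16` and `offload_fp32` are hypothetical helpers written for illustration, not ComfyUI functions, and the single dot product stands in for one row of a real matrix multiply. It assumes the weights come from an FP32 file and compares keeping them in FP32 on the CPU against routing them through FP16 first.

```python
import math
import struct

def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 half-precision value; overflow becomes +/-inf like a GPU cast."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)

def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def dot_fp32(weights, inputs):
    """CPU math: every product and running sum is rounded to FP32."""
    acc = 0.0
    for w, x in zip(weights, inputs):
        acc = to_fp32(acc + to_fp32(w * x))
    return acc

def offload_via_fp16(weights, inputs):
    """The detour: cast to FP16, compute in FP32 on the CPU, cast the result back to FP16."""
    w16 = [to_fp16(w) for w in weights]
    x16 = [to_fp16(x) for x in inputs]
    return to_fp16(dot_fp32(w16, x16))

def offload_fp32(weights, inputs):
    """FP32 weights kept in FP32 end to end: no extra rounding steps."""
    return dot_fp32([to_fp32(w) for w in weights], [to_fp32(x) for x in inputs])

fp32_weights = [to_fp32(w) for w in (0.1, 1e-8, 70000.0)]
print([to_fp16(w) for w in fp32_weights])  # [0.0999755859375, 0.0, inf]
print(offload_fp32(fp32_weights, [1.0, 1.0, 1.0]))      # finite, about 70000.1
print(offload_via_fp16(fp32_weights, [1.0, 1.0, 1.0]))  # inf
```

FP16 keeps 10 mantissa bits and tops out at 65504, so FP32 weights lose precision, very small values flush to zero, and large values overflow. Staying in FP32 on the CPU avoids all three.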
If you know for a fact that your GPU can fit both models and handle the calculations, you can use one or both of these launch flags to work around ComfyUI's default casting behavior:
--bf16-text-enc
--highvram
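Whether both models fit is mostly arithmetic: weight memory is roughly parameter count times bytes per parameter. The sketch below is a rough rule of thumb under stated assumptions: 3e9 is the nominal size implied by the model name, the diffusion model's parameter count and the 2 GiB headroom are placeholders you must fill in for your own setup, activations and the VAE are not counted, and `weight_gib` / `suggest_flags` are hypothetical helpers, not part of ComfyUI. Only the two flag strings come from the text above.

```python
# Bytes per stored parameter for the formats discussed above.
BYTES_PER_PARAM = {"fp8": 1, "fp16": 2, "bf16": 2, "fp32": 4}

def weight_gib(num_params: float, dtype: str) -> float:
    """Weight memory in GiB: parameters x bytes per parameter (no activations, VAE or overhead)."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30

def suggest_flags(vram_gib: float,
                  te_params: float, te_dtype: str,
                  diffusion_params: float, diffusion_dtype: str,
                  gpu_supports_bf16: bool,
                  headroom_gib: float = 2.0) -> list[str]:
    """Hypothetical rule of thumb: only add the flags when both models' weights
    plus a safety margin fit in VRAM; otherwise keep ComfyUI's defaults."""
    needed = (weight_gib(te_params, te_dtype)
              + weight_gib(diffusion_params, diffusion_dtype)
              + headroom_gib)
    if needed > vram_gib:
        return []
    flags = ["--highvram"]
    if gpu_supports_bf16:  # Ampere or newer
        flags.append("--bf16-text-enc")
    return flags

for dtype in ("fp8", "bf16", "fp32"):
    print(dtype, round(weight_gib(3e9, dtype), 1))  # fp8 2.8, bf16 5.6, fp32 11.2
```

The FP32 text encoder alone needs about four times the memory of the FP8 variant, which is why the FP32 file is the one that tends to end up offloaded.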
NOTE: ComfyUI uses a custom version. The default Diffusers version can be found here.
Comments (17)
Could you upload separate files to Hugging Face? It seems I won't be able to run inference on it with vLLM or llama.cpp.
(EngineCore pid=26618) ValueError: There is no module or parameter named 'tekken_model' in Mistral3ForConditionalGeneration. The available parameters belonging to (Mistral3ForConditionalGeneration) are.
Not working
It will only work with ComfyUI; the Diffusers version will be uploaded later.
Works with Ernie Image (50 steps) and the Ernie Image + Turbo LoRA combo (10 steps). Outstanding job, cheers!
thanks
Hope to have an FP8 version 😂
I plan on releasing the Diffusers format in FP32 (as trained), which would be ideal for someone to quantize.
@Felldude Sorry, but how can I quantize it? I don't have any experience with that.
@jimzlf To use NVIDIA's built-in Transformer Engine (TE), a Blackwell or newer card is required (I use Ampere A-series cards), along with Linux. GGUF has some tools that work on Windows, but I don't keep up with GGUF.
It is possible to scale the weights without TE, but the calculations will then run in BF16.
@Felldude I just wonder how I can quantize an FP8 version of the TE for ComfyUI myself 😂
@Felldude Oh god, I thought I was talking about the text_encoder...
Just remember: if it gets offloaded to the CPU, FP32 is the best format to be in.
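As a rough picture of what "scaling without TE, calculating in BF16" means in the thread: the weights are stored as FP8 (E4M3) values plus a scale, and on cards without FP8 compute they are multiplied back by the scale and the math runs in BF16. The pure-Python sketch below emulates both formats with the standard library. It assumes a single per-tensor scale and the E4M3 variant whose largest finite value is 448; `round_to_e4m3`, `quantize_fp8` and `dequantize_to_bf16` are hypothetical helpers, not the exact format ComfyUI's scaled-FP8 loader or Transformer Engine uses.

```python
import math
import struct

E4M3_MAX = 448.0  # largest finite value in FP8 E4M3 (the "fn" variant)

def round_to_e4m3(x: float) -> float:
    """Round to the nearest FP8 E4M3 value (3 mantissa bits), saturating at +/-448."""
    if x == 0.0 or math.isnan(x):
        return x
    a = min(abs(x), E4M3_MAX)
    _, e = math.frexp(a)          # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)          # below 2**-6 the subnormals share exponent -6
    step = 2.0 ** (exp - 3)       # spacing between neighbours with 3 mantissa bits
    q = round(a / step) * step    # Python's round() is round-half-to-even
    return math.copysign(min(q, E4M3_MAX), x)

def to_bf16(x: float) -> float:
    """Round to bfloat16 by keeping the top 16 bits of the FP32 pattern (round-to-nearest-even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def quantize_fp8(weights):
    """Per-tensor scaled FP8: map the largest |w| onto 448, then round each value to E4M3."""
    amax = max(abs(w) for w in weights)
    scale = amax / E4M3_MAX if amax > 0 else 1.0
    return [round_to_e4m3(w / scale) for w in weights], scale

def dequantize_to_bf16(q, scale):
    """Without FP8 compute: scale the values back and hand BF16 numbers to the matmul."""
    return [to_bf16(v * scale) for v in q]

q, scale = quantize_fp8([0.3, -1.0])
print(q)                               # [128.0, -448.0]
print(dequantize_to_bf16(q, scale))    # [0.28515625, -1.0]
```

With only 3 mantissa bits, each normal value can be off by up to 1/16 of its magnitude, so starting from the FP32 (as trained) weights avoids stacking an earlier rounding step on top of that.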