HunyuanVideo BDSM Bondage - CivArchive (CivitAI Archive)

HunyuanVideo BDSM Bondage - ep110

NSFW

If you like my work, please watch me on DeviantArt, where I post more: https://www.deviantart.com/cloud-artisan

UPDATE: The new Wan2.1 model significantly outperforms Hunyuan. I suggest using it with my new LoRA: https://civarchive.com/models/1305489/wan21-b14-bdsm-bondage

Tips

Sampler: I got the best results with "gradient_estimation", "DPM++ 2M" and "ipndm" (in that order, from best to worst), all with the "beta" scheduler. The default "euler" gives much less precise results.
Guidance: I suggest high values (around 10) because bondage is a complex motion. Lower values give messier results.
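
As a minimal sketch of how the sampler ranking above could be applied, the Python snippet below picks the highest-ranked sampler from whatever a given UI offers. The sampler names, scheduler and guidance value are taken from the tips; the function name and the `available` list are hypothetical and not part of any tool's API.

```python
# Preference order, scheduler and guidance value come from the tips above.
# pick_sampler() and the `available` list are hypothetical helpers for illustration.
PREFERRED_SAMPLERS = ["gradient_estimation", "DPM++ 2M", "ipndm"]
FALLBACK_SAMPLER = "euler"  # the default, described above as less precise
SCHEDULER = "beta"
GUIDANCE = 10.0


def pick_sampler(available):
    """Return the highest-ranked preferred sampler present in `available`."""
    for name in PREFERRED_SAMPLERS:
        if name in available:
            return name
    return FALLBACK_SAMPLER


# Example: a UI that lacks "gradient_estimation" falls through to the second choice.
print(pick_sampler(["euler", "ipndm", "DPM++ 2M"]))
```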

Example prompt

A young woman is kneeling on the floor of a bedroom, bound and gagged. Her hands are tied behind her back. Tight little white sports shorts and top accentuate her figure. She is gagged with a thick cloth gag. She twists and turns intensely to free herself, but fails and remains bound. Handheld shot. Tense atmosphere. Realistic style.

Comments (21)

Vrahel · Feb 6, 2025 · 1 reaction

I've got to ask: why is there this increase in Hunyuan LoRAs? Does every one of you have a high-end graphics card and use ComfyUI? Is there a trick or method for running Hunyuan locally on a mid-VRAM PC?

cloud_artisan (Author) · Feb 6, 2025 · 1 reaction

I use Hunyuan just because it gives the best results even though the generation time is long and the VRAM requirements are high. I run it in the cloud (runpod), so no local high-end card.

Vrahel · Feb 6, 2025

@cloud_artisan thx for the response. Gotta look into how much time and money it takes to set up a runpod.

WhatTheGuy · Feb 6, 2025 · 4 reactions

This tool (https://civitai.com/articles/10335) was released 3 weeks ago with a GUI and everything already set up on Windows. I made my LoRAs with it 1-2 weeks later, though locally on a 4090.

blyss · Feb 6, 2025 · 4 reactions

@Vrahel As far as inference goes, you can definitely run locally with as little as 12GB using ComfyUI + Hunyuan Wrapper, fp8, block swap and torch.compile, and get 544p@129f or maybe more. Lower res or shorter clips could work on even less VRAM. With the same setup I can make 1280x720x129f on 16GB, though I usually stick to 544p because of generation time. There are also GGUF quants going down to 4 bits and even lower, though personally I think motion suffers a LOT below 8 bits. As for the increase in LoRAs, I wondered about that myself, but I think the other answer provided is likely why!

Edit for the curious:
I have a 4070 Ti SUPER 16GB, the following are typical generation times with 50 steps, no Teacache and optimal memory config:
544p@129f: ~24min (e.g. 960x544, 544x960, 720x720)
720p@129f: ~55min (e.g. 1280x720, 720x1280, 960x960)

Times can be reduced with TeaCache/First Block Cache, the FastVideo LoRA, etc., though my best results are always without any of that. I run an RX 5700 XT to drive my display, so the entire 16GB of the 4070 Ti SUPER is available for generation. If you're running on the same card that drives your display/desktop, things will be tighter.
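
To put the two timings quoted above side by side, here is a rough back-of-the-envelope calculation in Python. It uses only the figures from this comment (resolution, 129 frames, 50 steps, minutes per run). Treating the whole run as sampling time is a simplifying assumption, since the quoted totals also include text encoding and VAE decoding.

```python
# Figures quoted in the comment above (4070 Ti SUPER 16GB, 50 steps, 129 frames).
runs = {
    "544p": {"width": 960, "height": 544, "frames": 129, "minutes": 24},
    "720p": {"width": 1280, "height": 720, "frames": 129, "minutes": 55},
}


def pixels_per_frame(run):
    """Pixel count of a single frame."""
    return run["width"] * run["height"]


def seconds_per_step(run, steps=50):
    """Average wall-clock seconds per step, assuming the whole run is sampling."""
    return run["minutes"] * 60 / steps


pixel_ratio = pixels_per_frame(runs["720p"]) / pixels_per_frame(runs["544p"])
time_ratio = runs["720p"]["minutes"] / runs["544p"]["minutes"]

for name, run in runs.items():
    print(name, pixels_per_frame(run), "px/frame,",
          round(seconds_per_step(run), 1), "s/step")
print("pixel ratio:", round(pixel_ratio, 2), "time ratio:", round(time_ratio, 2))
```

The 720p run has about 1.76x the pixels per frame but takes about 2.29x as long, so on this setup generation time grows faster than resolution.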

mofo69 · Feb 6, 2025 · 4 reactions

I'm surprised u have to ask. Hunyuan is, at last, a text2video model that actually works; in fact it excels: uncensored, high-quality videos of anything you want, amazing. And the hardware needed isn't that high end, an 8GB VRAM GPU and up. I run it on a gaming laptop with an RTX 3080 16GB and it works just fine. As for ComfyUI, it ain't that difficult; if I can do it anyone can. There are 100s of good tutorials online, it just takes a bit of focus, lol

makiaeveli · Feb 7, 2025 · 2 reactions

Currently sitting here waiting 20 mins for a video and I don't even care

jonk999 · Feb 7, 2025 · 1 reaction

With one of the workflows here I can get a video done in around 5 minutes on my RTX3060 12GB. Admittedly pretty low quality, but if it's a good generation I'll then re-run it at a higher quality and take a time hit.

az420 · Feb 7, 2025 · 2 reactions

HV has essentially surpassed all other models... SDXL and even Flux were so awesome at first, but when I go back to them now they look horrible. Censorship really ruined them imo...

kmdcomp · Feb 7, 2025 · 2 reactions

@az420 agreed. I haven't used another model since I tried Hunyuan. It's insane how good it is at NSFW stuff by itself and it trains easily if you want to teach it new concepts. Even on still images, if you describe some kind of motion in the captions, it can usually figure out how to do it.

Knoxxxonk · Feb 7, 2025

I can run it on 6GB VRAM, it just runs slow. I use the "native" workflow. From what I can see, the tile size on the VAE decode is a major contributor to what you can run. At 256 or higher it becomes unstable for me.

Vrahel · Feb 7, 2025

@az420 Do u mean even for normal image creation? Surpassed in realistic visuals, or how did it surpass even Flux? Man, it seems I have to sit my ass down and learn how to install HV and how to use Comfy...

az420 · Feb 7, 2025 · 1 reaction

@Vrahel FLUX (and SDXL) will straight up choke on anything NSFW, like, produce nightmare fuel. IMO the distilled nature of FLUX has made it very hard to teach it new concepts. I could be slightly out of date, but i straight up gave up on FLUX for NSFW stuff, it just can't do anything beyond showing the naked bits, and even then you have to get lucky.
With some of the NSFW LORAs for FLUX I feel like you have to roll the dice and might get 1/30 renders looking interesting, but at that rate it seems like a fluke and FLUX is almost as slow as HV.

As for SDXL, I've tried the state of the art NSFW models and... they just aren't very capable... unless I'm missing some gem model that is just unknown to me.

rustybutler · Feb 8, 2025

@blyss Which workflow like this works in 12GB and lets you output 544p@129 frames? Using kijai's nodes always chokes my system when (down)loading the text encoders. I've been using native, but there's no way it lets me do that. Around 584x380 at 69 (nice) frames is the sweet spot for me; OOM otherwise. Am I missing something?

blyss · Feb 8, 2025 · 3 reactions

@rustybutler I use my own custom workflow built on kijai's Hunyuan Wrapper nodes. If you OOM on the text encoder (which is easy to do, it's an 8B-param LLM), try setting the quantization on the "(down)load" node to fp8, or even bnbnf4 if you must. Make sure the "force_offload" attribute of every node in your workflow that has it is set to "true", and use the torch.compile node (backend: inductor, fullgraph: false, mode: max-autotune-no-cudagraphs, dynamic: false, cache_size: 32 or less; enable it for at least the single and double blocks).

For memory management you have an easy option and a fine-grained option:
Easy: Enable "auto_cpu_offload" on the model loader node. This nets the maximum VRAM savings but might not give optimal speed.
Fine-grained: Add and connect the BlockSwap node and choose manually how many blocks to swap. This lets you balance speed against VRAM, but it takes some finagling to find the sweet spots. Use one or the other, not both!

For decoding, make sure to turn on enable_vae_tiling; in my experience auto tile size doesn't work well. Leave the temporal size at 64 or the video will stutter, but lower the spatial size as needed (mine is at 224 for 16GB). Until you get the hang of things, you might wanna use a 2-step process where you generate and save the latent first, then load and decode it as a second step, so you don't lose a good gen to OOM.
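
The settings described so far can be collected into a checklist. The Python sketch below is only an illustration: it is not a ComfyUI or Hunyuan Wrapper API. The key names mirror the attributes quoted in this comment where they exist ("force_offload", "auto_cpu_offload", "enable_vae_tiling", the torch.compile options), while "temporal_tile_size", "spatial_tile_size", the dict layout and the check function are assumptions made for the example.

```python
# Hypothetical checklist of the settings described in this comment.
# This is a plain dict, not a real ComfyUI API; the layout is an assumption.
settings = {
    "text_encoder_quantization": "fp8",
    "force_offload": True,
    "torch_compile": {
        "backend": "inductor",
        "fullgraph": False,
        "mode": "max-autotune-no-cudagraphs",
        "dynamic": False,
        "cache_size": 32,
    },
    "auto_cpu_offload": False,   # the "easy" memory option
    "block_swap": {"double_blocks": 18, "single_blocks": 36},  # the "fine-grained" option
    "enable_vae_tiling": True,
    "temporal_tile_size": 64,    # hypothetical key name
    "spatial_tile_size": 224,    # hypothetical key name; value quoted for 16GB
}


def check_settings(cfg):
    """Return a list of warnings for combinations the comment advises against."""
    warnings = []
    if cfg["auto_cpu_offload"] and cfg["block_swap"] is not None:
        warnings.append("use auto_cpu_offload or block swap, not both")
    if not cfg["force_offload"]:
        warnings.append("force_offload should be true")
    if not cfg["enable_vae_tiling"]:
        warnings.append("enable_vae_tiling should be on for decoding")
    if cfg["temporal_tile_size"] != 64:
        warnings.append("temporal tile size other than 64 may stutter")
    if cfg["torch_compile"]["cache_size"] > 32:
        warnings.append("cache_size should be 32 or less")
    return warnings


print(check_settings(settings))
```

The block swap counts shown are the ones quoted further down in this comment for the unquantized model on 16GB; other cards will need different values.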

Lastly, the big thing that I neglected to mention previously (sowwy!): USE SAGE ATTENTION (attention_mode: sageattn_varlen on the model loader). This is the big win for both VRAM and speed. On Linux you can install it into your venv, after activating it, with "pip install sageattention", but if you want the better 2.0 version you'll need to follow the instructions to install from source. On Linux, activate your ComfyUI venv, then run:

git clone https://github.com/thu-ml/SageAttention.git
cd SageAttention
pip install -e .

If you are on Windows it's a little more complex, but I think some prebuilt wheels exist here: https://github.com/sdbds/SageAttention-for-windows

Sage is an optimized, quantized attention kernel that uses a mixture of precisions to accelerate the attention calculation and save activation memory. You will see the best results on 4xxx+ cards with fp8 acceleration, but 3xxx (and possibly older) cards can benefit from the VRAM savings too. Lastly, you can set "upcast_rope" on the model loader to False for a wee bit more savings at a small quality cost, but I usually leave it True.
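
Before enabling Sage attention, it can help to confirm that the package is visible to the Python interpreter that actually runs ComfyUI. The snippet below is a minimal sketch using only the standard library; the helper name is made up for the example, and the only assumption about the package is that its import name is "sageattention", matching the pip name above.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


if is_installed("sageattention"):
    print("sageattention is importable in this environment")
else:
    print("sageattention not found; install it into this environment first")
```

Run it with the same interpreter or venv that launches ComfyUI, since a package installed into a different environment will not be found.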

That's pretty much it! With these settings and some tweaking I can even run the base model unquantized. I'm testing the latest iteration of my kissing model against the full-fat bf16 model now, using manual block swap with 18 double and 36 single blocks plus the rest of the things I mentioned. I'm currently making a 720x720 video with an ETA of 26 mins and VRAM usage of 13.8GB (which means I could have swapped fewer blocks and finished slightly quicker!).

Edit: Oh, you can use Sage with Comfy native too, and with other models like Flux etc., though the benefits are biggest for video. Just pass --use-sage-attention when starting up (after installing it, ofc)!

Edit2: For the quantization of the DiT itself on the model loader node: none of the torchao options work for me, so I can't speak to them. Otherwise, use fp8_e4m3fn_fast if you have an Ada (4xxx) or newer card and fp8_e4m3fn if you don't; fp8_scaled is specifically for the Tencent-released "mp_rank_00_model_states_fp8.pt" file. Or you can run with no quantization, but ofc that takes more VRAM.
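
The rule of thumb in Edit2 can be written as a small decision function. This Python sketch is hypothetical: the function and parameter names are made up, and it assumes "Ada or newer" can be approximated as CUDA compute capability 8.9 or higher (Ada Lovelace cards report 8.9). The option strings are the ones named in the comment.

```python
def choose_dit_quantization(compute_capability, uses_tencent_fp8_file=False):
    """Pick a DiT quantization option following the rule of thumb in Edit2.

    compute_capability: (major, minor) tuple, e.g. (8, 9) for an Ada (4xxx) card.
    uses_tencent_fp8_file: True only when loading mp_rank_00_model_states_fp8.pt.
    """
    if uses_tencent_fp8_file:
        return "fp8_scaled"
    if tuple(compute_capability) >= (8, 9):  # assumption: Ada or newer
        return "fp8_e4m3fn_fast"
    return "fp8_e4m3fn"


# Example inputs: an Ada card (8.9) and an Ampere 3xxx card (8.6).
print(choose_dit_quantization((8, 9)))
print(choose_dit_quantization((8, 6)))
```

With PyTorch installed, torch.cuda.get_device_capability() returns a tuple in this form for the active GPU.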

mofo69 · Feb 9, 2025

@blyss Thanks for that very useful information, although when I add --use-sage-attention to my Comfy run_nvidia_gpu.bat it causes Comfy to crash on load.

Snooploop · Feb 10, 2025 · 1 reaction

@az420 I agree HV is a step up, and my LoRAs trained in diffusion pipe are essentially spot on, better than any LoRA I've trained for Flux, Pony and other XL models. BUT I will say I get pretty damn impressive results with ponyrealism: my custom "character" LoRAs trained on ponyrealism, combined with finetuner LoRAs for ponyrealism and adetailer for the faces. I'll set up a batch of 20 images at a time and usually get 3-5 that are pretty damn good, sometimes with an almost perfect likeness from my custom LoRAs. Then I'll usually run those 3-5 through img2img with adetailer again and then upscale. Sometimes it takes some work to get the prompting, LoRA weights, etc. right. But I gotta say, sometimes it's wild how good it is. Again, though, HV with custom LoRAs trained in diffusion pipe is simply amazing. I expect my electric bill to almost double this month with how much I've been using it.

az420 · Feb 10, 2025

@Snooploop Which model are you getting great results from exactly? As much as I love Pony, I've never had a good time trying to get it to do realism...

Snooploop · Feb 11, 2025 · 1 reaction

@az420 Pony Realism v2.2 and 2.1. Here is a good example a user posted using it: https://civitai.com/images/56996753 When combined with finetuner LoRAs trained on Pony / Pony Realism, and with your settings dialed in after some experimenting, you can get some great results.

Vrahel · Feb 11, 2025 · 1 reaction

@az420 I use a realistic Illustrious checkpoint (like https://civitai.com/models/1032120/thrillustrious?modelVersionId=1306218) combined with a detail LoRA and some more enhancing stuff for a more "real person" feeling, like "amateur" or "flux-like" LoRAs. Adetailer always helps with details in the face and eyes as well. The results are so good.

I find that realistic Illustrious checkpoints are very flexible yet astonishingly real.

P.S.: When using Thrillustrious I would recommend the sampler combo "Euler a + Beta". Beta gives me the best results, but it doesn't work well with adetailer, so u gotta change the sampler in adetailer.

EndlessDreamOnceHuman · May 6, 2025

Try the app downloaded/installed via pinokio.com. It works with low VRAM.

LORA

Hunyuan Video

by cloud_artisan