I’m struggling to find a solid workflow that can take a character image and apply a different pose while keeping the person looking the same. Here’s what I’ve tested so far:
My main issue with OpenPose + img2img:
Denoise under 0.5 = the character stays the same but the pose barely changes
Denoise over 0.6 = the pose works but the character becomes a different person
Other tools I’ve tried:
Flux Kontext: sometimes works, but it's very inconsistent, slow, and uses tons of VRAM
Nunchaku + turbo lora: fast results but fails about 80% of the time
All I want is to feed in a reference photo and a target pose, then get that exact same person in the new pose. Seems like it should be straightforward by now but I keep hitting walls.
Has anyone found a reliable method for this? I’m open to trying different models or workflows. The consistency is more important to me than speed at this point.
This is such a pain. I’ve been using reface models lately - a completely different approach from what everyone’s trying. Skip the ControlNet headaches and train a quick DreamBooth on your character first. Takes 20 minutes, but pose transfers work way better since the model actually knows your person. Way less parameter hell.
Character consistency breaks because diffusion models treat identity as one embedding instead of keeping actual facial geometry intact. I’ve had good luck with IPAdapter + face embeddings combined with AnimateDiff for pose sequences. The trick is running your reference image through multiple face encoders at once - don’t just use one. I run it through ArcFace and InsightFace together, then blend their embeddings with different weights based on how complex the pose is. Simple poses need less identity weighting, complex ones need more. SDXL base models preserve features way better than SD 1.5 variants too. You’ll need to manually tune the embedding weights for each character type, but once you dial it in the results are solid. Takes me about 15-20 test runs to nail the sweet spot for new characters, but then it’s locked in. Way more consistent than wrestling with controlnet combos.
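For what it's worth, here's a minimal sketch of that blending step, assuming insightface for one encoder; the second embedding would come from whatever other encoder you run, and the weight curve is just an illustration of "more identity weight for complex poses":

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

app = FaceAnalysis(name="buffalo_l")            # bundled detector + recognizer
app.prepare(ctx_id=0, det_size=(640, 640))

def insightface_embedding(image_path: str) -> np.ndarray:
    """Embedding from insightface; run your second encoder separately to get emb_b."""
    faces = app.get(cv2.imread(image_path))
    if not faces:
        raise ValueError(f"no face found in {image_path}")
    return faces[0].normed_embedding            # L2-normalized, shape (512,)

def blend_embeddings(emb_a: np.ndarray, emb_b: np.ndarray, pose_complexity: float) -> np.ndarray:
    """pose_complexity in [0, 1]: weight identity (emb_a) harder for complex poses."""
    w = 0.5 + 0.4 * pose_complexity             # 0.5 for simple poses, up to 0.9
    blended = w * emb_a + (1.0 - w) * emb_b
    return blended / np.linalg.norm(blended)    # re-normalize before handing to IPAdapter
```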
Most people treat this like a single transformation, but that’s not what’s happening. After months of testing similar setups, I’ve learned that reference image quality matters way more than people realize. Your source photo needs the right lighting and angles or the model gets lost during pose mapping. I always run my reference images through face restoration models first - even clean photos get better results. The real breakthrough was switching to weighted pose conditioning instead of binary controlnet inputs. Don’t use full strength pose control. I run mine at 0.7-0.8 and let identity preservation handle the rest. This gives the model room to keep facial structure intact while following pose guidance. Denoise settings aren’t one-size-fits-all either - they depend on how much your target pose differs from the reference. Subtle changes work fine at 0.4 denoise, but dramatic angle shifts need 0.65+ to generate properly. Use pose similarity scoring to dial in the right denoise range for your specific case.
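A rough sketch of that last point, i.e. picking denoise from how far the target pose is from the reference (the keypoint format and the breakpoints are assumptions, tune them for your detector and model):

```python
import numpy as np

def pick_denoise(ref_keypoints: np.ndarray, target_keypoints: np.ndarray) -> float:
    """Map pose difference to denoise: subtle changes ~0.4, dramatic shifts ~0.65+.

    Both inputs are (N, 2) arrays of normalized joint coordinates from your pose
    detector, in the same joint order.
    """
    diff = float(np.linalg.norm(ref_keypoints - target_keypoints, axis=1).mean())
    # np.interp saturates at the endpoints, so the result stays inside [0.40, 0.70].
    return float(np.interp(diff, [0.05, 0.35], [0.40, 0.70]))
```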
Been dealing with this exact headache for months. Those tools try to do everything at once - that’s the problem.
Break it into a pipeline instead. Don’t fight with denoise settings hoping for the best. Automate the whole thing with proper preprocessing.
My workflow extracts facial features from the reference image first, applies pose guidance separately, then does consistency checks before final generation. Multiple validation steps catch failures early instead of wasting compute on bad outputs.
Game changer: automated retries with different parameter combinations. When one setting fails, it tries the next combination automatically. No more manual tweaking.
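Something like the loop below (a sketch only; generate() and score_result() are placeholders for your own SD call and quality check, not real library functions):

```python
from itertools import product

def generate_with_retries(reference, pose_map, generate, score_result, threshold=0.8):
    """Try parameter combinations until one scores above threshold; keep the best seen."""
    denoise_values = (0.45, 0.55, 0.65)
    pose_strengths = (0.7, 0.8, 0.9)
    best_score, best_image = -1.0, None
    for denoise, strength in product(denoise_values, pose_strengths):
        image = generate(reference, pose_map, denoise=denoise, pose_strength=strength)
        score = score_result(image, reference)
        if score > best_score:
            best_score, best_image = score, image
        if score >= threshold:          # good enough -- stop early
            break
    return best_image
```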
Building this kind of smart pipeline is pretty easy. I use Latenode to chain the different AI services and handle retry logic. Connects directly to Stable Diffusion APIs, handles image processing, and does quality scoring to pick the best result.
Way more reliable than manual workflows. Runs hands-off once you set it up.
I’ve been dealing with the same thing at work. What finally worked was InstantID + pose transfer. Forget all the denoising tweaks.
Face consistency and pose transfer are totally different problems. InstantID handles identity preservation way better than messing with denoise settings ever will.
What actually works: InstantID handles the face/identity. Then use DWPose (beats OpenPose) for poses. Feed both as separate control inputs.
More steps? Yeah. But you get consistent results instead of rolling dice on parameters. I hit 90%+ success rates this way.
VRAM’s a pain though. If you’re maxing out memory, run the models one at a time instead of loading everything together. Slower but won’t crash.
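If it helps, here's a rough diffusers sketch of the two-control wiring (not the full InstantID pipeline, which also injects face embeddings through its own IP-Adapter; repo IDs and file paths here are just examples), with CPU offload for the VRAM issue:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# Identity and pose fed as separate ControlNet inputs on an SDXL base.
identity_cn = ControlNetModel.from_pretrained(
    "InstantX/InstantID", subfolder="ControlNetModel", torch_dtype=torch.float16)
pose_cn = ControlNetModel.from_pretrained(
    "thibaud/controlnet-openpose-sdxl-1.0", torch_dtype=torch.float16)

pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=[identity_cn, pose_cn],
    torch_dtype=torch.float16,
)
pipe.enable_model_cpu_offload()     # keeps VRAM down by moving sub-models on/off GPU

image = pipe(
    prompt="photo of the same person in a new pose",
    image=[load_image("face_keypoints.png"), load_image("dwpose_skeleton.png")],
    controlnet_conditioning_scale=[0.8, 0.9],     # identity vs. pose strength
).images[0]
```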
I’ve shipped three avatar generation features at work and this exact problem killed two of them.
Your tools aren’t the issue. You’re fighting the model’s training bias. These models learned from millions of random internet photos where pose and identity are all mixed together.
What finally worked: train a lightweight LoRA on your specific character first. Skip dreambooth - use LoRA. Takes 5 minutes with 8-12 good photos of your person.
Then do pose transfer with that custom LoRA loaded. The model already knows your character’s face, so it stops trying to “fix” their features when you change poses.
I keep LoRA weight at 0.6-0.8 and run standard controlnet pose at 0.9 strength. Way higher than what works with generic models, but the LoRA handles identity so controlnet can focus on pose alone.
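Roughly, in diffusers terms (a sketch, not our exact pipeline; checkpoint IDs and paths are placeholders):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Character LoRA for identity + strong pose ControlNet.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnet, torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("./loras", weight_name="my_character.safetensors")

result = pipe(
    prompt="photo of mycharacter, full body, standing",
    image=load_image("./target_openpose.png"),
    controlnet_conditioning_scale=0.9,           # pose control stays strong
    cross_attention_kwargs={"scale": 0.7},       # LoRA identity weight, 0.6-0.8 range
    num_inference_steps=30,
).images[0]
```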
We hit 94% success rates testing this. The 6% failures were mostly bad pose detection, not identity drift.
You’ll need to train a LoRA for each new character. But 5 minutes of training beats hours of parameter tweaking. Once it’s trained, that character just works.
We automated the LoRA training since users kept requesting new characters.
The problem is these tools are way too general. Skip the img2img headaches and use ControlNet with multiple conditioning instead - combine OpenPose with depth maps. Once you get the pose dialed in, inpaint the face separately. It’s more work but you’ll get way more consistent results.
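Something like this in diffusers if you go that route (a sketch; the checkpoint IDs are the stock SD 1.5 ControlNets, swap for SDXL equivalents if that's your base):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Pose + depth stacked as multi-ControlNet conditioning.
controlnets = [
    ControlNetModel.from_pretrained("lllyasviel/control_v11p_sd15_openpose", torch_dtype=torch.float16),
    ControlNetModel.from_pretrained("lllyasviel/control_v11f1p_sd15_depth", torch_dtype=torch.float16),
]
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5",
    controlnet=controlnets, torch_dtype=torch.float16,
).to("cuda")

out = pipe(
    prompt="photo of the character, full body",
    image=[load_image("./pose_openpose.png"), load_image("./pose_depth.png")],
    controlnet_conditioning_scale=[1.0, 0.6],    # pose dominant, depth as support
    num_inference_steps=30,
).images[0]
# The face inpainting pass afterwards uses a separate inpaint pipeline plus a face mask.
```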
You’re trying to do too much at once. These models can’t handle identity and pose at the same time - they just break.
I split mine into stages instead. First pass extracts and validates pose. Second handles identity transfer. Third scores quality and decides if we need to run it again.
The real trick is the orchestration layer. Low pose confidence? It automatically tries different detection models. Identity preservation sucks? It tweaks blending weights and reruns.
No more babysitting sliders or crossing your fingers. The system figures out which settings work for different inputs.
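The identity check itself can be as simple as cosine similarity between face embeddings of the reference and the output. A minimal sketch assuming insightface, with an illustrative threshold:

```python
import cv2
import numpy as np
from insightface.app import FaceAnalysis

# Score identity, rerun if it drifted. The 0.5 cutoff is a rough starting point
# for ArcFace-style cosine similarity, not a universal constant.
app = FaceAnalysis(name="buffalo_l")
app.prepare(ctx_id=0, det_size=(640, 640))

def face_embedding(path: str) -> np.ndarray:
    faces = app.get(cv2.imread(path))
    if not faces:
        raise ValueError(f"no face detected in {path}")
    return faces[0].normed_embedding

def identity_ok(reference_path: str, output_path: str, threshold: float = 0.5) -> bool:
    score = float(np.dot(face_embedding(reference_path), face_embedding(output_path)))
    return score >= threshold           # below threshold -> tweak weights and rerun
```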
I use Latenode to coordinate everything. It connects the AI APIs, handles retries, and tracks what actually works. Does all the preprocessing and validation scoring too.
Your 80% failure rate flips to a 90% success rate when you automate this stuff instead of guessing at parameters.