Combining Qwen Q4 with Wan 2.2 for High-Quality Text-to-Image Generation (Complete Setup Guide)

I wanted to share my experience combining two models that work really well together. After getting tired of vague answers about workflows, I decided to document everything properly.

My Setup:

  • Graphics Card: RTX 3090 with 24GB VRAM
  • Primary Model: Qwen Q4 GGUF format
  • Upscaler: Wan 2.2 Low GGUF
  • Processing Time: About 5 minutes on the first run (model loading), then 80-130 seconds per 0.5-1MP image

The Process Has Two Parts:

The first stage takes 42-77 seconds of Qwen sampling, depending on resolution (0.75MP, 1.0MP, or 1.5MP). It handles very low resolutions surprisingly well - I haven’t found the lower limit yet.
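
Targets like 0.75MP or 1.5MP still have to become concrete width/height pairs, usually snapped to a multiple the latent grid accepts (64 is a common choice, though the exact divisor depends on your setup). A minimal sketch - the function name and the "megapixels plus aspect ratio" convention are mine, not from any particular UI:

```python
import math

def dims_for_megapixels(mp: float, aspect: float = 1.0, multiple: int = 64):
    """Return (width, height) close to `mp` megapixels at the given
    aspect ratio (width / height), snapped down to a multiple of `multiple`."""
    pixels = mp * 1_000_000
    height = math.sqrt(pixels / aspect)
    width = height * aspect

    def snap(v):
        # Snap down so the latent grid stays valid; never go below one tile.
        return max(multiple, int(v // multiple) * multiple)

    return snap(width), snap(height)

print(dims_for_megapixels(0.75))         # square 0.75MP target
print(dims_for_megapixels(1.0, 16 / 9))  # widescreen 1.0MP target
```

Because of the snapping, the actual pixel count lands slightly under the requested megapixel target, which is normally fine for a first-stage render.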

The second stage takes around 110 seconds of Wan 2.2 sampling at 4 steps for the upscale.

What I’ve Tested So Far:

  • Text rendering gets blurry at 1.5x but comes back sharp at 2.0x upscale.
  • Portrait quality is mixed - older male faces work better than closeups.
  • Full body and medium shots turn out okay.
  • Using 0.75MP actually smooths out facial features nicely.
  • The model seems to add freckles everywhere for some reason.

Still Need to Test:

  • Landscape images
  • City scenes
  • Interior shots
  • Higher resolution output like 4K
  • Using fewer steps in the first stage

The main discovery is that Qwen latents can be fed straight into Wan 2.2 sampling - no decode and re-encode in between - which opens up some interesting possibilities for workflow optimization.
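
The handoff can be sketched as plain control flow. `qwen_sample` and `wan_upscale` below are hypothetical placeholders standing in for whatever sampler nodes or calls you actually use; the only point being illustrated is that stage one's latent output is stage two's input, with no image decode in between:

```python
def qwen_sample(prompt, megapixels, steps=20):
    # Placeholder: in a real workflow this would be the Qwen Q4 sampler.
    # Here it just returns a dict standing in for a latent tensor.
    return {"source": "qwen", "mp": megapixels, "steps": steps, "prompt": prompt}

def wan_upscale(latent, scale=2.0, steps=4):
    # Placeholder: in a real workflow this would be Wan 2.2 low-step sampling.
    # It consumes the Qwen latent directly rather than a decoded image.
    return {"source": "wan", "mp": latent["mp"] * scale * scale,
            "base": latent, "steps": steps}

latent = qwen_sample("portrait of an older man", megapixels=0.75)
result = wan_upscale(latent, scale=2.0, steps=4)
print(result["mp"])  # effective output size in megapixels
```

A 2x upscale quadruples the pixel count, which is why a 0.75MP first stage ends up around 3MP after stage two.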

Thanks for the breakdown! I’ve been testing similar setups but with a twist that might help. Instead of going straight into your two-stage process, I preprocess the prompts before sending them to Qwen Q4. This really boosts output quality, especially for those portrait issues you mentioned.

The freckle problem is probably from Qwen’s training data weighting - I’ve seen it get worse with certain prompt structures. For the 1.5x text blurriness, try lowering the CFG scale during the Wan 2.2 stage instead of just bumping resolution. And if you haven’t tried different sampling methods in the first stage, that might fix your facial feature consistency. Takes barely any extra compute, but the quality jump is huge, especially for medium shots.

Your timing matches what I’m seeing on my 4090. If you’re getting consistent freckle artifacts, check your Qwen Q4 quantization - some versions have weird biases baked in from conversion. Also found Wan 2.2 works way better with intermediate resolution targets instead of jumping straight to final output. Don’t go 0.75MP directly to 2x - try 1.2MP first, then final upscale. Adds maybe 30 seconds but kills most of those sharpness inconsistencies. That text clarity issue at 1.5x is definitely a Wan sampling quirk - happens with other models when you hit that range too.
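
The "don't jump straight to final output" advice amounts to an upscale schedule: 0.75MP, an intermediate around 1.2MP, then the final 2x target. A small helper to express that as a list of megapixel targets - the specific numbers are this commenter's, not anything baked into Wan 2.2:

```python
def upscale_schedule(start_mp, final_scale, intermediate_mp=None):
    """Return the sequence of megapixel targets for a staged upscale.
    With no intermediate, it's a single jump to start_mp * final_scale**2."""
    final_mp = start_mp * final_scale ** 2
    stages = [start_mp]
    # Only insert the intermediate if it actually sits between start and final.
    if intermediate_mp is not None and start_mp < intermediate_mp < final_mp:
        stages.append(intermediate_mp)
    stages.append(final_mp)
    return stages

print(upscale_schedule(0.75, 2.0))       # direct jump
print(upscale_schedule(0.75, 2.0, 1.2))  # staged via 1.2MP
```

The staged version adds one extra sampling pass (the ~30 seconds mentioned above) in exchange for smoother sharpness across the frame.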

Manual chaining between Qwen and Wan is such a hassle, especially when you’re testing different parameters or processing batches.

I hit the same wall with our content pipeline. Constantly managing handoffs between models drove me crazy, so I moved everything to Latenode.

You can automate the whole Qwen to Wan 2.2 flow. It handles timing, manages VRAM automatically, and queues batches without you watching every step.

Best part? A/B testing different configs in parallel. Want to compare 4 vs 6 steps in Wan 2.2? Test multiple CFG scales for that text blur? Just duplicate the workflow and run variants at the same time.
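
Whatever tool runs them, the variant set itself is just a parameter grid. A sketch with `itertools.product` - the specific step counts and CFG values are illustrative, matching the comparisons mentioned above:

```python
from itertools import product

# Hypothetical grid for the second (Wan 2.2) stage.
wan_steps = [4, 6]
cfg_scales = [2.0, 3.5, 5.0]

variants = [{"wan_steps": s, "cfg": c} for s, c in product(wan_steps, cfg_scales)]

for v in variants:
    print(v)            # each dict is one workflow variant to queue
print(len(variants), "variants to run")
```

Each dict can then be handed to the same workflow template, whether the variants run in parallel or queued sequentially.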

I also set conditional logic for different image types - portraits get one parameter set, landscapes get another. No more manual tweaking between runs.
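
The conditional logic reduces to a lookup table with a fallback. A minimal sketch - the preset values here are illustrative placeholders, not tuned recommendations:

```python
# Hypothetical per-type presets; the numbers are illustrative, not tuned.
PRESETS = {
    "portrait":  {"qwen_mp": 0.75, "wan_steps": 6, "cfg": 3.0},
    "landscape": {"qwen_mp": 1.0,  "wan_steps": 4, "cfg": 4.0},
}
DEFAULT = {"qwen_mp": 1.0, "wan_steps": 4, "cfg": 3.5}

def params_for(image_type: str) -> dict:
    """Pick a parameter set by image type, falling back to a default."""
    return PRESETS.get(image_type, DEFAULT)

print(params_for("portrait"))
print(params_for("interior"))  # unknown type falls back to DEFAULT
```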

Saves me hours weekly vs running model chains by hand. Check it out: https://latenode.com

I’ve been running production workflows with these models for months. The real bottleneck isn’t upscaling - it’s VRAM management between stages.

Game changer for me: unload Qwen completely before starting Wan 2.2. Takes 10 seconds to switch models but stops memory fragmentation that tanks performance.
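
Sequential unloading can be sketched without any framework: drop every reference to the stage-one model, force a collection, then load stage two. In a real PyTorch setup you would also call `torch.cuda.empty_cache()` after `gc.collect()`; that line is commented out here so the sketch stays framework-free and runnable:

```python
import gc

class FakeModel:
    """Stand-in for a loaded checkpoint; real code would hold GPU tensors."""
    def __init__(self, name):
        self.name = name

def run_two_stage():
    qwen = FakeModel("qwen-q4")
    latent = {"from": qwen.name}   # stage one output survives the unload
    del qwen                       # drop the only reference to stage one
    gc.collect()                   # reclaim before loading stage two
    # torch.cuda.empty_cache()     # additionally, in a CUDA setup
    wan = FakeModel("wan-2.2-low")
    return {"latent": latent, "stage2": wan.name}

print(run_two_stage())
```

The key detail is that only the latent (small) is kept across the boundary, never the stage-one model itself.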

Those freckle artifacts? Most likely an artifact of the Q4 quantization itself - detail the FP16 weights preserve gets rounded away in conversion. I use FP16 for critical work - more VRAM but much cleaner output.

Your 0.75MP sweet spot makes perfect sense. Qwen’s attention layers work best at that resolution. Higher just creates noise that Wan has to fix later.

Try this: run first stage at 0.75MP like you’re doing, save latents as EXR files between stages. Lets you test Wan parameters without regenerating everything. Total game changer for batch processing.
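
Caching the stage-one output so you can iterate on Wan parameters alone looks like this in outline. The commenter uses EXR for the latents; this sketch just pickles a stand-in latent to a temp file, the simplest stdlib equivalent, so the round trip is runnable anywhere:

```python
import os
import pickle
import tempfile

def save_latent(latent, path):
    with open(path, "wb") as f:
        pickle.dump(latent, f)

def load_latent(path):
    with open(path, "rb") as f:
        return pickle.load(f)

# Stage one runs once; stage two can then be re-run with new parameters.
latent = {"mp": 0.75, "data": [0.1, -0.2, 0.3]}  # stand-in latent tensor
path = os.path.join(tempfile.gettempdir(), "stage1_latent.pkl")
save_latent(latent, path)

for steps in (4, 6):          # sweep Wan settings against the cached latent
    cached = load_latent(path)
    print(steps, cached["mp"])
```

Every Wan parameter sweep then starts from an identical latent, so differences in the output are attributable to the second stage alone.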

Nice workflow! I’m running something similar on a 3080 with 10GB VRAM. Had to lower batch sizes, but the Qwen/Wan combo still rocks. The freckle issue goes away if you add negative prompts about skin texture early on. Try bumping Wan 2.2 to 6 steps instead of 4 - it barely takes longer but fixes most of those portrait inconsistencies.