When you have 400+ models available, how do you actually decide which model retrieves versus which one generates in RAG?

I had this realization while setting up a RAG workflow: I had access to a ridiculous number of models through Latenode’s subscription, and suddenly the question wasn’t “can I do this?” but “which model should do what?”

For retrieval, I started experimenting. Some models are faster at semantic matching, others handle ambiguous or poorly phrased queries better. For generation, I wanted something that could reason over the retrieved context and cite sources accurately. Turns out retrieval and generation are completely different problems, and throwing the single best model at both doesn’t necessarily work.

I ended up using different models for retrieval versus generation based on what actually mattered at each stage: speed and relevance for retrieval, reasoning quality and citation accuracy for generation. The cost difference is real when you’re running thousands of iterations. But more importantly, end-to-end accuracy actually changes depending on the model pair.
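To make the split concrete, here’s a minimal sketch of the two-stage shape I’m describing. This is purely illustrative: the bag-of-words `embed` stands in for whatever fast embedding model handles retrieval, and `generate` stands in for a separate, stronger generation model. None of these function names come from Latenode or any real API.

```python
import math
from collections import Counter

def embed(text):
    # Stand-in for a fast retrieval-side embedding model:
    # a simple bag-of-words vector.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Stage 1: rank documents by similarity to the query
    # and keep the top k as context.
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def generate(query, context):
    # Stage 2: stand-in for a stronger generation model that
    # answers from the retrieved context and cites its sources.
    cited = "; ".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return f"Answer to '{query}' grounded in: {cited}"

docs = [
    "RAG pairs a retriever with a generator.",
    "Embedding models rank documents by similarity.",
    "Bananas are yellow.",
]
context = retrieve("How does RAG work?", docs)
print(generate("How does RAG work?", context))
```

The point of keeping the two stages behind separate functions is that you can swap the model behind either one independently, which is exactly what makes pair-by-pair testing cheap.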

What I’m realizing is that having more choice doesn’t make the decision easier. It makes it more important to actually test. Has anyone else gone through this? What’s your model pairing strategy, and how much performance difference did you actually see?