Assembling your first RAG stack using multiple AI models: where do you actually start?

I’m planning to build a RAG system for our knowledge base, and I have access to 400+ AI models through a subscription. The problem is I’m paralyzed by choice. Which model should I use for retrieval? Which for generation? Do I even use the same model for both?

I understand conceptually that different models might excel at different tasks—some are faster, some are more accurate, some specialize in reasoning. But when I’m starting from scratch, how do I make these decisions without just trying every combination and burning through my day?

I’m also wondering if there’s a ‘sensible starting point’ that most people recommend, or if the right answer really does depend entirely on my specific knowledge base and use case.

Has anyone actually gone through this decision tree? What did you pick and why?

Start with Claude for generation and OpenAI embeddings for retrieval. That’s the reliable default. Both handle context well, produce consistent results, and work across different knowledge base types.

For reranking (which you’ll want if you have a large knowledge base), use a smaller, faster model like Mistral. It scores candidate passages well without the overhead of a heavyweight model.

The beauty of having 400+ models available is you’re not locked in. You test this setup, measure accuracy on your actual data, then swap models if needed. Takes maybe an hour to test and compare.

Don’t overthink it. Start vanilla. Measure. Adjust. That’s how you find what works for your specific needs.
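To make “start vanilla” concrete, here’s a toy retrieve-then-generate sketch. Nothing here is a real API: `embed()` is a stand-in for whatever embedding model you pick (a bag-of-letters vector, just to keep the sketch runnable offline), and `answer()` only assembles the prompt you’d send to your generation model.

```python
import math

def embed(text):
    # Stand-in for a real embedding API call (e.g. an OpenAI
    # embeddings request). Here: a toy bag-of-letters vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    # Rank knowledge-base chunks by similarity to the query embedding.
    qv = embed(query)
    return sorted(docs, key=lambda d: cosine(qv, embed(d)), reverse=True)[:k]

def answer(query, docs):
    # A real system would send this assembled prompt to the
    # generation model; here we just build and return it.
    context = "\n".join(retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = ["Resetting your VPN password", "Office parking policy", "VPN setup on Linux"]
print(answer("how do I reset my vpn password", docs))
```

Swapping the embedding or generation model then means changing only `embed()` or the call behind `answer()`, which is what makes the measure-and-adjust loop cheap.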

I started with the exact question you have. The paralysis is real. Here’s what I did: I picked Claude for generation because it handles long context windows well. For retrieval embeddings, I went with standard OpenAI embeddings because they’re well-established and work across different document types. Then I built a quick test with 20 sample queries and measured accuracy.

Turns out Claude with OpenAI embeddings got us 87% accuracy right out of the gate. Did I need to test other combinations? Probably not. Good enough saved us days of experimentation.

The real lesson: start with what others recommend, measure it on your actual data, then optimize if needed. Most times, the defaults work fine.
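For reference, the “20 sample queries” test is just a loop like this. The keyword pipeline and document names below are made up for illustration; plug in your real pipeline and eval set, and note that “accuracy” here means retrieval hit rate (the expected chunk appears in the retrieved context).

```python
def evaluate(pipeline, eval_set):
    # eval_set: list of (query, expected_doc) pairs.
    hits = sum(1 for query, expected in eval_set
               if expected in pipeline(query))
    return hits / len(eval_set)

# Toy pipeline: pretend retrieval that matches on a keyword.
def toy_pipeline(query):
    kb = {"vpn": "VPN reset guide", "parking": "Parking policy"}
    return [doc for key, doc in kb.items() if key in query.lower()]

eval_set = [
    ("How do I reset the VPN?", "VPN reset guide"),
    ("What's the parking policy for visitors?", "Parking policy"),
    ("What is the wifi password?", "Wifi onboarding doc"),
]
print(evaluate(toy_pipeline, eval_set))  # 2 of 3 hits
```

Twenty real queries with known answers is enough to get a number like the 87% above and decide whether further tuning is worth it.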

The decision flow I’d recommend is straightforward:

1. Retrieval: how large is your knowledge base? Standard embedding models work unless you have specialized domain text.
2. Generation: do you need speed or depth? Fast responses favor smaller models; nuanced answers favor larger ones.
3. Testing: run a small sample of your actual queries. That empirical data is worth more than theoretical reasoning.

I’ve found that starting with popular pairings and then swapping one component at a time helps isolate what actually matters for your data.
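Here’s a sketch of the swap-one-component-at-a-time idea. The model names are hypothetical placeholders for whatever your subscription exposes; the point is to vary one slot against a fixed baseline instead of testing the full cross-product.

```python
# Hypothetical component names; substitute your actual options.
embedders = ["openai-embeddings", "domain-embeddings"]
generators = ["claude", "mistral"]

baseline = ("openai-embeddings", "claude")

# Vary one component at a time against the baseline to isolate
# which swap actually moves your eval accuracy.
trials = [baseline]
for e in embedders:
    if e != baseline[0]:
        trials.append((e, baseline[1]))
for g in generators:
    if g != baseline[1]:
        trials.append((baseline[0], g))

for emb, gen in trials:
    print(f"embed={emb}  generate={gen}")  # run your sample-query eval here
```

With two options per slot this is 3 runs instead of 4; with more candidates the savings over the full grid grow quickly, and any accuracy change is attributable to exactly one component.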

Model selection follows a tiered approach. For embeddings, dimensionality and training data relevance matter most. For generation, context window size and instruction-following ability are critical. The optimal pairing depends on knowledge base size and query complexity. I typically recommend starting with Claude 3 for generation (128k context window) and OpenAI’s Ada for embeddings (cost-effective at scale). This pairing scales well and requires minimal tuning. Measure retrieval accuracy on domain-specific questions; if below 80%, consider reranking or embedding fine-tuning.
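A sketch of the below-80% fallback: only pay the reranking cost when measured accuracy warrants it. The word-overlap scorer is a toy stand-in for a real cross-encoder or LLM reranking call, and all names here are illustrative.

```python
def rerank(query, candidates, score):
    # Re-order first-pass candidates with a second, stronger scorer.
    return sorted(candidates, key=lambda doc: score(query, doc), reverse=True)

def retrieve_with_fallback(query, first_pass, baseline_accuracy,
                           score, threshold=0.80):
    # Skip reranking entirely when the cheap pipeline already
    # clears the accuracy bar on your eval set.
    candidates = first_pass(query)
    if baseline_accuracy < threshold:
        candidates = rerank(query, candidates, score)
    return candidates

# Toy scorer: word overlap between query and document.
def overlap(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = ["office seating chart", "vpn password reset steps"]
print(retrieve_with_fallback("reset my vpn password",
                             lambda q: docs, 0.72, overlap))
```

The design choice mirrors the advice above: measure first, and add the extra reranking stage only when the measured number tells you it’s needed.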

start: Claude + OpenAI embeddings. test on 20 sample queries. measure accuracy. adjust if needed. don’t overthink it.

Claude for generation, OpenAI embeddings for retrieval. proven combo. test it first.

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.