Share your thoughts on using Retrieval-Augmented Generation (RAG) systems

Hey everyone! I’m curious about how people are implementing RAG in their projects. I’ve been looking into it lately and want to hear some real-world feedback from folks who have actually used it.

What language models are you pairing with your retrieval systems? Are you going with something like GPT, Claude, or maybe open-source alternatives like Llama? I’m trying to figure out which direction to go.

Also wondering about the quality of responses you’re getting. Does the retrieved context actually help make the outputs more accurate, or do you still run into hallucination issues?

What kind of use cases are you applying RAG to? Document search, customer support, internal knowledge bases? Really interested to learn from your experiences and maybe avoid some pitfalls. Thanks!

I’ve been playing around with RAG for 6 months now - it’s solid once you survive the setup nightmare. Claude Sonnet’s my go-to since that massive context window is a lifesaver. Yeah, you’ll still get wonky answers when retrieval grabs random chunks, but way fewer hallucinations than a straight LLM. Start small and nail your chunking strategy before diving deep - rough sketch of what I mean below.
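To make the chunking point concrete, here’s roughly the naive baseline most people start from: a fixed-size chunker with overlap. This is a sketch, not code from any particular library - the sizes are made-up defaults you’d tune against your own documents.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks, preferring paragraph breaks."""
    chunks: list[str] = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        if end < len(text):
            # Prefer a paragraph boundary in the back half of the window
            # so chunks don't cut sentences mid-thought.
            brk = text.rfind("\n\n", start + chunk_size // 2, end)
            if brk != -1:
                end = brk
        piece = text[start:end].strip()
        if piece:
            chunks.append(piece)
        if end >= len(text):
            break
        # Step back slightly so boundary context lands in both chunks.
        start = max(end - overlap, start + 1)
    return chunks
```

Get this working end to end first, then graduate to semantic or heading-aware splitting once you have retrieval metrics to compare against. The fixed-size version is just the baseline to beat.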

I’ve been running RAG systems in production for the past 18 months, and the results have been mixed but mostly positive. I started with OpenAI and eventually moved to a hybrid setup where Mistral 7B handles the majority of queries and GPT-4 is reserved for complex reasoning tasks.

Retrieved context has noticeably reduced hallucinations compared to purely generative outputs, but you need solid filtering in your knowledge base. I learned this the hard way; ‘garbage in, garbage out’ is real, and it took considerable effort to clean and structure our documents before we saw any improvement.

We mainly use it for internal tech-documentation search, which works well for straightforward factual questions but struggles with policy matters that require interpretation. We’ve also rolled it out in customer support, which cut ticket escalations by 60%.

One key takeaway: invest heavily in your embedding model and chunking strategy. It took us three attempts to get a reliable system, and retrieval quality can truly make or break the application.
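For anyone wondering what the Mistral/GPT-4 split looks like, here’s a stripped-down sketch of the routing idea. The call_mistral and call_gpt4 helpers are hypothetical stand-ins for whatever client libraries you use, and the escalation heuristic is deliberately crude - ours is more involved.

```python
REASONING_HINTS = ("why", "compare", "trade-off", "explain", "interpret")

def call_mistral(prompt: str) -> str:
    # Placeholder: wire up your Mistral 7B client here.
    raise NotImplementedError

def call_gpt4(prompt: str) -> str:
    # Placeholder: wire up your GPT-4 client here.
    raise NotImplementedError

def needs_heavy_model(query: str, chunks: list[str]) -> bool:
    """Crude escalation heuristic: long questions, reasoning-style wording,
    or a pile of retrieved context go to the stronger (pricier) model."""
    q = query.lower()
    return (
        len(query.split()) > 40
        or any(hint in q for hint in REASONING_HINTS)
        or len(chunks) > 8
    )

def answer(query: str, chunks: list[str]) -> str:
    context = "\n\n".join(chunks)
    prompt = f"Answer using only this context:\n\n{context}\n\nQuestion: {query}"
    if needs_heavy_model(query, chunks):
        return call_gpt4(prompt)   # complex reasoning -> GPT-4
    return call_mistral(prompt)    # routine factual lookups -> Mistral 7B
```

The nice part of isolating the router like this is that it’s the only thing that changes when you swap models, so you can A/B the cheap path against the expensive one without touching the rest of the pipeline.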