Hey everyone! I’m working on setting up a RAG system and want to make sure I’m using the right tools for each part. Looking for recommendations on what’s working well right now.
For document chunking - what methods or libraries are you using to split up your content? Any particular strategies that work better than others?
Embedding models - which ones are giving you the best results these days? Are there specific models that handle different types of content better?
Vector storage and search - what databases or search systems are you finding most reliable for storing and querying embeddings?
Reranking approaches - are you using any models or techniques to improve the relevance of retrieved results?
Integration tools - any platforms or frameworks that help connect all these pieces smoothly?
Would really appreciate hearing about your experiences with different tools and what combinations have worked best for you. Thanks for any insights you can share!
Been running a RAG pipeline in production for 8 months. Switched from fixed-size splits to semantic chunking - way better context preservation and cut down retrieval noise big time. Using text-embedding-ada-002 from OpenAI, super reliable. Testing BGE models now to save on costs. Pinecone handles our vector storage - scales well and query performance stays consistent. Added cross-encoder reranking with ms-marco-MiniLM-L-12-v2, bumped answer relevance 15-20%. Biggest lesson: chunking strategy is everything. Wasted weeks tweaking embeddings when the real problem was crappy document segmentation.
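The cross-encoder reranking step described above boils down to scoring each (query, chunk) pair jointly and re-sorting. A minimal sketch of that pattern — note `score_pair` here is a toy word-overlap stand-in so the example is self-contained; in practice it would be something like `CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2").predict(...)` from sentence-transformers:

```python
# Sketch of cross-encoder style reranking: score each (query, chunk) pair
# jointly, then re-order the retrieved chunks by that score.

def score_pair(query: str, chunk: str) -> float:
    # Toy stand-in scorer (word overlap). A real pipeline would call
    # CrossEncoder("cross-encoder/ms-marco-MiniLM-L-12-v2").predict([(query, chunk)]).
    q_words = set(query.lower().split())
    c_words = set(chunk.lower().split())
    return len(q_words & c_words) / max(len(q_words), 1)

def rerank(query: str, chunks: list[str], top_k: int = 3) -> list[str]:
    # Score every retrieved candidate against the query, keep the best top_k.
    scored = sorted(chunks, key=lambda c: score_pair(query, c), reverse=True)
    return scored[:top_k]

chunks = [
    "The invoice total is due within 30 days.",
    "Semantic chunking splits text at topic boundaries.",
    "Reranking reorders retrieved chunks by relevance to the query.",
]
print(rerank("how does reranking improve retrieval relevance", chunks, top_k=1))
```

The key design point is that a cross-encoder sees query and chunk together, so it can score relevance far better than comparing two independently computed embeddings — at the cost of running the model once per candidate, which is why it's applied only to the short list the vector search returns.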
Running RAG for legal docs here. For chunking, I’d go with LangChain’s RecursiveCharacterTextSplitter with overlap - it handles structured documents really well. BGE-large-en consistently beats smaller models on our domain stuff, and it’s worth the extra compute cost. We switched from Weaviate to Qdrant last quarter and got better performance with filtered searches. Reranking was a game changer - we’re using Cohere’s rerank API. It’s expensive but cuts irrelevant results by 40%. Pro tip I learned the hard way: test your chunk sizes against your actual queries, not generic benchmarks. Our sweet spot ended up being 800 tokens with 100 overlap, which is way different from typical recommendations.
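The 800-token / 100-overlap scheme above is essentially a sliding window. A minimal sketch of fixed-size splitting with overlap (whitespace "tokens" for simplicity; LangChain's RecursiveCharacterTextSplitter additionally tries to break on paragraph and sentence boundaries first, which this toy version skips):

```python
def chunk_with_overlap(text: str, chunk_size: int = 800, overlap: int = 100) -> list[str]:
    # Whitespace split as a cheap stand-in for real tokenization.
    tokens = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        chunks.append(" ".join(window))
        if start + chunk_size >= len(tokens):
            break  # last window already covers the end of the document
    return chunks

doc = " ".join(f"tok{i}" for i in range(2000))
chunks = chunk_with_overlap(doc, chunk_size=800, overlap=100)
print(len(chunks))  # 2000 tokens in windows of 800, advancing 700 each step
```

The overlap means the last 100 tokens of each chunk reappear at the start of the next one, so a sentence straddling a boundary is still retrievable intact — which is exactly what you're tuning when you test chunk sizes against real queries.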
I’ve tried sentence-transformers, and all-MiniLM-L6-v2 is solid for many tasks. ChromaDB is good for vector storage too, super simple to use. I’m still figuring out reranking though. What are you guys using?
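For anyone newer to this: the core of what ChromaDB (or any vector store) does at query time is cosine-similarity nearest-neighbor search over embeddings. A minimal sketch of that loop — the `embed` function here is a toy bag-of-letters stand-in for `SentenceTransformer("all-MiniLM-L6-v2").encode(...)`, so the example runs without downloading a model:

```python
import math

def embed(text: str) -> list[float]:
    # Toy letter-count embedding; stands in for a real model like
    # SentenceTransformer("all-MiniLM-L6-v2").encode(text).
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def nearest(query: str, docs: list[str]) -> str:
    # Brute-force score-and-argmax; a vector store does the same thing
    # but with an ANN index so it scales past a few thousand documents.
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = ["vector databases store embeddings", "bananas are yellow"]
print(nearest("where do embeddings get stored", docs))
```

With ChromaDB itself this whole loop collapses to `collection.query(query_texts=[...], n_results=k)`, which is why it feels so simple to use.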