I’ve been working on an AI agent project and I’m having second thoughts about using RAG (Retrieval-Augmented Generation). At first it seemed like a good idea, but now I’m not so sure.
The main issues I’m running into are:
Chunking is trickier than I thought. It’s hard to get the right balance of context without including too much fluff.
The retrieval part is giving me a headache. I keep tweaking it but can’t seem to get consistently relevant results.
Even when I do get good retrieval, the LLM sometimes still makes stuff up.
Has anyone tried using a SQL database with a Text-to-SQL agent instead? I’m wondering if that might be a simpler approach. You could organize the data neatly in tables and then just query what you need.
I’m especially curious about tools for data ingestion. How do you handle stuff like PDFs or messy HTML?
Any thoughts on which approach is better for building reliable AI agents? RAG or SQL + Text-to-SQL? I’d love to hear about your experiences!
Tried both RAG and SQL + Text-to-SQL for agents. RAG works well if your info is unstructured; SQL is better with neatly structured data. For PDFs and HTML, tools like Apache Tika and Beautiful Soup help a lot. Choose what fits your data best.
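For messy HTML, here is a rough idea of what the extraction step looks like. This is a stdlib-only sketch using Python's html.parser as a stand-in, not Beautiful Soup itself, and the names TextExtractor and html_to_text are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Collapse runs of whitespace and ignore whitespace-only fragments.
        text = " ".join(data.split())
        if self._skip_depth == 0 and text:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string as one space-joined line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><p>Hello <b>world</b></p><script>var x=1;</script><p>Bye</p></body></html>"
)
print(html_to_text(page))
```

A dedicated library will likely cope better with badly broken markup and encodings, so treat this as an illustration of the step rather than a replacement.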
I’ve been down this road before, and I feel your pain with RAG. It can be a real headache to get right. In my experience, SQL + Text-to-SQL has been more reliable for certain types of projects, especially when dealing with structured data.
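To make the SQL + Text-to-SQL idea concrete, here is a minimal sketch using Python's built-in sqlite3. The generate_sql function is a hypothetical placeholder for the LLM call (it returns a hard-coded query), and the products table and run_readonly guard are assumptions for illustration only.

```python
import sqlite3


def generate_sql(question: str, schema: str) -> str:
    """Hypothetical stand-in for the LLM call that turns a question into SQL."""
    # A real agent would send `question` and `schema` to a model here.
    return "SELECT name, price FROM products WHERE price < 20 ORDER BY price"


def run_readonly(conn: sqlite3.Connection, sql: str) -> list:
    """Run a single SELECT statement and reject everything else."""
    statement = sql.strip().rstrip(";")
    # Crude guard: must start with SELECT and contain no further statements.
    if not statement.lower().startswith("select") or ";" in statement:
        raise ValueError(f"refusing to run non-SELECT SQL: {sql!r}")
    return conn.execute(statement).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?)",
    [("cable", 9.5), ("mouse", 19.0), ("keyboard", 45.0)],
)

schema = "products(name TEXT, price REAL)"
rows = run_readonly(conn, generate_sql("Which products cost under 20?", schema))
print(rows)
```

The guard matters because the SQL comes from a model. Checking the statement prefix is only a first line of defense; a read-only database user or connection would be a sturdier option.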
For data ingestion, I’ve had good results using tools like Textract for PDFs and Scrapy for web scraping. They’re not perfect, but they’ve saved me a ton of time compared to trying to reinvent the wheel.
One thing to consider is the long-term maintainability of your system. RAG can become unwieldy as your dataset grows, while a well-designed SQL schema can scale more gracefully.
That said, there’s no one-size-fits-all solution. It really depends on the nature of your data and the specific requirements of your AI agent. Have you considered a hybrid approach? You could use SQL for the core structured data and supplement with RAG for handling more free-form queries.
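A hybrid setup needs some way to decide which path a question takes. Here is a deliberately simple sketch that routes on keyword cues. The cue list, route, and answer are all made up for illustration; a real agent might ask the LLM itself to classify the question instead.

```python
import re

# Hypothetical cue phrases suggesting a question targets structured records.
STRUCTURED_CUES = re.compile(
    r"\b(how many|count|total|average|sum|list all|between|top \d+)\b",
    re.IGNORECASE,
)


def route(question: str) -> str:
    """Return "sql" for aggregate or lookup-style questions, otherwise "rag"."""
    return "sql" if STRUCTURED_CUES.search(question) else "rag"


def answer(question, sql_backend, rag_backend):
    """Dispatch the question to whichever backend the router picks."""
    backend = sql_backend if route(question) == "sql" else rag_backend
    return backend(question)


print(route("How many orders shipped last week?"))
print(route("Summarize the refund policy"))
```

Keyword routing will misfire on edge cases, so it is worth logging which path each question took and reviewing the misses.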
I’ve explored both approaches in my projects. RAG can be powerful, but you’re right about the challenges. For structured data, SQL + Text-to-SQL often proves more straightforward and reliable. It’s easier to maintain data integrity and perform complex queries.
For ingesting unstructured data like PDFs or HTML, I’ve had success using Apache Tika for extraction, then implementing a custom pipeline to clean and structure the data before insertion into SQL tables. This approach requires more upfront work but tends to yield more consistent results in my experience.
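As a rough sketch of the clean-then-insert step, assuming the raw text has already been extracted (by Tika or anything else): the paragraphs table layout and the clean and load helpers below are my own illustrative choices, not a fixed recipe.

```python
import re
import sqlite3


def clean(raw: str) -> list[str]:
    """Split extracted text into paragraphs, normalize whitespace, drop repeats."""
    seen, paragraphs = set(), []
    for block in re.split(r"\n\s*\n", raw):
        text = " ".join(block.split())
        if text and text not in seen:  # skips blanks and repeated headers/footers
            seen.add(text)
            paragraphs.append(text)
    return paragraphs


def load(conn: sqlite3.Connection, source: str, raw: str) -> int:
    """Insert the cleaned paragraphs for one source document; return the row count."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS paragraphs ("
        "source TEXT NOT NULL, position INTEGER NOT NULL, body TEXT NOT NULL, "
        "PRIMARY KEY (source, position))"
    )
    rows = [(source, i, text) for i, text in enumerate(clean(raw))]
    with conn:  # commits on success, rolls back if an insert fails
        conn.executemany("INSERT OR REPLACE INTO paragraphs VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
raw = "Page  header\n\nFirst   paragraph\nwraps here.\n\n\nPage  header\n\nSecond."
inserted = load(conn, "report.pdf", raw)
print(inserted, conn.execute("SELECT body FROM paragraphs ORDER BY position").fetchall())
```

Keying rows on (source, position) means re-running the load for the same document overwrites matching rows instead of duplicating them, though a shorter re-extraction would leave stale trailing rows unless they are deleted first.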
Ultimately, the best choice depends on your specific use case and data types. If you’re dealing primarily with well-defined, structured information, SQL might be the way to go. It offers better control and can be more efficient for certain types of queries.