I’ve built a system that uses a LangChain AI agent for analyzing pandas DataFrames. The agent gets comprehensive instructions through pre-prompts, along with conversation history and table metadata. My main issue is with how the agent processes user text input and converts it into proper pandas queries.
The agent works well overall - it answers data questions and generates plotly charts as instructed in the pre-prompt. But sometimes it misinterprets user requests.
When users search for country data, results are unpredictable. Puerto Rico should match code “PRI” but the agent sometimes searches “PR” instead, finding nothing. Mexico works fine when users type “Mexico” or “MEX” since both appear in different DataFrame columns (country name and country code). But if someone writes “MX” for Mexico, the agent occasionally queries with “MX” and returns empty results with no charts or responses.
Capitalization in user questions also causes problems sometimes, though I added instructions in the pre-prompt to handle this.
I’ve provided detailed metadata for all columns across my 3 datasets in the pre-prompt. The agent uses chat history with memory too. I don’t want to hardcode specific cases for every country or user input variation. The AI temperature is set to 0.
Any suggestions for improving consistency?
Dealt with this exact headache on a customer analytics project last year. Your agent is basically guessing at mappings instead of doing proper lookups.
What worked for me was creating a preprocessing step before the agent even sees the query. I built a simple mapping dictionary that captures all the variations - like “PR”, “PRI”, “Puerto Rico” all pointing to the canonical form.
But here’s the key part - I didn’t hardcode everything. Instead, I used fuzzy string matching with the fuzzywuzzy library (now maintained as thefuzz) to catch variations automatically. Set the threshold around 80-85% similarity.
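To make that concrete, here's a minimal sketch of the preprocessing step. The names (`COUNTRY_ALIASES`, `normalize_country`) are hypothetical, the alias table is a two-country stand-in you'd extend from a real ISO list, and I've used the stdlib `difflib` instead of fuzzywuzzy so the example has no dependencies - the idea is identical:

```python
import difflib

# Hypothetical alias map: every variation points at the canonical
# ISO alpha-3 code (extend this from a full ISO-3166 table).
COUNTRY_ALIASES = {
    "pr": "PRI", "pri": "PRI", "puerto rico": "PRI",
    "mx": "MEX", "mex": "MEX", "mexico": "MEX",
}

def normalize_country(user_input, cutoff=0.8):
    """Map a user-typed country string to its canonical alpha-3 code."""
    key = user_input.strip().lower()
    if key in COUNTRY_ALIASES:                 # exact alias hit
        return COUNTRY_ALIASES[key]
    # Fuzzy fallback for typos; cutoff=0.8 mirrors the 80% threshold above.
    matches = difflib.get_close_matches(key, COUNTRY_ALIASES, n=1, cutoff=cutoff)
    return COUNTRY_ALIASES[matches[0]] if matches else None
```

So `normalize_country("MX")` resolves via the alias table, while a typo like `"Puero Rico"` still lands on `"PRI"` through the fuzzy fallback, and garbage input returns `None` instead of producing an empty query.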
For your country codes specifically, grab a standard ISO country code dataset and merge it with your existing data. Then your preprocessing can normalize any country input before it hits the agent.
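As a sketch of that lookup (two rows standing in for the full ~250-row ISO table, with assumed column names `name`/`alpha2`/`alpha3`):

```python
import pandas as pd

# Two-row stand-in for a full ISO-3166 reference table
iso = pd.DataFrame({
    "name":   ["Puerto Rico", "Mexico"],
    "alpha2": ["PR", "MX"],
    "alpha3": ["PRI", "MEX"],
})

def resolve_country(user_input):
    """Return the alpha-3 code no matter which form the user typed."""
    needle = user_input.strip().lower()
    hit = iso[(iso["name"].str.lower() == needle)
              | (iso["alpha2"].str.lower() == needle)
              | (iso["alpha3"].str.lower() == needle)]
    return hit["alpha3"].iloc[0] if not hit.empty else None
```

With this in front of the agent, "PR", "PRI" and "Puerto Rico" all collapse to "PRI" before any pandas query is generated.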
The capitalization thing is easy - lowercase (or uppercase) both the user input and the comparison values consistently in your preprocessing pipeline.
One more trick - add a validation step after query generation but before execution. If the pandas query returns empty results, have the agent try fuzzy matching against actual column values before giving up.
This approach kept my agent flexible while fixing the random failure cases. The preprocessing overhead is minimal and saves you from chasing edge cases forever.
I ran into something similar when building a financial data analysis tool. The problem you’re describing usually stems from the agent making assumptions about data relationships without actually checking what exists in your dataframes first.

One technique that significantly improved my results was implementing a two-phase approach. Before the agent constructs any pandas query, I have it run a quick exploratory step where it samples the relevant columns to understand the actual data patterns. So for country searches, the agent first checks what country-related values actually exist in your dataframes.

Another thing that helped was adding explicit fallback logic in the pre-prompt instructions. Tell the agent that if a direct match fails, it should automatically try partial string matching with pandas .str.contains() before giving up. This catches cases where users input abbreviations that don’t exactly match your data.

For the consistency issues, I found that adding more specific examples of query patterns in the prompt worked better than general instructions. Show the agent exactly how to handle country code scenarios with concrete before/after examples.

The memory component might actually be working against you here if it’s remembering incorrect query patterns. Consider adding a success/failure flag to your chat history so the agent can learn from its mistakes rather than just accumulating context.
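The two-phase idea boils down to something like this (hypothetical helper name, assumed single country column):

```python
import pandas as pd

def find_country_rows(df, column, term):
    """Phase 1: exact, case-insensitive match. Phase 2: partial match,
    so an abbreviation like "Puerto" still finds "Puerto Rico"."""
    exact = df[df[column].str.lower() == term.lower()]
    if not exact.empty:
        return exact
    return df[df[column].str.contains(term, case=False, na=False, regex=False)]
```

Note `regex=False` - without it, user input containing regex metacharacters (parentheses, dots) would silently change the match semantics or raise errors.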
Temperature setting at 0 is good but the real issue is your agent lacks proper data awareness during query construction. I’ve seen similar problems when the LLM doesn’t understand the actual values present in your columns. Consider implementing a column profiling step that feeds the agent not just metadata but actual sample values from each relevant column. For country data, include the top 10-15 most common country names and codes directly in your prompt context.

Another approach that helped me was adding a query validation layer. Before executing any pandas operation, check if the search terms actually exist in the target columns using something like df[column].str.contains(search_term, case=False, na=False).any(). If false, trigger a fallback search across related columns.

You might also want to implement a “did you mean” functionality. When queries return empty results, have the agent suggest similar matches from the actual column values using string similarity. This gives users feedback and helps the agent learn better patterns.

The key is making your agent more aware of what’s actually in your data rather than just relying on general knowledge about country codes.
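The validation-plus-suggestion combination might look like this sketch (hypothetical helper, stdlib `difflib` for the similarity scoring, looser 0.6 cutoff so suggestions stay useful):

```python
import difflib
import pandas as pd

def validate_or_suggest(df, column, term):
    """Cheap pre-flight check: does the term occur anywhere in the column?
    If not, return "did you mean" candidates drawn from real values."""
    exists = df[column].str.contains(term, case=False, na=False, regex=False).any()
    if exists:
        return True, []
    values = df[column].dropna().str.lower().unique().tolist()
    return False, difflib.get_close_matches(term.lower(), values, n=3, cutoff=0.6)
```

On failure the agent gets actual column values back to work with, rather than just an empty DataFrame and nothing to say to the user.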
hmm sounds like the agent needs better context about your data structure. try adding examples of valid country codes directly in your prompt - show it that “PRI” not “PR” is correct for puerto rico. also maybe use a lookup table approach where you pre-process user input through pandas .isin() or .str.contains() with case=False before the agent generates queries.
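e.g. a tiny pre-check along those lines (made-up column names, two-row sample data):

```python
import pandas as pd

# Hypothetical pre-check run before the agent writes its query: does the
# user's term match any value in the name or code column verbatim?
df = pd.DataFrame({"country": ["Mexico", "Puerto Rico"],
                   "code":    ["MEX", "PRI"]})

def term_in_data(term):
    t = term.strip().lower()
    return bool(df["country"].str.lower().isin([t]).any()
                or df["code"].str.lower().isin([t]).any())
```

if this returns False for "PR" you know to normalize before the agent ever sees it.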