I’m building an application that uses a LangChain agent to analyze pandas DataFrames. The agent receives comprehensive instructions through the system prompt, along with conversation history and table metadata. My main issue is how the agent interprets free-text user input and turns it into the right pandas operations.
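For context, here is roughly how the agent is wired up. This is a stripped-down sketch, not my exact code: the dataset names, column names, model name, and the metadata string are placeholders (the real prompt and metadata are much longer), and the exact `create_pandas_dataframe_agent` arguments may vary by LangChain version.

```python
# Stripped-down sketch of the setup (dataset/column names, the model, and the
# metadata string are placeholders; exact arguments may differ by LangChain version).
import pandas as pd
from langchain_openai import ChatOpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent

trade_df = pd.read_csv("trade.csv")            # placeholder dataset 1
gdp_df = pd.read_csv("gdp.csv")                # placeholder dataset 2
population_df = pd.read_csv("population.csv")  # placeholder dataset 3

# Condensed stand-in for the real column metadata I pass in the system prompt.
TABLE_METADATA = """
df1 (trade): country_name (e.g. 'Mexico'), country_code (ISO-3, e.g. 'MEX', 'PRI'), year, exports_usd
df2 (gdp): country_code (ISO-3), year, gdp_usd
df3 (population): country_name, year, population
"""

llm = ChatOpenAI(model="gpt-4o", temperature=0)  # temperature 0 for consistency

agent = create_pandas_dataframe_agent(
    llm,
    [trade_df, gdp_df, population_df],
    agent_type="tool-calling",
    prefix=(
        "You analyze the DataFrames described below and answer with pandas "
        "results or plotly charts.\n" + TABLE_METADATA
    ),
    allow_dangerous_code=True,
    verbose=True,
)

# Conversation history is prepended to the input; memory wiring omitted for brevity.
result = agent.invoke({"input": "How have exports from Puerto Rico changed over time?"})
```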
Most of the time the agent works well, answering data questions and generating plotly charts as specified in the prompt. But I keep running into inconsistent behavior.
When users ask about specific countries, the results are unpredictable. Sometimes when someone mentions Puerto Rico, the agent correctly looks for “PRI”, but other times it searches for “PR” and finds nothing. Mexico works because both “Mexico” and “MEX” appear in different DataFrame columns, but if someone types “MX”, the agent may filter on that exact string and return an empty result, with no chart or response at all.
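To make the failure concrete, the generated code ends up looking roughly like this (using the placeholder columns from the sketch above):

```python
# When the user types "MX", the agent sometimes filters on that literal string,
# which matches nothing, and it then returns no answer or chart.
trade_df[trade_df["country_code"] == "MX"]       # empty DataFrame

# Either of these would have found the rows.
trade_df[trade_df["country_code"] == "MEX"]
trade_df[trade_df["country_name"] == "Mexico"]
```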
Capitalization in user queries also causes issues, even though I’ve added instructions in the prompt to handle it. I’ve provided detailed column metadata for all three datasets, enabled conversation memory, and set the temperature to 0 for consistency.
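The casing problem looks similar: for a query like “show exports for mexico”, the agent sometimes emits an exact-match comparison instead of one of the case-insensitive forms I’d expect (again with the placeholder columns from above):

```python
# Exact match the agent sometimes generates from the lowercase user text — empty result.
trade_df[trade_df["country_name"] == "mexico"]

# Case-insensitive filters that do work, but the agent produces them only intermittently.
trade_df[trade_df["country_name"].str.lower() == "mexico"]
trade_df[trade_df["country_name"].str.contains("mexico", case=False, na=False)]
```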
I want to avoid hardcoding specific mappings for every possible country variation. What approaches can help make the agent more reliable at interpreting user input?