I’m building an application that uses a LangChain AI agent for analyzing pandas DataFrames. The agent gets comprehensive instructions through the system prompt, along with conversation history and table schema information.
My main issue is with string matching inconsistencies when users search for specific entries. The agent sometimes fails to find the right data because it doesn’t handle alternative naming conventions properly.
For example, when someone looks up Puerto Rico, the agent should match “PRI” in the dataset, but it sometimes searches for “PR” instead and comes back empty. Mexico has a similar problem: the DataFrame contains both “Mexico” and “MEX” in different columns, so those two forms work fine, but if someone types “MX”, the agent might not find anything.
Case sensitivity also causes problems occasionally, even though I added instructions about this in the system prompt.
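To make both failure modes concrete, here is a minimal sketch using made-up data. The column names `country_name` and `country_code` and the helpers `exact_lookup` and `casefold_lookup` are hypothetical stand-ins, not my real schema or the code the agent actually generates; they only approximate the kind of exact-match filter it seems to produce.

```python
import pandas as pd

# Toy data for illustration only; column names and values are made up.
df = pd.DataFrame(
    {
        "country_name": ["Puerto Rico", "Mexico"],
        "country_code": ["PRI", "MEX"],
    }
)


def exact_lookup(frame: pd.DataFrame, term: str) -> pd.DataFrame:
    """Return rows where any column equals `term` exactly (case-sensitive)."""
    mask = frame.eq(term).any(axis=1)
    return frame[mask]


def casefold_lookup(frame: pd.DataFrame, term: str) -> pd.DataFrame:
    """Same comparison as exact_lookup, but ignoring case on both sides."""
    lowered = frame.apply(lambda col: col.str.casefold())
    mask = lowered.eq(term.casefold()).any(axis=1)
    return frame[mask]


# The stored code matches, but the shorter aliases do not.
print(len(exact_lookup(df, "PRI")))       # one row
print(len(exact_lookup(df, "PR")))        # no rows
print(len(exact_lookup(df, "MX")))        # no rows

# Case folding fixes "mexico" vs "Mexico", but not the alias problem.
print(len(exact_lookup(df, "mexico")))    # no rows
print(len(casefold_lookup(df, "mexico"))) # one row
print(len(casefold_lookup(df, "mx")))     # no rows
```

Case folding alone handles the capitalization issue, but an alias such as “MX” still has no exact counterpart in either column, which is the part I'm unsure how to solve without a hardcoded mapping.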
I’ve provided detailed column descriptions for all three datasets in my prompt. The agent also maintains conversation context. I don’t want to hardcode every possible variation for each country or data entry. The model temperature is set to 0.
Any suggestions for making the search more robust?