Incorrect output when implementing memory functionality with langchain CSV agent

Problem Description

I’m having trouble with a LangChain agent that processes CSV data with memory capabilities. The agent remembers context from previous queries but returns the wrong data on follow-up questions.

import json

from langchain.agents import create_csv_agent
from langchain.llms import OpenAI


def process_csv_data(request_json: str):
    '''
    Handle CSV data extraction tasks.

    Input format: JSON string containing query and filename
        { "query":"<your_question>", "filename":"<csv_file>" }

    Example usage:
        { "query":"Show me the oldest person in data.csv", "filename":"data.csv" }

    Parameters:
        request_json (str): JSON formatted request string

    Output:
        Extracted information from the CSV file
    '''
    parsed_request = json.loads(request_json)
    user_query = parsed_request["query"]
    csv_filename = parsed_request["filename"]
    # Build the agent from the filename parsed out of the request
    # (the original referenced an undefined `file_location` here).
    data_agent = create_csv_agent(llm=OpenAI(), path=csv_filename, verbose=True)
    return data_agent.run(user_query)

input_template = '{"query":"<your_question>","filename":"<csv_file>"}'
tools_description = f'Tool for CSV file operations. Required input format: {input_template}'

from langchain.agents import Tool

csv_processing_tool = Tool(
    name="process_csv_data",
    func=process_csv_data,
    description=tools_description,
    verbose=True,
)
from langchain.agents import ZeroShotAgent
from langchain.memory import ConversationBufferWindowMemory

tools_list = [csv_processing_tool]

system_prefix = """Engage in conversation with user while maintaining context from previous interactions. Always consider chat history when answering new questions. If a user asks about specific data mentioned earlier, reference that information in your response. Available tools:"""

conversation_suffix = """Start now!

{chat_history}
User Input: {input}
{agent_scratchpad}"""

agent_prompt = ZeroShotAgent.create_prompt(
    tools=tools_list,
    prefix=system_prefix,
    suffix=conversation_suffix,
    input_variables=["input", "chat_history", "agent_scratchpad"]
)

conversation_memory = ConversationBufferWindowMemory(
    memory_key='chat_history',
    k=5,
    return_messages=True
)

from langchain.chains import LLMChain
from langchain.agents import AgentExecutor

llm_chain = LLMChain(llm=OpenAI(temperature=0), prompt=agent_prompt)
zero_shot_agent = ZeroShotAgent(llm_chain=llm_chain, tools=tools_list, verbose=True)
executor_chain = AgentExecutor.from_agent_and_tools(
    agent=zero_shot_agent, 
    tools=tools_list, 
    verbose=True, 
    memory=conversation_memory
)

First I query for the longest name:

request_data = {"input": {"query": "find the longest name", "filename": "people.csv"}}
json_request = json.dumps(request_data)
response = executor_chain(json_request)

This correctly returns “Johnson” as the longest name.

Then I ask a follow-up question:

follow_up = {"input": {"query": "what is this person's birth date?", "filename": "people.csv"}}
follow_up_json = json.dumps(follow_up)
result = executor_chain(follow_up_json)

The agent recognizes I’m asking about Johnson’s birth date in the first observation, but then returns the birth date from the first row of the dataset instead of Johnson’s actual birth date.

How can I fix this memory issue so the agent properly connects previous context with new data queries? I’m using GPT-3.5-turbo and have tried different prompt configurations without success.

Your memory lives in the executor chain, but the CSV agent is stateless. Every time you call process_csv_data, you’re spinning up a fresh CSV agent that doesn’t know Johnson is the longest name.

I hit this same wall building a financial report analyzer. Fixed it by completely restructuring the flow. Instead of treating the CSV agent as a repeated tool call, I moved CSV processing into the main conversation.

Here’s what worked: load your CSV data once upfront and store the results in variables that stick around between queries. When someone asks for “the longest name”, grab that info and explicitly store it in your conversation context.
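A minimal sketch of that "load once, keep state" approach, with plain Python standing in for the CSV agent. The file contents, column names, and the `conversation_state` dict are illustrative assumptions, not from the original post:

```python
import csv
import io

# Stand-in for people.csv so the sketch is self-contained.
CSV_TEXT = """name,birth_date
Ada,1815-12-10
Johnson,1990-05-01
Bo,2000-01-01
"""

# Load the data once, up front.
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# Shared state that survives between queries, unlike a fresh CSV agent.
conversation_state = {}

def longest_name():
    person = max(rows, key=lambda r: len(r["name"]))
    conversation_state["current_person"] = person  # remember for follow-ups
    return person["name"]

def birth_date_of_current_person():
    # The follow-up reads the remembered row instead of re-querying blind.
    return conversation_state["current_person"]["birth_date"]
```

Here `longest_name()` returns "Johnson" and stashes the whole row, so `birth_date_of_current_person()` answers the follow-up from the stored context rather than defaulting to the first row.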

Your setup has an architectural problem - two separate memory systems that can’t talk to each other. The conversation memory knows Johnson, but the CSV agent can’t access that memory.

Alternatively, modify your tool description to tell the agent to include previous query context when formatting JSON input for your CSV tool. Make the agent carry forward relevant details instead of expecting the CSV agent to magically remember prior results.

Your CSV processing tool can’t see what your main agent remembers. When the executor chain remembers ‘Johnson’ from the first query, that info gets stuck in conversation memory and never makes it to the CSV agent handling the follow-up.

I hit this exact problem building a document analysis system last year. Fix it by changing how you invoke tools, not the CSV agent itself. Your main agent needs to pull relevant context from chat history and stuff it directly into the tool call.

Update your system prefix to tell the agent: ‘When using tools for follow-up questions, include relevant information from previous queries in your tool input.’ Now when someone asks about ‘this person’s birth date’, the agent builds JSON like ‘find birth date for Johnson’ instead of the useless ‘this person’.
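Concretely, the prefix from the question could become something like this (hedged wording, the tool list is still appended by `ZeroShotAgent.create_prompt` after it):

```python
# Revised prefix: same as the original, plus an explicit instruction to carry
# prior context into tool inputs.
system_prefix = """Engage in conversation with the user while maintaining \
context from previous interactions. Always consider chat history when \
answering new questions. When using tools for follow-up questions, include \
relevant information from previous queries in your tool input: replace vague \
references like 'this person' with the concrete value from chat history, \
e.g. 'find the birth date of Johnson'. Available tools:"""
```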

Your memory setup is fine - you just need better prompt engineering to bridge that gap between memory and tool execution.

Your CSV agent creates a fresh context every time it runs, even though your main agent remembers the conversation.

I hit this exact problem last year building a data analysis bot. The CSV agent doesn’t know about Johnson from before because it’s a separate execution.

Here’s what worked for me - modify your process function to inject context:

def process_csv_data(request_json: str, context_info: str = ""):
    parsed_request = json.loads(request_json)
    user_query = parsed_request["query"]
    csv_filename = parsed_request["filename"]
    
    # Add context to the query
    if context_info:
        enhanced_query = f"{context_info} {user_query}"
    else:
        enhanced_query = user_query
        
    data_agent = create_csv_agent(llm=OpenAI(), path=file_location, verbose=True)
    return data_agent(enhanced_query)

Then update your agent prompt to pass relevant context from chat history to the tool.

Alternatively, create the CSV agent once at startup and reuse it, but that gets tricky with file switching.
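The reuse idea can be sketched as a per-file cache. The factory parameter here is a hypothetical stand-in for `create_csv_agent`; the caching pattern is the point, not the exact API:

```python
# One agent per CSV file, created lazily and reused across queries.
_agent_cache = {}

def get_csv_agent(filename, make_csv_agent):
    """Return a cached agent for `filename`, creating it on first use.

    `make_csv_agent` is any callable that builds an agent for a file,
    e.g. lambda f: create_csv_agent(llm=OpenAI(), path=f, verbose=True).
    """
    if filename not in _agent_cache:
        _agent_cache[filename] = make_csv_agent(filename)
    return _agent_cache[filename]
```

Switching files simply creates a second cached agent, but the two agents still share no state with each other or with your conversation memory, which is the trickiness mentioned above.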

Make sure your CSV queries include enough context so the agent doesn’t default to “first row” behavior when it can’t find specific matches.

I’ve hit this exact memory issue tons of times. You’re spinning up a fresh CSV agent for every query, so it has zero clue about what happened before.

Stop relying on separate tool runs and start maintaining state between queries. I used to try context injection hacks too - they’re messy and break constantly.

Best fix I’ve found is using Latenode for this workflow. Set up a scenario that processes your CSV and keeps conversation state alive between queries. It links your LLM calls with actual data persistence.

Latenode stores results from your first query (“Johnson is the longest name”) and auto-references that data in follow-ups. No more rebuilding agents or manual context injection.

I built something similar for our customer analysis pipeline. First query grabs relevant records, stores them in workflow state, then later queries work off that stored context instead of hitting the raw CSV again.

It fixes the memory problem at workflow level instead of code patches. Way more reliable than crossing your fingers that the LLM connects everything right.