I’m utilizing a LangChain ReAct agent to handle CSV files and generate DataFrames. However, when I execute agent_executor.invoke(), it only returns a summary dictionary rather than the actual DataFrame object.
My intention is to make the resulting DataFrame accessible as a variable within my main Python script. Below is my current implementation:
import pandas as pd
from langchain.agents import AgentExecutor, create_react_agent, Tool
from langchain.tools import tool
from langchain_community.tools import ShellTool
from langchain_openai import ChatOpenAI
from langchain_experimental.utilities import PythonREPL
from langchain import hub
@tool
def handle_csv_file(file_path: str):
    """Read a CSV file and return a formatted DataFrame."""
    df = pd.read_csv(file_path, header=1, usecols=['content']).reset_index(drop=True)
    df = df.select_dtypes(include=['object'])
    return df
llm_model = ChatOpenAI(model='gpt-4o-mini')
python_executor = PythonREPL()
shell_tool_instance = ShellTool()
repl_tool_instance = Tool(
    name="python_executor",
    description="Run Python commands in a REPL environment.",
    func=python_executor.run,
)
tools_available = [handle_csv_file, repl_tool_instance, shell_tool_instance]
reactive_prompt = hub.pull("hwchase17/react")
react_agent = create_react_agent(llm=llm_model, tools=tools_available, prompt=reactive_prompt)
executor_instance = AgentExecutor(agent=react_agent, tools=tools_available, verbose=True)
file_path = 'data.csv'
instruction = f"Read the CSV file located at {file_path} and save the result to a variable named df."
output = executor_instance.invoke({"input": instruction})
print("Output from agent:", output)
Here’s what occurs:
The agent effectively processes the CSV file, as seen in the verbose output.
The invoke() function provides only a summary dictionary instead of the DataFrame object I need.
Attempts made include:
Leveraging global variables to store the DataFrame
Returning the DataFrame in JSON format from my processing function
Altering the return mechanism
Yet, the agent consistently returns only a summary.
Question: How can I successfully retrieve the actual DataFrame from the LangChain agent’s execution? I need the raw DataFrame, not merely the execution summary.
Had the same issue last month! Don’t return the dataframe directly - pickle it to a temp file and return the file path instead. Use df.to_pickle('temp_df.pkl') in your tool, then pd.read_pickle() after the agent finishes. Way better than global vars.
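A minimal sketch of the pickle approach, in case it helps. The @tool decorator is left off so the snippet runs without LangChain, and PICKLE_PATH plus the tiny example CSV are placeholders made up for illustration, not anything from the original post:

```python
import os
import tempfile

import pandas as pd

# Hypothetical fixed location that both the tool and the main script know about.
PICKLE_PATH = os.path.join(tempfile.gettempdir(), "temp_df.pkl")


def handle_csv_file(file_path: str) -> str:
    """Read a CSV file, pickle the DataFrame, and return the pickle path as text."""
    df = pd.read_csv(file_path)
    df.to_pickle(PICKLE_PATH)
    # The agent only ever sees this string, never the DataFrame itself.
    return PICKLE_PATH


# Stand-in for the real data.csv so the sketch is runnable on its own.
csv_path = os.path.join(tempfile.gettempdir(), "example_data.csv")
pd.DataFrame({"content": ["a", "b"]}).to_csv(csv_path, index=False)

# In the real script the agent would call the tool; here it is called directly.
returned_path = handle_csv_file(csv_path)

# After the agent finishes, load the DataFrame back in the main script.
df = pd.read_pickle(PICKLE_PATH)
print(df)
```

Reading from a fixed, known path means the main script never has to parse the agent's final answer to find the file.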
I’ve hit this exact problem countless times. LangChain agents treat everything as text conversations, so they can’t pass DataFrame objects around.
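To make the text-only point concrete, here is a rough illustration. It assumes the agent loop turns a tool's return value into prompt text in a way comparable to str(); the sample DataFrame is made up:

```python
import pandas as pd

# Made-up data standing in for whatever the tool would load.
df = pd.DataFrame({"content": ["first row", "second row"]})

# Assumption: the observation handed to the model is a text rendering of the
# tool's return value, comparable to str(value).
observation = str(df)

print(type(df).__name__)           # the object the tool produced
print(type(observation).__name__)  # what ends up in the conversation
print(observation)
```

Once the DataFrame has become a string like this, the methods and dtypes are gone, which is why the final answer from invoke() can only describe the data.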
Automation fixes this. Skip wrestling with LangChain’s limitations and build a workflow that handles DataFrames natively.
Last year I built something similar for processing CSV files and passing results between steps. Use a workflow automation platform that actually understands Python objects.
With the right setup:
Read your CSV in one step
Process with pandas in another
Pass the actual DataFrame object (not text) to next steps
Use AI models when needed, keep data processing clean
No pickle files, global variables, or hacky workarounds. Your DataFrame stays a DataFrame the whole time.
Workflows are way easier to debug too - you see exactly what happens at each step, unlike the black box agent approach.
Check out Latenode if you want to see this in action. It handles Python objects properly and lets you build data processing workflows without fighting the framework.
Yeah, this is super common. LangChain agents only return text responses - they can’t pass complex objects like DataFrames through the invoke method.
I’ve hit this same wall multiple times. The cleanest fix I’ve found is a small data store object that lives outside the agent. Here’s what works:
import pandas as pd
from langchain.tools import tool
class AgentDataStore:
    def __init__(self):
        self.data = {}
    def store(self, key, value):
        self.data[key] = value
    def retrieve(self, key):
        return self.data.get(key)
# Create a global store instance
data_store = AgentDataStore()
@tool
def handle_csv_file(file_path: str):
    """Read a CSV file and return a formatted DataFrame."""
    df = pd.read_csv(file_path, header=1, usecols=['content']).reset_index(drop=True)
    df = df.select_dtypes(include=['object'])
    # Store the DataFrame
    data_store.store('processed_df', df)
    return f"CSV processed successfully. Shape: {df.shape}"
# After your agent execution
output = executor_instance.invoke({"input": instruction})
df = data_store.retrieve('processed_df')
This keeps your data separate from the agent’s conversational flow but still accessible after execution. I’ve used this pattern in several projects - it’s rock solid.
Your problem comes from how LangChain agents work: they’re built for conversations, not direct data processing. The invoke() method always returns a dictionary with the agent’s answer as text. Here’s a workaround that’s worked for me: have your tool save the DataFrame somewhere your main script can grab it. I use a class attribute or module-level variable. This pattern works:
class DataFrameHandler:
    stored_df = None
    @staticmethod
    @tool
    def process_csv(file_path: str):
        """Process a CSV file and store the resulting DataFrame on the class."""
        df = pd.read_csv(file_path, header=1, usecols=['content']).reset_index(drop=True)
        df = df.select_dtypes(include=['object'])
        DataFrameHandler.stored_df = df
        return f"DataFrame processed with shape {df.shape}"
# After agent execution
output = executor_instance.invoke({"input": instruction})
result_df = DataFrameHandler.stored_df
This way you keep the agent’s conversational output separate from your actual data processing, and you can still get the DataFrame object you need.
This happens because LangChain agents can only serialize text, not DataFrame objects. I ran into the same thing and found a workaround that actually works pretty well. Don’t try returning the DataFrame directly - it won’t work. Instead, use shared memory. Set up a simple session store before you call the agent: session_store = {} at the module level. Then modify your tool to dump the DataFrame there: session_store['dataframe'] = df and just return a success message to the agent. Once the agent finishes, grab your DataFrame from session_store['dataframe']. It’s a bit of a hack, but it works with LangChain’s text-only system while still giving you the raw data you need.
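Roughly, the session store idea could look like the sketch below. The @tool decorator is omitted so it runs without LangChain, and an in-memory CSV (io.StringIO) stands in for the real file; both are assumptions for the sake of a runnable example:

```python
import io

import pandas as pd

# Module-level store shared by the tool and the main script.
session_store = {}


def handle_csv_file(file_path_or_buffer) -> str:
    """Read a CSV, stash the DataFrame in session_store, return a text status."""
    df = pd.read_csv(file_path_or_buffer)
    session_store["dataframe"] = df
    # Only this short message goes back to the agent.
    return f"CSV processed successfully. Shape: {df.shape}"


# Stand-in for the agent calling the tool on data.csv.
message = handle_csv_file(io.StringIO("content\nhello\nworld\n"))

# Once the agent has finished, the main script picks up the real object.
df = session_store["dataframe"]
print(message)
print(type(df).__name__)
```

Because the store is an ordinary dict in the same process, this only works when the tool and the main script share one Python interpreter.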