I’m working with a langchain agent that processes dataframes, but I’m running into token limit issues with OpenAI models. Here’s what happens:
The agent generates valid Python code like this:
# Agent's thought process and generated code
Thought: I should filter this dataset based on the given conditions
Action: python_repl_ast
Action Input: data[(data['REGION'] == 'Europe') & (data['CATEGORY'] == 'Electronics') & (data['YEAR'] == 2022)]
The problem is that when this code executes, the resulting dataframe is too large and exceeds the max_tokens parameter for my OpenAI model. When this happens, I lose access to the Python code that was generated.
I need a way to capture the generated Python code before the token limit gets hit. I’ve tried using callbacks and checking intermediate steps, but neither approach works because I have to call agent.run(my_query) to get the results, and that’s when the token limit is exceeded.
Here’s my setup code:
from langchain.llms import OpenAI
from langchain.agents import create_pandas_dataframe_agent
import pandas as pd
# Load dataset
data = pd.read_csv("my_dataset.csv")
llm_agent = create_pandas_dataframe_agent(
    OpenAI(temperature=0, model_name="text-davinci-003"),
    data,
    verbose=True,
    max_iterations=2
)
user_query = "Find all records matching specific criteria"
result = llm_agent.run(user_query) # This is where token limit gets exceeded
How can I extract the generated Python code without triggering the token limit error?
This happens because langchain generates and runs the code in one step. I’ve hit this same issue in production, and monkey-patching the agent’s tool execution works best: make a custom tool that inherits from PythonREPLTool, override its _run method, and store the generated code in a class variable before it runs. Even if execution crashes from token limits, you’ll still have the Python code.

You can also wrap your agent execution in try/except and pull the last action from the agent’s memory buffer. The agent keeps its reasoning chain in memory, so you can access the Action Input field when the token limit hits.

Quick fix: use a smaller context window, or truncate results at the pandas level with dataframe slicing before sending them back to the LLM.
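A minimal sketch of that capture pattern. It uses a stand-in base class so it runs without langchain installed; in a real project you would subclass langchain’s PythonAstREPLTool (or PythonREPLTool, depending on your version) and keep the same override:

```python
class PythonREPLToolStub:
    """Stand-in for langchain's Python REPL tool, so this sketch runs anywhere."""
    def _run(self, query):
        return eval(query)  # the real tool executes the generated code here

class CodeCapturingTool(PythonREPLToolStub):
    captured_code = []  # class-level store: survives even if _run crashes

    def _run(self, query):
        CodeCapturingTool.captured_code.append(query)  # save before executing
        return super()._run(query)

tool = CodeCapturingTool()
tool._run("1 + 1")
print(CodeCapturingTool.captured_code)  # ['1 + 1']
```

Because the code is appended before `super()._run()` executes it, a token-limit crash inside the tool can’t take the code with it.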
Had this exact problem on a client project with huge financial datasets. Langchain’s pandas agent treats code generation and execution as one unit: when the output dataframe exceeds the token limit, the whole step fails and the generated code is lost with it.
Fixed it by writing a custom callback handler that grabs the agent’s reasoning before execution finishes. Subclass BaseCallbackHandler and override on_tool_start to capture the action input (your generated code) when the python tool starts.
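A sketch of that handler. In langchain you would subclass BaseCallbackHandler from langchain.callbacks.base; it’s shown here as a plain class with the same on_tool_start shape so it runs standalone (the serialized/input_str parameter names may differ slightly between langchain versions):

```python
class CodeCaptureHandler:
    """Mimics a langchain callback handler; records tool input before execution."""
    def __init__(self):
        self.captured = []

    def on_tool_start(self, serialized, input_str, **kwargs):
        # input_str is the Action Input, i.e. the generated pandas code
        self.captured.append(input_str)

handler = CodeCaptureHandler()
# Simulate the agent starting the python tool with generated code:
handler.on_tool_start({"name": "python_repl_ast"}, "data.head(10)")
print(handler.captured[-1])  # prints data.head(10)
```

Pass an instance of the real subclass via the agent’s callbacks argument; since on_tool_start fires before the tool runs, the code is captured even if execution then blows past the token limit.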
I also set a row limit in the agent’s system prompt to stop massive outputs upfront. Added “always limit dataframe results to 50 rows with .head() unless asked for more” - works great. You get useful results without hitting token limits.
The callback method beats monkey-patching tools since it won’t break on langchain updates. Keeps the original workflow but captures code when things fail.
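If you’d rather enforce the size cap in code than trust the prompt, a small helper can shrink the dataframe until its rendered string fits a rough token budget. This is my own sketch, and the 4-characters-per-token ratio is a heuristic, not an exact count:

```python
import pandas as pd

def truncate_for_llm(df, max_tokens=1000):
    """Return df, or a head() slice of it whose string form fits ~max_tokens."""
    if len(df.to_string()) // 4 <= max_tokens:  # ~4 chars per token
        return df
    rows = len(df)
    # Halve the row count until the rendered string fits the budget
    while rows > 1 and len(df.head(rows).to_string()) // 4 > max_tokens:
        rows //= 2
    return df.head(rows)

big = pd.DataFrame({"x": range(5000)})
small = truncate_for_llm(big, max_tokens=200)
print(len(small))  # far fewer than 5000 rows
```

For a real deployment you would count tokens with the model’s tokenizer instead of the character heuristic, but the shape of the fix is the same.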
Had the same problem last week. Build the agent with return_intermediate_steps=True and call it as agent({"input": my_query}) instead of agent.run(my_query); the generated code then comes back under the "intermediate_steps" key of the response. Also try shrinking the dataset with .head(100) or sampling before feeding it to the agent, which helps avoid the limits.
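Sketched below with a mocked response so it runs standalone. With langchain, an agent built with return_intermediate_steps=True returns a dict of this shape, where each step is an (AgentAction, observation) pair and AgentAction carries the generated code in tool_input:

```python
class FakeAgentAction:
    """Stand-in for langchain's AgentAction, which has a tool_input attribute."""
    def __init__(self, tool_input):
        self.tool_input = tool_input

# Mocked agent response; a real call would be: response = agent({"input": query})
response = {
    "output": "<answer>",
    "intermediate_steps": [
        (FakeAgentAction("data[data['YEAR'] == 2022]"), "<truncated dataframe>"),
    ],
}

for action, observation in response["intermediate_steps"]:
    print(action.tool_input)  # the generated pandas code
```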
Hit this exact problem building an analytics platform. Langchain bundles code generation and execution way too tightly.
What fixed it: custom chain types that split reasoning from execution. Skip the standard pandas agent and separate these processes.
Here’s what worked:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# Step 1: Generate code only
code_prompt = PromptTemplate(
    input_variables=["query", "columns"],
    template="Generate Python pandas code for: {query}. Available columns: {columns}"
)
code_chain = LLMChain(llm=your_llm, prompt=code_prompt)
generated_code = code_chain.run(query=user_query, columns=data.columns.tolist())
# Step 2: Store the code before execution
print(f"Generated code: {generated_code}")
# Step 3: Execute with size checks
try:
    result = eval(generated_code)
    if len(result) > 1000:  # or whatever limit makes sense
        result = result.head(100)
except Exception as e:
    print(f"Execution failed ({e}) but we have the code: {generated_code}")
You’ll always capture the generated code before token limits mess things up. Used this pattern across multiple projects - it’s bulletproof.
Key insight: treat code generation as separate from execution. Once you split them, token limits become manageable.