I’m working with a langchain agent that processes dataframes, but I’m running into token limit issues with OpenAI models. Here’s what happens:
The agent generates valid Python code like this:
# Agent's thought process and generated code
Thought: I should filter this dataset based on the given conditions
Action: python_repl_ast
Action Input: data[(data['REGION'] == 'Europe') & (data['CATEGORY'] == 'Electronics') & (data['YEAR'] == 2022)]
The problem is that when this code executes, the resulting dataframe is too large and exceeds the max_tokens parameter for my OpenAI model. When this happens, I lose access to the Python code that was generated.
I need a way to capture the generated Python code before the token limit gets hit. I’ve tried using callbacks and checking intermediate steps, but neither approach works because I have to call agent.run(my_query) to get the results, and that’s when the token limit is exceeded.
Here’s my setup code:
from langchain.llms import OpenAI
from langchain.agents import create_pandas_dataframe_agent
import pandas as pd
# Load dataset
data = pd.read_csv("my_dataset.csv")
llm_agent = create_pandas_dataframe_agent(
    OpenAI(temperature=0, model_name="text-davinci-003"),
    data,
    verbose=True,
    max_iterations=2
)
user_query = "Find all records matching specific criteria"
result = llm_agent.run(user_query) # This is where token limit gets exceeded
How can I extract the generated Python code without triggering the token limit error?
This happens because langchain generates and runs the code in one step. I’ve hit this same issue in production, and monkey-patching the agent’s tool execution works best: make a custom tool that inherits from PythonREPLTool, override its _run method, and store the generated code in a class variable before it runs. Even if execution crashes from token limits, you’ll still have the Python code.

You can also wrap your agent execution in try/except and pull the last action from the agent’s memory buffer. The agent keeps its reasoning chain in memory, so you can access the Action Input field when the token limit hits.

Quick fix: use a smaller context window, or truncate results at the pandas level with dataframe slicing before sending them back to the LLM.
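A minimal sketch of that capture pattern. It uses a stand-in base class so it runs without langchain installed; in a real project you would subclass langchain’s PythonAstREPLTool (or PythonREPLTool, depending on your version) and keep the same override:

```python
class PythonREPLToolStub:
    """Stand-in for langchain's Python REPL tool, so this sketch runs anywhere."""
    def _run(self, query):
        return eval(query)  # the real tool executes the generated code here

class CodeCapturingTool(PythonREPLToolStub):
    captured_code = []  # class-level store: survives even if _run crashes

    def _run(self, query):
        CodeCapturingTool.captured_code.append(query)  # save before executing
        return super()._run(query)

tool = CodeCapturingTool()
tool._run("1 + 1")
print(CodeCapturingTool.captured_code)  # ['1 + 1']
```

Because the code is appended before `super()._run()` executes it, a token-limit crash inside the tool can’t take the code with it.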
Had this exact problem on a client project with huge financial datasets. Langchain’s pandas agent treats code generation and execution as one unit: when the output dataframe exceeds the token limit, the whole step fails and the generated code is lost with it.
Fixed it by writing a custom callback handler that grabs the agent’s reasoning before execution finishes. Subclass BaseCallbackHandler and override on_tool_start to capture the action input (your generated code) when the python tool starts.
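A sketch of that handler. In langchain you would subclass BaseCallbackHandler from langchain.callbacks.base; it’s shown here as a plain class with the same on_tool_start shape so it runs standalone (the serialized/input_str parameter names may differ slightly between langchain versions):

```python
class CodeCaptureHandler:
    """Mimics a langchain callback handler; records tool input before execution."""
    def __init__(self):
        self.captured = []

    def on_tool_start(self, serialized, input_str, **kwargs):
        # input_str is the Action Input, i.e. the generated pandas code
        self.captured.append(input_str)

handler = CodeCaptureHandler()
# Simulate the agent starting the python tool with generated code:
handler.on_tool_start({"name": "python_repl_ast"}, "data.head(10)")
print(handler.captured[-1])  # prints data.head(10)
```

Pass an instance of the real subclass via the agent’s callbacks argument; since on_tool_start fires before the tool runs, the code is captured even if execution then blows past the token limit.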
I also set a row limit in the agent’s system prompt to stop massive outputs upfront. Added “always limit dataframe results to 50 rows with .head() unless asked for more” - works great. You get useful results without hitting token limits.
The callback method beats monkey-patching tools since it won’t break on langchain updates. Keeps the original workflow but captures code when things fail.
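If you’d rather enforce the size cap in code than trust the prompt, a small helper can shrink the dataframe until its rendered string fits a rough token budget. This is my own sketch, and the 4-characters-per-token ratio is a heuristic, not an exact count:

```python
import pandas as pd

def truncate_for_llm(df, max_tokens=1000):
    """Return df, or a head() slice of it whose string form fits ~max_tokens."""
    if len(df.to_string()) // 4 <= max_tokens:  # ~4 chars per token
        return df
    rows = len(df)
    # Halve the row count until the rendered string fits the budget
    while rows > 1 and len(df.head(rows).to_string()) // 4 > max_tokens:
        rows //= 2
    return df.head(rows)

big = pd.DataFrame({"x": range(5000)})
small = truncate_for_llm(big, max_tokens=200)
print(len(small))  # far fewer than 5000 rows
```

For a real deployment you would count tokens with the model’s tokenizer instead of the character heuristic, but the shape of the fix is the same.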
Had the same problem last week. Build the agent with return_intermediate_steps=True and call it as agent({"input": my_query}) instead of agent.run(my_query); the generated code then comes back under the "intermediate_steps" key of the response. Also try shrinking the dataset with .head(100) or sampling before feeding it to the agent, which helps avoid the limits.
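Sketched below with a mocked response so it runs standalone. With langchain, an agent built with return_intermediate_steps=True returns a dict of this shape, where each step is an (AgentAction, observation) pair and AgentAction carries the generated code in tool_input:

```python
class FakeAgentAction:
    """Stand-in for langchain's AgentAction, which has a tool_input attribute."""
    def __init__(self, tool_input):
        self.tool_input = tool_input

# Mocked agent response; a real call would be: response = agent({"input": query})
response = {
    "output": "<answer>",
    "intermediate_steps": [
        (FakeAgentAction("data[data['YEAR'] == 2022]"), "<truncated dataframe>"),
    ],
}

for action, observation in response["intermediate_steps"]:
    print(action.tool_input)  # the generated pandas code
```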
Hit this exact problem building an analytics platform. Langchain bundles code generation and execution way too tightly.
What fixed it: custom chain types that split reasoning from execution. Skip the standard pandas agent and separate these processes.
Here’s what worked:
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
# Step 1: Generate code only
code_prompt = PromptTemplate(
    input_variables=["query", "columns"],
    template="Generate Python pandas code for: {query}. Available columns: {columns}"
)
code_chain = LLMChain(llm=your_llm, prompt=code_prompt)
generated_code = code_chain.run(query=user_query, columns=data.columns.tolist())
# Step 2: Store the code before execution
print(f"Generated code: {generated_code}")
# Step 3: Execute with size checks
try:
    result = eval(generated_code)
    if len(result) > 1000:  # or whatever limit makes sense
        result = result.head(100)
except Exception as e:
    print(f"Execution failed ({e}) but we have the code: {generated_code}")
You’ll always capture the generated code before token limits mess things up. Used this pattern across multiple projects - it’s bulletproof.
Key insight: treat code generation as separate from execution. Once you split them, token limits become manageable.