How to debug and monitor LlamaIndex agent execution process

I need help with tracking the execution flow of my LlamaIndex agent. I’m working with version 0.12.31 and trying to monitor what happens during agent operations.

The first approach I tried was enabling verbose mode:

from llama_index.core.agent.workflow import FunctionAgent
my_agent = FunctionAgent(
    tools=[my_tool],
    llm=language_model,
    system_prompt="You are an AI assistant",
    verbose=True,
    allow_parallel_tool_calls=True,
)

This didn’t show any debug output. Then I attempted using callback handlers:

from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

debug_handler = LlamaDebugHandler()
callback_mgr = CallbackManager([debug_handler])

my_agent = FunctionAgent(
    tools=[my_tool],
    llm=language_model,
    system_prompt="You are a helpful AI that assists with questions",
    callback_manager=callback_mgr,
    allow_parallel_tool_calls=True,
)

Unfortunately, this method also failed to provide the visibility I need. What's the correct way to see the agent's internal steps and decision-making process?

Try enabling logging for better insights. I struggled with this too on 0.12.x. Just import logging and call logging.basicConfig(level=logging.DEBUG) before initializing your agent. Also double-check that you're using the correct agent class, as the workflow agents produce different debug output.
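A minimal sketch of that setup. Here the log output is routed into an in-memory buffer just to show the mechanism; in practice you'd let basicConfig write to stderr or a file. The llama_index.core logger name stands in for whatever loggers the library actually uses on your install:

```python
import io
import logging

# Route DEBUG-level output to a buffer so we can inspect it.
# force=True resets any handlers a library may have already installed.
buf = io.StringIO()
logging.basicConfig(level=logging.DEBUG, stream=buf, force=True)

# Every module-level logger now emits at DEBUG, including llama_index's.
# Initialize the agent AFTER this call, e.g.:
#   my_agent = FunctionAgent(tools=[my_tool], llm=language_model, ...)
logging.getLogger("llama_index.core").debug("agent initialized")

print("DEBUG" in buf.getvalue())  # True -- debug records are being captured
```

The key detail is calling basicConfig before the agent is constructed, so any loggers the library creates inherit the DEBUG level.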

Had the same problem recently. Event-based debugging beats the built-in handlers every time. Skip the callbacks and hook directly into the agent’s event system instead.

You need to tap into the internal event stream that workflows create. Build a custom event listener that grabs workflow steps, tool picks, and reasoning processes as they happen. This shows you exactly how the agent handles each request.

Also try turning up logging for the workflow components specifically. A blanket debug configuration buries the workflow internals in noise from every other library, while targeted per-module logging surfaces the state changes and decision points you actually need to understand what's going on.
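One way to do that with the standard logging module. The logger names below are assumptions based on llama_index's module layout, so adjust them to whatever logging.Logger.manager.loggerDict shows on your install:

```python
import logging

# Keep everything else quiet, then turn up only the workflow internals.
logging.basicConfig(level=logging.WARNING, force=True)
for name in ("llama_index.core.workflow", "llama_index.core.agent.workflow"):
    logging.getLogger(name).setLevel(logging.DEBUG)

# Workflow loggers are now at DEBUG; unrelated libraries stay at WARNING.
print(logging.getLogger("llama_index.core.workflow").isEnabledFor(logging.DEBUG))  # True
print(logging.getLogger("some.other.lib").isEnabledFor(logging.DEBUG))            # False
```

Setting the level on a specific logger overrides the root level only for that subtree, which is what keeps the output readable.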

Don’t forget to check the agent’s planning phase output too - that’s where most reasoning happens before it runs any tools. The verbose flags usually skip this planning step.

I’ve had the same monitoring headaches at work. LlamaIndex debugging is a pain when you need to see what’s actually happening in agent workflows.

Here’s what fixed it for me: I built external monitoring that runs outside the agent. It grabs all interactions, tool calls, and decision points as they happen.

Built-in verbose modes and callback handlers miss too much stuff. So I route everything through an automation layer that logs every operation. Now I can see exactly what the agent’s thinking and doing.

The trick is making sure your monitoring handles async operations and parallel tool calls without dropping events. You’ll get full execution traces with timing data and error states.
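A stripped-down, library-agnostic sketch of that idea: a queue-backed collector with timestamps, exercised here by two fake parallel tool calls. Everything named here is hypothetical scaffolding, not a LlamaIndex API:

```python
import asyncio
import time

class ExecutionMonitor:
    """Hypothetical external monitor: collects timestamped events via a queue."""

    def __init__(self):
        self.queue: asyncio.Queue = asyncio.Queue()
        self.trace: list[dict] = []

    def record(self, kind: str, **data):
        # put_nowait is safe from any coroutine on the same event loop,
        # so parallel tool calls can't drop or interleave each other's events
        self.queue.put_nowait({"ts": time.time(), "kind": kind, **data})

    async def drain(self):
        # move everything collected so far into the ordered trace
        while not self.queue.empty():
            self.trace.append(await self.queue.get())

async def main():
    mon = ExecutionMonitor()

    async def fake_tool(name: str):
        mon.record("tool_start", tool=name)
        await asyncio.sleep(0.01)  # stand-in for real tool work
        mon.record("tool_end", tool=name)

    # simulate allow_parallel_tool_calls=True: two tools running concurrently
    await asyncio.gather(fake_tool("search"), fake_tool("math"))
    await mon.drain()
    return mon

mon = asyncio.run(main())
print(len(mon.trace))  # 4 events: start/end for each of the two tools
```

In a real setup you'd call record from wherever you observe the agent (event stream, tool wrappers, etc.); the timestamps give you the timing data and the kind field lets you tag error states.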

I used Latenode for this since it handles the complex orchestration you need for proper agent monitoring. It captures all the data streams and shows them in a way that’s actually useful for debugging.

Here’s how to build this kind of monitoring automation: https://latenode.com