How to examine what happens behind the scenes in LangGraph

I’m new to working with LangGraph and LangChain frameworks. I want to dig deeper into the internal mechanics of how these tools work.

Specifically, I’m curious about message handling. What’s the difference between sending two separate Human Messages one after another versus combining them into one longer Human Message? I’d love to see the actual formatted string that gets created before it goes through tokenization. This would help me understand how the system tracks state and manages conversation flow.

Even just seeing the API calls that get made would be helpful for learning. Does anyone know good ways to inspect this kind of low-level activity?

I’ve been using LangSmith which shows some useful details about prompts and token usage at different stages, but I want to understand even more of what’s happening internally.

When working with LangGraph, understanding the distinction between separate and combined Human Messages is crucial. Separate messages typically originate as distinct entries, which affects how the conversation buffer is constructed and tracked. A few techniques have proven effective in my experience:

- Enable Python's logging at DEBUG level for the langchain and langgraph modules; this reveals the formatted strings just before tokenization.
- Wrap the model's invoke method so you can capture the exact input before processing.
- Use the built-in callbacks system: crafting a custom handler lets you monitor formatted prompts at each stage, providing insight into state transitions within the conversation.
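The DEBUG-logging approach needs nothing beyond the standard library. A minimal sketch, assuming the logger names match the package names ("langchain", "langgraph"), which is the usual convention but worth verifying against the version you have installed:

```python
import logging

# Send all log records to the console, including DEBUG-level ones
logging.basicConfig(level=logging.DEBUG)

# Turn up verbosity only for the framework loggers of interest;
# assumption: the logger names match the package names
for name in ("langchain", "langgraph"):
    logging.getLogger(name).setLevel(logging.DEBUG)

print(logging.getLogger("langchain").getEffectiveLevel() == logging.DEBUG)  # True
```

Run this before invoking your graph and the frameworks' internal log calls (including formatted prompts, where the library logs them) will show up on the console.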

Set LANGCHAIN_TRACING_V2=true and LANGCHAIN_DEBUG=true in your environment variables. That will give you much more console output, including the actual prompts being sent. Also try .astream_events() on your graph: it shows real-time execution steps, so you can see how messages get processed internally.
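If you'd rather set those from Python than in your shell, a minimal sketch; the variable names are taken from the suggestion above, so double-check them against the LangChain docs for your version, and set them before the frameworks are imported so they pick the values up at initialization:

```python
import os

# Assumption: these are the env var names the installed LangChain
# version reads; set them before importing langchain/langgraph
os.environ["LANGCHAIN_TRACING_V2"] = "true"
os.environ["LANGCHAIN_DEBUG"] = "true"

print(os.environ["LANGCHAIN_TRACING_V2"])  # true
```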

I monkey-patch the tokenizer calls when I need to see what's happening under the hood. Just wrap the tokenize method and print the input string before it's processed.

For message handling differences, check the state dict directly. Two separate Human Messages get stored as individual entries in conversation history. Combined messages show up as one entry with longer content.
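To make that difference concrete, here's a hedged sketch using plain dicts to stand in for the message objects (real LangGraph state holds HumanMessage instances from langchain_core.messages; I'm skipping the import so this runs anywhere):

```python
# Two separate Human Messages: two distinct entries in the history
separate = {"messages": [
    {"type": "human", "content": "What is LangGraph?"},
    {"type": "human", "content": "How does it track state?"},
]}

# One combined Human Message: a single entry with longer content
combined = {"messages": [
    {"type": "human", "content": "What is LangGraph? How does it track state?"},
]}

print(len(separate["messages"]), len(combined["messages"]))  # 2 1
```

Printing your graph's state dict after each turn shows which shape you're actually producing.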

Add this to your code:

import functools

def debug_wrapper(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Input to {func.__name__}: {args[0] if args else 'No args'}")
        return func(*args, **kwargs)
    return wrapper

# Then patch whatever method you want to inspect
original_method = your_model.some_method
your_model.some_method = debug_wrapper(original_method)

This saves tons of debugging time. You see the raw formatted strings without external tools.

For API calls, I use mitmproxy locally. It intercepts all HTTP traffic so you can see actual requests going to model providers.
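Most provider SDKs use HTTP clients that honor the standard proxy environment variables, so pointing them at mitmproxy is usually enough. A sketch, assuming mitmproxy is running on its default port 8080; note you'll also need to trust mitmproxy's CA certificate for TLS interception to work:

```python
import os

# Route HTTP(S) traffic through a local mitmproxy instance
# (assumption: default listen port 8080); httpx/requests-based
# SDKs read these standard env vars
os.environ["HTTP_PROXY"] = "http://127.0.0.1:8080"
os.environ["HTTPS_PROXY"] = "http://127.0.0.1:8080"

print(os.environ["HTTPS_PROXY"])  # http://127.0.0.1:8080
```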