Setting up LangSmith monitoring for Autogen AG2 multi-agent conversations

I’m building a file processing system with Autogen AG2 that uses multiple agents working together. My setup includes a parser agent that breaks down files and a processor agent that handles the analysis. They communicate through AG2’s GroupChat feature.

I know AG2 works well with AgentOps for tracking, but I need to use LangSmith since it’s what I use for my other projects.

Here’s my current implementation in the FileProcessor class:

import autogen

# Project modules (not shown here)
import config
import parsing_agent
import processing_agent


class FileProcessor:
    def __init__(self, file_id):
        print("Setting up FileProcessor")
        self.file_id = file_id  # used in the opening message in execute()
        self.processor = processing_agent.ProcessorAgent().agent
        self.parser = parsing_agent.ParserAgent().agent
        self.coordinator = autogen.UserProxyAgent(
            name="Coordinator", code_execution_config=False
        )
        print("FileProcessor ready with all agents")

    def execute(self):
        print("Starting agent conversation")
        self.chat_group = autogen.GroupChat(
            agents=[self.coordinator, self.parser, self.processor],
            messages=[],
            max_round=15,
            # custom speaker-selection method, defined elsewhere in the class
            speaker_selection_method=self._select_next_speaker,
        )
        self.chat_manager = autogen.GroupChatManager(
            groupchat=self.chat_group, llm_config=config.model_settings
        )

        try:
            self.coordinator.initiate_chat(
                self.chat_manager, message=f"Process file_id: {self.file_id}"
            )
            # initiate_chat blocks until the conversation has ended
            print("Chat finished")
        except Exception as error:
            print(f"Chat failed: {error}")

How do I add LangSmith tracing to track what happens in this multi-agent setup? Since autogen handles the LLM calls internally, I’m not sure where to hook in the monitoring.

Any examples or docs that show this integration would be helpful.

LangSmith integrates pretty smoothly with AG2 using environment variables and the trace context manager. Hit the same issue last month building a multi-agent doc pipeline.

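Easiest fix: wrap your entire execute method in a LangSmith trace. Set LANGCHAIN_TRACING_V2=true (newer SDK versions also accept LANGSMITH_TRACING=true), add your LANGSMITH_API_KEY, then put the @traceable decorator on the method that calls initiate_chat, or wrap that call in the trace context manager.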

Here’s the trick - AG2’s LLM calls automatically pick up the trace context when you configure it at the process level. Don’t mess with model configs or agent constructors.

I used @traceable(name="multi_agent_processing") on my FileProcessor.execute method and got full visibility into agent conversations. Each agent’s LLM calls show up as child spans with proper attribution and timing.

Watch out for version compatibility - make sure your AG2 version plays nice with LangSmith’s expected OpenAI client. Had to bump up to AG2 0.2.x for clean trace propagation.

Conversation threading is great for debugging speaker selection and catching when agents wander off during long processing chains.

The callback approach works but gets messy with GroupChat's message routing. I've had better luck using LangSmith's run context directly in each agent instead of at the chat manager level. Wrap each agent's llm_config separately so you can see which agent made which call; without this, everything shows up as generic "autogen" traces. Also set up your LangSmith project before you initialize the agents, or you'll get silent failures that are a pain to debug.

Been down this path before with multi-agent systems. The trick is intercepting AG2’s LLM calls before they happen, not after.

Hook LangSmith in at the method that drives the chat. One catch: AG2's llm_config has no slot for a LangSmith callback handler, and langsmith.Client has no get_callback_handler() method, so leave config.model_settings as it is and put the traceable decorator on that method (not on the class).

import autogen
from langsmith import traceable

import config  # project settings module from the question


class FileProcessor:
    def __init__(self, chat_group):
        # The GroupChat has to exist before the manager that wraps it
        self.chat_group = chat_group

        # AG2's llm_config takes no LangSmith callback, so pass the
        # model settings through unchanged
        self.chat_manager = autogen.GroupChatManager(
            groupchat=self.chat_group,
            llm_config=config.model_settings,
        )

    # traceable belongs on the method, not the class: each call is one run
    @traceable(name="multi_agent_processing")
    def execute(self, coordinator, file_id):
        return coordinator.initiate_chat(
            self.chat_manager, message=f"Process file_id: {file_id}"
        )

This captures the whole conversation under a single trace, so you can follow decision paths from one place.

I built something similar last year for a document processing pipeline. The visibility into agent interactions was game changing for debugging weird edge cases.

Tracing at the entry point scales better than manual tracing because AG2 handles all the plumbing. You get conversation-level inputs, outputs, and timing without touching individual agent code.

The LangSmith integration depends on what you're after: full conversation tracing or just tracking individual LLM calls. I ended up creating a custom wrapper around the AG2 agents instead of messing with the model config.

I subclassed the AG2 agents and overrode their message handling methods. Inside those overrides, I manually created LangSmith traces for each agent interaction. This way you get granular control over what's logged and how conversations show up in LangSmith.

The trick is treating each agent turn as a separate trace span within a bigger conversation trace. Parser agent processes a file? That's one span. Processor agent analyzes results? Another span under the same parent trace. This captures agent-to-agent communication patterns that pure LLM callbacks miss. You see what each agent said AND the decision logic for speaker selection and conversation flow.

Yeah, there's more boilerplate code, but the trace data is way richer. Conversation context flows naturally through LangSmith's hierarchy, making multi-agent debugging much easier.

Direct AG2-LangSmith integration is a pain. You end up monkey patching internal LLM calls or wrapping every agent individually. Trust me - it’s a maintenance nightmare.

Skip fighting the framework. Route everything through a coordination layer that handles both orchestration and monitoring.

Build a workflow where each agent interaction is a discrete step. Parser agent = one node. Processor agent = another node. Break down GroupChat coordination into trackable stages.

Now you see the entire multi-agent conversation flow without hacking AG2’s internals. Every agent call gets traced with context - which agent’s talking, what data they’re processing, how they respond.

Bonus: add retry logic, error handling, even A/B test different agent configs without touching your core AG2 code.
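A stdlib-only sketch of the "each agent interaction is a discrete step" idea. This is not Latenode or any real orchestration API: run_workflow, StepLog, and the step functions are hypothetical, and in practice each step function would wrap one agent call (the parser, then the processor).

```python
import time
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StepLog:
    """One record per step: which agent ran, on what input, and how it went."""
    agent: str
    input_data: Any
    output: Any = None
    attempts: int = 0
    error: Optional[str] = None
    seconds: float = 0.0


def run_workflow(steps, payload, max_retries=2):
    """Run (agent_name, step_fn) pairs in order, retrying failed steps.

    Each step gets up to 1 + max_retries attempts. The workflow stops at the
    first step that never succeeds, so the log shows where it derailed.
    """
    logs = []
    for agent_name, step_fn in steps:
        log = StepLog(agent=agent_name, input_data=payload)
        start = time.perf_counter()
        for attempt in range(1, max_retries + 2):
            log.attempts = attempt
            try:
                payload = step_fn(payload)
            except Exception as exc:  # record the failure, then retry
                log.error = repr(exc)
                continue
            log.output, log.error = payload, None
            break
        log.seconds = time.perf_counter() - start
        logs.append(log)
        if log.error is not None:
            break  # this step never succeeded; stop here
    return payload, logs
```

Because every step produces a StepLog with the agent name, input, output, attempt count, and error, swapping one agent's config for an A/B test only means swapping its step function.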

I’ve used this pattern before. Saves massive debugging time when agents go rogue. Clean logs show exactly where conversations derail.

Latenode handles this orchestration well and has better built-in monitoring than forced LangSmith integration.