Setting up LangSmith monitoring for Autogen AG2 multi-agent conversations

I have built a text processing system with Autogen AG2 that uses multiple agents working together. My setup includes a parser agent that breaks down content and a processor agent that handles the analysis. These agents communicate through GroupChat functionality.

I know AG2 works well with AgentOps for tracking, but I want to use LangSmith for monitoring instead since it fits better with my existing tools.

Here’s my current implementation in the TextProcessor class:

import autogen

import config
import parsing_agent
import processing_agent


class TextProcessor:
    def __init__(self, content_id):
        print("Setting up TextProcessor")
        self.content_id = content_id  # referenced later in execute()
        self.processor = processing_agent.ProcessorAgent().agent
        self.parser = parsing_agent.ParserAgent().agent
        self.coordinator = autogen.UserProxyAgent(
            name="Coordinator", code_execution_config=False
        )
        print("TextProcessor ready with all agents")

    def execute(self):
        print("Starting multi-agent conversation")
        self.chat_group = autogen.GroupChat(
            agents=[self.coordinator, self.parser, self.processor],
            messages=[],
            max_round=15,
            speaker_selection_method=self._select_next_speaker,  # defined elsewhere in the class
        )
        self.chat_manager = autogen.GroupChatManager(
            groupchat=self.chat_group, llm_config=config.model_settings
        )

        try:
            self.coordinator.initiate_chat(
                self.chat_manager, message="Process content_id:" + str(self.content_id)
            )
            print("Chat started successfully")
        except Exception as error:
            print(f"Chat failed: {error}")

How do I add LangSmith tracing to track what happens in this AG2 GroupChat? I tried some approaches but since autogen handles the LLM calls internally, my attempts didn’t work. Any examples or documentation would be helpful.

Hit this constantly with multi-agent observability. Monkey patching works but breaks with every AG2 update.

I skip intercepting AG2’s internals and use LangSmith’s run tree API to manually trace conversations:

from langsmith import Client
from langsmith.run_trees import RunTree

class TextProcessor:
    def __init__(self):
        self.langsmith_client = Client()
        # your existing setup
    
    def execute(self):
        # Parent trace for the whole conversation
        conversation_run = RunTree(
            name="ag2_multi_agent_conversation",
            run_type="chain",
            inputs={"content_id": self.content_id},
            client=self.langsmith_client,
        )
        conversation_run.post()  # submit the parent run to LangSmith

        # ... build self.chat_group and self.chat_manager as in your existing execute() ...

        # Run the GroupChat normally
        result = self.coordinator.initiate_chat(
            self.chat_manager,
            message="Process content_id:" + str(self.content_id)
        )

        # Log one child run per turn after the chat finishes --
        # GroupChat appends every message to chat_group.messages
        for message in self.chat_group.messages:
            agent_run = conversation_run.create_child(
                name=f"agent_{message.get('name', 'unknown')}_turn",
                inputs={"message": message.get('content', '')},
            )
            agent_run.post()
            agent_run.end()
            agent_run.patch()

        conversation_run.end(outputs={"final_result": str(result)})
        conversation_run.patch()  # flush outputs to LangSmith
        return result

You control exactly what gets tracked, can add custom metrics and tags, and it won’t break with AG2 updates since you’re not touching internals.

It also handles agent failures and timeouts better than callback-based approaches, since you decide when each run ends and what error gets recorded - sketch below.
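
To make that concrete, here's a minimal sketch of tags plus failure handling with the run tree API (the tag names, metadata, and run_group_chat helper are placeholders, not part of the original setup):

from langsmith.run_trees import RunTree

conversation_run = RunTree(
    name="ag2_multi_agent_conversation",
    run_type="chain",
    inputs={"content_id": "demo-123"},
    tags=["ag2", "group-chat", "parser-processor"],  # filterable in the LangSmith UI
    extra={"metadata": {"env": "dev"}},              # custom metadata slot
)
conversation_run.post()

try:
    result = run_group_chat()  # placeholder for your initiate_chat call
    conversation_run.end(outputs={"final_result": str(result)})
except Exception as error:
    # A failed or timed-out agent still leaves a complete, marked trace
    conversation_run.end(error=str(error))
    raise
finally:
    conversation_run.patch()  # flush end state to LangSmith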

I’ve had good luck with similar setups. Skip wrapping clients and use LangSmith’s callback handlers directly instead. Here’s what works: build a custom LangSmithCallbackHandler that inherits from langchain’s BaseCallbackHandler, override the on_llm_start and on_llm_end methods, then drop the handler into your agents’ llm_config under ‘callbacks’. That way you catch both the LLM calls and the agent-to-agent chatter in GroupChat. I track speaker changes, message content, and timing this way.

Quick heads up on your code: add error handling around callback initialization, because LangSmith fails silently when the API key is messed up. Also set custom tags for each agent type so you can filter traces by parser vs processor in the dashboard. Your AG2 setup stays untouched and you get detailed visibility into the whole multi-agent conversation.
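
For concreteness, a minimal sketch of such a handler. It only logs timing and prompt counts locally; forwarding the data to LangSmith (e.g. via Client.create_run) and whether your AG2 version actually honors a ‘callbacks’ key in llm_config are assumptions you should verify:

import time

from langchain_core.callbacks import BaseCallbackHandler
from langsmith import Client

class LangSmithCallbackHandler(BaseCallbackHandler):
    def __init__(self, agent_tag):
        # Client() reads LANGSMITH_API_KEY from the environment; a bad key
        # only surfaces at request time, hence the advice to validate early
        self.client = Client()
        self.agent_tag = agent_tag  # e.g. "parser" or "processor"
        self._started_at = None

    def on_llm_start(self, serialized, prompts, **kwargs):
        self._started_at = time.time()
        print(f"[{self.agent_tag}] LLM call started with {len(prompts)} prompt(s)")

    def on_llm_end(self, response, **kwargs):
        elapsed = time.time() - self._started_at
        print(f"[{self.agent_tag}] LLM call finished in {elapsed:.2f}s")
        # Forward message content and timing to LangSmith here,
        # e.g. self.client.create_run(...) tagged with self.agent_tag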

I hit this exact same issue trying to get proper observability into my AG2 workflows. Most solutions either miss the internal GroupChat orchestration or break when AG2 updates.

What fixed it for me was LangSmith’s automatic tracing through environment variables plus monkey patching AG2’s core message handling. I patch ConversableAgent’s generate_oai_reply method before constructing any agents (AG2 registers the reply function at agent init, so the patch has to land first) - this wraps it with LangSmith’s trace context and captures every single LLM interaction without messing with your agent configs.

import langsmith
import autogen

# Patch before constructing any agents: ConversableAgent registers
# generate_oai_reply at __init__ time, so agents created earlier keep
# the unpatched version. (The method name may differ across AG2 versions.)
original_generate = autogen.ConversableAgent.generate_oai_reply

def patched_generate(self, messages=None, sender=None, config=None):
    sender_name = sender.name if sender else "unknown"
    with langsmith.trace(
        name=f"ag2_{self.name}_generation",
        inputs={"sender": sender_name, "message_count": len(messages or [])},
    ) as run:
        result = original_generate(self, messages=messages, sender=sender, config=config)
        run.end(outputs={"response": str(result)})
        return result

autogen.ConversableAgent.generate_oai_reply = patched_generate

This captures everything - GroupChat manager decisions, individual agent responses, and speaker selection logic - all in one clean trace tree. Way better than trying to wrap clients or callbacks that get lost in AG2’s message routing.

Actually, there’s an easier way - just wrap your entire execute method with LangSmith’s trace context manager instead of touching individual llm_configs. Use with langsmith.trace("ag2_session") as run: and call initiate_chat inside it. On its own this records one run for the whole session; if you also wrap the OpenAI client (see the wrap_openai approach below), those calls nest under it automatically.
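
A minimal sketch of that wrapper (names follow the question’s class; the "ag2_session" run name is a placeholder):

import langsmith

class TextProcessor:
    # ... existing __init__ ...

    def execute(self):
        # build self.chat_group and self.chat_manager exactly as before, then:
        with langsmith.trace(
            name="ag2_session",
            inputs={"content_id": str(self.content_id)},
        ) as run:
            result = self.coordinator.initiate_chat(
                self.chat_manager,
                message="Process content_id:" + str(self.content_id),
            )
            run.end(outputs={"result": str(result)})
            return result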

Been down this exact path with AG2 and LangSmith integration. The trick is using the @traceable decorator from langsmith on your custom methods and then hooking into AG2’s message flow.

Here’s what worked for me:

from langsmith import traceable
from langsmith.wrappers import wrap_openai
import autogen

class TextProcessor:
    def __init__(self):
        # Wrap your LLM config with LangSmith
        wrapped_config = self._wrap_llm_config(config.model_settings)
        
        self.processor = processing_agent.ProcessorAgent().agent
        self.parser = parsing_agent.ParserAgent().agent
        self.coordinator = autogen.UserProxyAgent(
            name="Coordinator", code_execution_config=False
        )
        
        # Update agents with wrapped config
        self.processor.llm_config = wrapped_config
        self.parser.llm_config = wrapped_config
    
    def _wrap_llm_config(self, llm_config):
        # Only possible if your llm_config carries an actual OpenAI client
        # instance; stock AG2 configs with just a config_list won't have one
        if 'client' in llm_config:
            llm_config['client'] = wrap_openai(llm_config['client'])
        return llm_config
    
    @traceable(name="ag2_group_chat")
    def execute(self):
        self.chat_group = autogen.GroupChat(
            agents=[self.coordinator, self.parser, self.processor],
            messages=[],
            max_round=15,
            speaker_selection_method=self._select_next_speaker,
        )
        
        self.chat_manager = autogen.GroupChatManager(
            groupchat=self.chat_group, 
            llm_config=self._wrap_llm_config(config.model_settings)
        )
        
        return self.coordinator.initiate_chat(
            self.chat_manager, 
            message="Process content_id:" + str(self.content_id)
        )

Wrap the OpenAI client in your llm_config and use traceable decorators on your main methods. This catches the LLM calls AG2 routes through that wrapped client - so verify your config actually exposes the client object.

One gotcha - set LANGCHAIN_TRACING_V2=true plus your LANGCHAIN_API_KEY and LANGCHAIN_PROJECT environment variables before running (newer SDKs also accept the LANGSMITH_* names). LangSmith will show each agent’s messages and conversation flow in separate traces.
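
For reference, a minimal sketch of that setup in code (the project name is a placeholder):

import os

# Must run before any traced code executes
os.environ["LANGCHAIN_TRACING_V2"] = "true"   # enables @traceable / wrap_openai logging
os.environ["LANGCHAIN_API_KEY"] = "<your-langsmith-api-key>"
os.environ["LANGCHAIN_PROJECT"] = "ag2-text-processor"  # placeholder project name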

These approaches work, but they’re way more complex than needed. I’ve hit similar monitoring issues and patching AG2 internals always turns into a mess.

I just build the whole thing as a Latenode automation instead. Set up nodes for each agent - parser, processor, coordinator - and handle GroupChat logic with Latenode’s conditional routing.

Best part? Built-in monitoring and logging for every step. No wrapping OpenAI clients or context managers. You see exactly what each agent’s doing, timing, and failures.

Migrated a similar multi-agent system last month. Instead of wrestling with AG2’s internal LLM calls, I rebuilt the agent interactions as a Latenode workflow. Each agent = separate node, GroupChat speaker selection = routing logic.

Keep your existing agent code but orchestrate through Latenode instead of AG2’s GroupChat. Bonus: retry logic, error handling, and detailed logs included.