I’m working on comparing two different agent setups. One uses a single-agent approach and the other is a multi-agent system built with LangGraph. I added proper logging to both so I can track everything in LangSmith.
The issue I’m facing is with the multi-agent version. When I look at the traces in the LangSmith web interface, everything appears correctly and I can see all the nodes working as expected. However, when I try to programmatically access the Run object during evaluation, I hit a wall.
I can successfully navigate to my main nodes like Planner and ExecutionTeam, but that’s where it stops. These nodes appear to have no child runs and show empty outputs when accessed through the API, even though the web interface clearly shows they have children and proper outputs.
Has anyone encountered this behavior before? Am I missing something in how I’m accessing the Run object, or could this be a known issue with LangSmith’s API when working with LangGraph applications?
Check your LangSmith client version - there’s a breaking change around v0.8.x that messes with nested run serialization, especially for LangGraph. Hit this exact issue when I upgraded and all my multi-agent traces showed empty child arrays through the API. The web UI works fine because it uses a different path that handles backwards compatibility. Your programmatic access is probably hitting the new format but using old methods. Run pip show langsmith first, then try include_children=True when fetching runs. If you’re on an older version, either upgrade or switch to client.get_run_tree() instead of get_run(). The tree method forces proper child loading even with version mismatches. This bit me because local testing worked perfectly but prod deployments with different versions failed silently.
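To make the child-loading suggestion concrete, here is a minimal sketch. It assumes the Python SDK's `Client.read_run(run_id, load_child_runs=True)`, which is the call I know of for pulling nested runs in one request; parameter and method names can differ between SDK versions, so check yours. `fetch_run_with_children` and `walk_runs` are hypothetical helper names, and `walk_runs` only relies on each run exposing a `child_runs` attribute.

```python
from typing import Any, Iterator, Tuple


def fetch_run_with_children(run_id: str) -> Any:
    """Fetch a run with its nested children (assumes the langsmith package is installed)."""
    from langsmith import Client  # imported lazily so walk_runs works without it

    return Client().read_run(run_id, load_child_runs=True)


def walk_runs(run: Any, depth: int = 0) -> Iterator[Tuple[int, Any]]:
    """Yield (depth, run) for a run and all of its descendants, depth-first."""
    yield depth, run
    # child_runs may be None when children were not loaded, so fall back to an empty list
    for child in getattr(run, "child_runs", None) or []:
        yield from walk_runs(child, depth + 1)
```

Printing `"  " * depth + run.name` for each pair is a quick way to check whether Planner and ExecutionTeam really come back childless.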
I’ve hit this exact timing issue with LangSmith’s API before. The web interface and programmatic access use completely different mechanisms to show trace data, which causes this mismatch. When you grab the Run object through code, the child runs often aren’t fully populated yet because of async processing. Adding a small delay before hitting the API usually fixes it. Also double-check you’re using the right run ID - parent runs don’t always show child relationships immediately. Here’s what got me: there’s a big difference between accessing runs during execution vs after they’re done. If you’re querying while your multi-agent system is still running, you’re probably hitting it before the child runs get properly linked. Try pulling the same run ID a few minutes after everything finishes and see if the child structure shows up correctly.
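If timing is the culprit, a small polling helper is easier to reason about than a fixed sleep. This is only a sketch: `wait_for_children` is a hypothetical name, and `fetch` is whatever zero-argument callable you use to load the run (for example a lambda wrapping your client call). The retry count and delay are arbitrary defaults.

```python
import time
from typing import Any, Callable, Optional


def wait_for_children(
    fetch: Callable[[], Any],
    attempts: int = 5,
    delay_seconds: float = 2.0,
) -> Optional[Any]:
    """Call fetch() until the returned run has child runs, or give up after `attempts` tries."""
    for attempt in range(attempts):
        run = fetch()
        if getattr(run, "child_runs", None):
            return run
        # Sleep between tries, but not after the final one
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return None
```

If this still returns `None` well after the workflow has finished, timing is probably not the cause and one of the other explanations in this thread is more likely.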
This bit me hard on a project last year. LangGraph creates nested execution contexts that don’t map cleanly to LangSmith’s run model.
It’s not a timing or pagination issue - it’s architectural. When your multi-agent system spawns sub-agents, LangGraph wraps each one in its own execution context. LangSmith treats these as separate traces instead of proper parent-child relationships.
I switched to manual span creation and it worked. Instead of relying on automatic tracing, I explicitly created spans for each agent and linked them myself using the LangSmith client.
Something like:
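from langsmith import trace

with trace(name="planner", run_type="chain") as planner_span:
    ...  # planner logic
    # Nested block; parent= makes the link to the planner explicit
    with trace(name="executor", run_type="chain", parent=planner_span) as exec_span:
        ...  # execution logic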
Yeah it’s extra work, but you get predictable access patterns through the API. The automatic tracing just doesn’t handle LangGraph’s execution model properly.
Alternatively, dump the trace data to JSON during execution and parse that for evaluation. Way more reliable than fighting the API.
Been there with complex multi-agent workflows. The API timing issue is real, but there’s a bigger problem.
You’re dealing with nested execution patterns that LangSmith’s API can’t handle smoothly. Multi-agent systems create deep hierarchical traces that don’t work well with programmatic access.
Don’t fight LangSmith’s limitations - move your evaluation pipeline to something built for complex workflows. I’ve been running similar multi-agent comparisons using Latenode for orchestration.
Here’s what I do: hook up both agent systems as separate workflows in Latenode, then use its logging and comparison tools. You get proper parent-child relationships, real-time access to execution data, and custom evaluation dashboards that actually work.
Best part? You can set up automated A/B testing between single-agent and multi-agent approaches. No more manual trace digging or API timing headaches.
Same frustration here with LangGraph traces. The issue is LangSmith can’t handle distributed execution across multiple agents properly. When ExecutionTeam spawns sub-agents, each gets its own trace context that doesn’t link back to the parent run through the API. I fixed this by building a custom trace collector that grabs run data before LangSmith processes it. Hook into LangGraph’s execution callbacks, capture the full execution tree yourself, then rebuild the relationships manually. The web interface works fine because it hits the raw trace database directly. But the API only shows processed relationships, which get screwed up during async multi-agent workflows. I built a simple trace aggregator that runs with my eval pipeline and keeps proper parent-child mappings intact. It’s extra work but beats fighting LangSmith’s limitations when you need reliable programmatic access to the full execution tree.
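A rough sketch of what such a collector could look like. `TraceCollector` and its method names are hypothetical, not part of LangGraph or LangSmith. It assumes your callback hooks hand you a `run_id` and a `parent_run_id` for each step (LangChain-style callback handlers receive both as keyword arguments); wiring it into those hooks is left out.

```python
from typing import Any, Dict, List, Optional


class TraceCollector:
    """Hypothetical in-memory collector; not part of LangGraph or LangSmith."""

    def __init__(self) -> None:
        self.records: Dict[str, Dict[str, Any]] = {}

    def on_start(self, run_id: str, parent_run_id: Optional[str], name: str, inputs: Any) -> None:
        # Record the run as soon as it starts so the parent link is never lost
        self.records[run_id] = {
            "id": run_id,
            "parent_id": parent_run_id,
            "name": name,
            "inputs": inputs,
            "outputs": None,
        }

    def on_end(self, run_id: str, outputs: Any) -> None:
        # Attach outputs to the record created in on_start
        self.records[run_id]["outputs"] = outputs

    def children_of(self, run_id: str) -> List[Dict[str, Any]]:
        """Return the direct children of a run, in the order they started."""
        return [r for r in self.records.values() if r["parent_id"] == run_id]
```

Because the parent ID is captured when each run starts, the tree can be rebuilt locally during evaluation without another round trip to the API.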
sounds like langsmith’s api pagination bug. when you pull run objects through the api, child nodes often end up on different pages. use client.list_runs() with the parent_run_id parameter instead of just fetching the main run object. fixed the same issue for me - web interface showed everything fine, but api calls were missing data.
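For reference, a minimal sketch of the `list_runs` approach. It assumes the Python SDK's `Client.list_runs(parent_run_id=...)`; depending on your setup you may also need to pass a project name. `fetch_child_runs` and `group_by_parent` are hypothetical helper names, and `group_by_parent` only needs each run to have a `parent_run_id` attribute.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def fetch_child_runs(parent_run_id: str) -> List[Any]:
    """Fetch the direct children of a run (assumes the langsmith package is installed)."""
    from langsmith import Client  # imported lazily so group_by_parent works without it

    # list_runs returns an iterator; list() drains it into a plain list
    return list(Client().list_runs(parent_run_id=parent_run_id))


def group_by_parent(runs: Iterable[Any]) -> Dict[Any, List[Any]]:
    """Map each parent run ID to its direct children; root runs sit under the key None."""
    children: Dict[Any, List[Any]] = defaultdict(list)
    for run in runs:
        children[getattr(run, "parent_run_id", None)].append(run)
    return dict(children)
```

Calling `fetch_child_runs` on the Planner or ExecutionTeam run ID shows directly whether the API knows about the children that the web interface displays.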