Hey everyone! I’ve been working with LangSmith for tracking my language model applications and debugging issues, but I want to find some open source solutions instead. The vendor dependency is making me a bit nervous and I’d prefer something I can host myself.
I need tools that can handle execution tracking, detailed reasoning step logs, and basic performance measurements. My current setup uses LangChain and some custom agent workflows, so compatibility with those would be great.
Has anyone found good Python libraries or frameworks that do this kind of monitoring? I’m particularly interested in self-hosted options that won’t break the bank. What’s been working well for your projects?
I’ve been using Phoenix by Arize for LLM observability and it’s solid. Open source, so no vendor lock-in worries. Setup’s easy - runs locally or on your infrastructure, and works great with LangChain through their instrumentation package. Best part is seeing token usage and latency across different model calls. Helped me cut costs on some pricey GPT-4 workflows. Clean UI makes debugging individual traces simple. One heads up - it’ll eat memory if you’re processing massive trace volumes, but handles dev work and moderate production loads fine. Worth trying if you want control over your monitoring.
We tried LangSmith last year but ditched it for a custom setup that works way better.
I built a simple monitoring layer with OpenTelemetry - just add custom spans for each LLM call. Takes 30 minutes to instrument your code and you get full distributed tracing. We send everything to Jaeger for visualization. Perfect for tracking flows and finding bottlenecks.
For LLM stuff like prompt logging and token counting, I dump structured logs to Elasticsearch. Python decorator does the work. Kibana dashboards show cost trends, error rates, whatever we need.
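The decorator looks something like this sketch (names are illustrative, the model call is stubbed, and the Elasticsearch hop is assumed, e.g. Filebeat tailing the JSON log or a bulk-index call; the token count here is a whitespace approximation, so use a real tokenizer like tiktoken for billing-grade numbers):

```python
# Sketch: decorator that emits one structured JSON log line per LLM call.
import functools
import json
import logging
import time

log = logging.getLogger("llm_audit")

def log_llm_call(model: str):
    def wrap(fn):
        @functools.wraps(fn)
        def inner(prompt, **kw):
            start = time.time()
            response = fn(prompt, **kw)
            doc = {
                "model": model,
                "prompt": prompt,
                "response": response,
                "latency_s": round(time.time() - start, 3),
                # rough stand-in for token counting; swap in your tokenizer
                "approx_tokens": len(prompt.split()) + len(response.split()),
            }
            log.info(json.dumps(doc))  # shipped to Elasticsearch downstream
            return response
        return inner
    return wrap

@log_llm_call(model="gpt-4")
def ask(prompt):
    return "stubbed answer"  # replace with your real client call
```

One document per call means Kibana can aggregate cost and error rates with plain terms/date-histogram queries.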
Best part: we had an agent loop burning tokens like crazy. The trace pinpointed exactly which reasoning step went haywire. Saved us $300/day in API costs.
More manual than packaged solutions but you control everything. Your ops team probably already knows these tools anyway.
Honestly, wandb has been solid for me. Their LLM tracking isn’t as polished as dedicated tools, but it’s free and does what I need. Captures prompts, responses, costs, timing - the works. Just a heads up, though: integrating with custom workflows takes some elbow grease. Not as plug-and-play as the other options folks mentioned.
Been using LangFuse for six months and it’s a great replacement for the expensive commercial monitoring tools. Docker setup was super easy, and the trace visualization covers everything I need. The detailed LLM call tracking and cost breakdowns really won me over. Adding it to my LangChain stuff was painless - just drop in their callback handler and you’re good to go. Zero performance hit in production. Only downside is the docs get pretty thin when you want to do custom metrics, so you’ll be experimenting a bit. But honestly, the hierarchical trace view is a lifesaver for debugging complex agent workflows - beats digging through raw logs any day.