Hello everyone! I’m working with an Ollama setup combined with OpenWeb UI and I need some guidance. I want to monitor and analyze how well my models are performing, but I’m not sure about the best approach.
Is it possible to integrate LangSmith for tracking model performance metrics when using this combination? I’m particularly interested in getting detailed analytics like precision rates, recall measurements, and F1 scores for my models.
If LangSmith isn’t the right fit for this use case, what other monitoring or tracing solutions would you recommend? I really need to have visibility into these performance metrics to optimize my setup properly.
Any suggestions or experience with similar integrations would be greatly appreciated!
I’ve been running this setup for six months and hit the same issues. LangSmith works, but the native support is lacking. I had better luck with Weights & Biases plus a custom FastAPI wrapper between OpenWeb UI and Ollama: the wrapper logs every request, computes precision, recall, and F1, and sends everything to your dashboard in real time, without fighting LangSmith compatibility. MLflow is also a solid option; it has better documentation for custom integrations and handles model performance tracking well. It takes some initial setup work, but it ultimately gives you comprehensive visibility into model performance, including when it starts to degrade.
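Rough sketch of what I mean by the wrapper (not the exact code I run): it assumes Ollama is listening on localhost:11434, you’ve already done `wandb login`, and the project name and routes are just placeholders. You point OpenWeb UI at this proxy instead of straight at Ollama.

```python
# Minimal proxy between OpenWeb UI and Ollama that logs basic per-request
# stats to Weights & Biases. Assumes Ollama on localhost:11434 and that
# wandb is already authenticated; project name is illustrative.
import time

import httpx
import wandb
from fastapi import FastAPI, Request

app = FastAPI()
OLLAMA_URL = "http://localhost:11434"  # adjust to your setup

wandb.init(project="ollama-monitoring", job_type="proxy")  # hypothetical project


@app.post("/api/generate")
async def proxy_generate(request: Request):
    payload = await request.json()
    start = time.perf_counter()
    async with httpx.AsyncClient(timeout=300) as client:
        resp = await client.post(f"{OLLAMA_URL}/api/generate",
                                 json={**payload, "stream": False})
    data = resp.json()
    # Log latency plus the token counts Ollama reports for each call.
    wandb.log({
        "latency_s": time.perf_counter() - start,
        "prompt_tokens": data.get("prompt_eval_count", 0),
        "completion_tokens": data.get("eval_count", 0),
    })
    return data
```

The quality metrics (precision/recall/F1) come from a separate evaluation job that reads these logged responses and compares them against labeled samples, then logs the scores to the same W&B project.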
Been there. Skip the complex integrations and go with Prometheus + Grafana instead.
I wasted weeks trying to make LangSmith work with Ollama - total nightmare. What actually worked: a simple Python script that sits between your requests and Ollama and captures the metrics you need.
Here’s what I did: Flask endpoint that proxies requests to Ollama, logs response times, token counts, and accuracy data to Prometheus. Grafana pulls everything into dashboards.
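Stripped-down version of that proxy (metric names and ports are just examples; assumes Ollama on localhost:11434 and that Prometheus scrapes port 9100):

```python
# Flask proxy in front of Ollama that exposes latency and token counters
# for Prometheus to scrape. Point OpenWeb UI at this proxy's port.
import time

import requests
from flask import Flask, jsonify, request
from prometheus_client import Counter, Histogram, start_http_server

app = Flask(__name__)
OLLAMA_URL = "http://localhost:11434"

LATENCY = Histogram("ollama_request_latency_seconds", "Generation latency", ["model"])
TOKENS = Counter("ollama_completion_tokens_total", "Completion tokens", ["model"])


@app.route("/api/generate", methods=["POST"])
def generate():
    payload = request.get_json()
    model = payload.get("model", "unknown")
    start = time.perf_counter()
    resp = requests.post(f"{OLLAMA_URL}/api/generate",
                         json={**payload, "stream": False}, timeout=300)
    data = resp.json()
    LATENCY.labels(model=model).observe(time.perf_counter() - start)
    TOKENS.labels(model=model).inc(data.get("eval_count", 0))
    return jsonify(data)


if __name__ == "__main__":
    start_http_server(9100)   # Prometheus scrapes metrics from :9100
    app.run(port=8080)        # OpenWeb UI talks to :8080 instead of Ollama directly
```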
For F1 scores and precision metrics, you’ll need your own evaluation logic since Ollama doesn’t output those. I wrote a small evaluator that runs ground truth comparisons on response samples every hour.
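The evaluator is nothing fancy - something along these lines, assuming a classification-style task with a labeled JSONL file. The file name and the label-extraction step are placeholders you’d swap for your own parsing:

```python
# Hourly evaluator sketch: run labeled prompts through Ollama, map each reply
# to a predicted label, and compute precision/recall/F1 with scikit-learn.
import json

import requests
from sklearn.metrics import precision_recall_fscore_support

OLLAMA_URL = "http://localhost:11434"


def predict(model: str, prompt: str) -> str:
    resp = requests.post(f"{OLLAMA_URL}/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False},
                         timeout=300)
    words = resp.json().get("response", "").strip().split()
    # Placeholder mapping: treat the first word of the reply as the label.
    return words[0].lower() if words else ""


def evaluate(model: str, samples_path: str = "ground_truth.jsonl"):
    y_true, y_pred = [], []
    with open(samples_path) as f:
        for line in f:
            sample = json.loads(line)          # {"prompt": ..., "label": ...}
            y_true.append(sample["label"])
            y_pred.append(predict(model, sample["prompt"]))
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(evaluate("llama3"))  # run from cron every hour, push results wherever you like
```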
Whole setup took two days and gives me everything LangSmith promised without the headaches. Plus you own your data and customize metrics however you want.
Want something simpler? LangFuse has decent Ollama support with minimal setup.
langsmith’s overkill for most ollama setups. I’ve been running opentelemetry + jaeger for tracing with openwebui - takes 30 minutes to set up and you get solid insights without all that complexity.
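roughly what the wiring looks like in python - assumes jaeger is accepting OTLP over gRPC on localhost:4317 and ollama is on localhost:11434; span and attribute names are just what i happened to pick:

```python
# Wrap each Ollama call in an OpenTelemetry span and export it to Jaeger.
import requests
from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.grpc.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

provider = TracerProvider(resource=Resource.create({"service.name": "openwebui-ollama"}))
provider.add_span_processor(BatchSpanProcessor(
    OTLPSpanExporter(endpoint="localhost:4317", insecure=True)))
trace.set_tracer_provider(provider)
tracer = trace.get_tracer(__name__)


def traced_generate(model: str, prompt: str) -> dict:
    with tracer.start_as_current_span("ollama.generate") as span:
        span.set_attribute("llm.model", model)
        resp = requests.post("http://localhost:11434/api/generate",
                             json={"model": model, "prompt": prompt, "stream": False},
                             timeout=300).json()
        span.set_attribute("llm.completion_tokens", resp.get("eval_count", 0))
        return resp
```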
I’ve run production Ollama deployments, and honestly? Skip the complex monitoring stuff for now. Most setups don’t need LangSmith’s overhead. Here’s what actually worked for me: build a simple logging layer that grabs request metadata, response times, and model outputs straight from OpenWeb UI API calls. Dump it all in a basic database and analyze offline for your precision/recall metrics. Real-time monitoring sounds cool, but clean structured logs you can batch process matter way more. For F1 scores, you’re building your own evaluation logic anyway - no monitoring tool gives you ground truth comparisons out of the box. Start with file-based logging and only add fancy dashboards when you actually need them.
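Something like this is all I mean by a logging layer (SQLite here purely as an example; table and column names are placeholders):

```python
# Bare-bones request log: one SQLite table, one function you call around each
# OpenWeb UI -> Ollama request. Batch-process the table offline later.
import json
import sqlite3
import time

conn = sqlite3.connect("ollama_logs.db")
conn.execute("""CREATE TABLE IF NOT EXISTS requests (
    ts REAL, model TEXT, prompt TEXT, response TEXT,
    latency_s REAL, completion_tokens INTEGER)""")
conn.commit()


def log_request(model, prompt, response_json, latency_s):
    conn.execute(
        "INSERT INTO requests VALUES (?, ?, ?, ?, ?, ?)",
        (time.time(), model, prompt,
         json.dumps(response_json), latency_s,
         response_json.get("eval_count", 0)))
    conn.commit()

# Later: load the table into pandas or a notebook and compute
# precision/recall/F1 against your labeled samples offline.
```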
Integrating LangSmith with an Ollama setup alongside OpenWeb UI isn’t straightforward since they don’t have native compatibility. However, you can achieve your desired performance metrics through a middleware solution. I’ve implemented a custom logging mechanism that captures and forwards request and response data between the two platforms to LangSmith’s tracing API. This requires some development effort but yields great results. Alternatively, consider OpenLLMetry, which has better support for Ollama and is simpler to implement while offering similar monitoring capabilities. The key is to ensure that your API calls are correctly intercepted and the data is formatted for your chosen monitoring system. Once established, tracking performance becomes much more manageable.
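As a sketch of the middleware idea, one lightweight way to forward call data to LangSmith is its `traceable` decorator rather than hand-rolling API calls. This assumes a LangSmith API key is set in the environment (LANGCHAIN_API_KEY or LANGSMITH_API_KEY); the function and run names are illustrative, not an official Ollama integration:

```python
# Wrap the Ollama call so each request/response pair is recorded as a
# LangSmith run. Requires the langsmith package and an API key in the env.
import requests
from langsmith import traceable


@traceable(run_type="llm", name="ollama-generate")
def generate(model: str, prompt: str) -> str:
    resp = requests.post("http://localhost:11434/api/generate",
                         json={"model": model, "prompt": prompt, "stream": False},
                         timeout=300)
    return resp.json().get("response", "")


if __name__ == "__main__":
    print(generate("llama3", "Say hello"))
```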
Look, I’ve built monitoring systems for ML pipelines at scale and everyone’s overcomplicating this.
Forget custom middleware or complex Prometheus setups. You need proper workflow automation that handles data flow between components seamlessly.
I solved this by creating automated workflows between OpenWeb UI and Ollama. The workflow captures every request and response, processes data for metrics, and pushes everything to your monitoring dashboard.
You can set up automated evaluation pipelines that run F1, precision, and recall calculations on schedule. No manual scripting or maintenance headaches.
For performance tracking, I built workflows that sample responses, run them against ground truth datasets, and alert when performance drops. Takes an hour to set up versus weeks of custom development.
Automation scales better too. Add more models or change your setup - workflows adapt without breaking.
Skip building custom solutions from scratch. Use proper automation for your data pipeline and monitoring.