Hi everyone! I’m working on a project where I need to evaluate my language models, but I have some privacy constraints. I’m curious about running LangSmith evaluations completely offline or locally hosted. The main issue is that my dataset contains sensitive corporate information that cannot leave our infrastructure due to compliance requirements. Is there a way to set up LangSmith to work with local models instead of cloud-based APIs like OpenAI’s? I want to make sure no data gets transmitted to third-party services outside our region. Has anyone dealt with similar security restrictions? Any guidance would be really helpful!
Been there. Had the same issue with sensitive financial data that couldn’t leave our network. Compliance was all over us about data sovereignty.
You could set up local evaluation frameworks, but it’s a maintenance nightmare. Model updates, dependencies, security patches - becomes someone’s full-time job.
Latenode solved this for us. You deploy it entirely inside your own infrastructure and connect it to local models, so there are no external API calls. You get the automation benefits without the security headaches.
I built automated evaluation workflows that run on our internal servers, process sensitive data locally, and generate reports. Nothing leaves our network, but we still get enterprise-grade evaluation.
Took half a day to set up. Now compliance is happy and we get reliable results. Way better than building from scratch or wrestling with complex local installs.
Absolutely possible. I’d go with a hybrid approach - I dealt with this exact issue on a government contract with similar data restrictions. We containerized everything with Docker and ran it on our private cloud. Skip Ollama for production and use vLLM or TGI instead - they’re much more efficient for model serving. Then just modify your LangSmith evaluation scripts to hit your internal endpoints instead of the external APIs.

Yeah, the docs suck for local deployments, but the Python SDK works with any OpenAI-compatible endpoint. Watch out for token-counting differences between local models and cloud APIs - we had to write custom tokenizers to get usage tracking right. Mistral 7B and the CodeLlama variants worked well for us: decent evaluation quality without killing our hardware. It took about a week to set up and test, but now we’re processing thousands of evaluations daily with zero data leaving our network.
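Here’s a rough sketch of the wiring, in case it helps. The internal hostname, model name, and dataset name are placeholders for whatever you’re serving, and the exact `langsmith` imports and evaluator signature depend on your SDK version, so treat this as a starting point rather than copy-paste:

```python
# Sketch only: a LangSmith `evaluate` run where both the target and the judge hit an
# internal vLLM server instead of the OpenAI API. Assumes an OpenAI-compatible server
# is already up, e.g. started with something like:
#   python -m vllm.entrypoints.openai.api_server --model mistralai/Mistral-7B-Instruct-v0.2
# Hostname/port, model name, and dataset name below are placeholders.
from openai import OpenAI
from langsmith import evaluate  # older SDKs: from langsmith.evaluation import evaluate

# OpenAI client pointed at the internal endpoint instead of api.openai.com.
local_llm = OpenAI(base_url="http://llm.internal:8000/v1", api_key="not-needed")
MODEL = "mistralai/Mistral-7B-Instruct-v0.2"

def target(inputs: dict) -> dict:
    """System under test: answer the dataset question with the local model."""
    resp = local_llm.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": inputs["question"]}],
    )
    return {"answer": resp.choices[0].message.content}

def correctness(run, example) -> dict:
    """LLM-as-judge evaluator that also stays on the internal endpoint."""
    prompt = (
        f"Question: {example.inputs['question']}\n"
        f"Reference answer: {example.outputs['answer']}\n"
        f"Candidate answer: {run.outputs['answer']}\n"
        "Answer 1 if the candidate matches the reference, otherwise 0."
    )
    resp = local_llm.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
    )
    score = 1 if resp.choices[0].message.content.strip().startswith("1") else 0
    return {"key": "correctness", "score": score}

results = evaluate(
    target,
    data="internal-eval-dataset",  # LangSmith dataset name
    evaluators=[correctness],
)
```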
Yeah, you can definitely run LangSmith evaluators locally - it just takes some setup. I did this for a healthcare client with strict HIPAA requirements where any external API calls were a no-go. The trick is swapping the cloud APIs for locally hosted models: we used Ollama to serve Llama and Mistral models, then tweaked the evaluator configs to hit those local endpoints instead. Performance isn’t quite GPT-4 level, but everything stays on your infrastructure.

You’ll want beefy hardware though - 32 GB of RAM minimum, plus GPU acceleration if you want decent speeds. The model downloads are huge initially, but once they’re cached locally, evaluations run smoothly with zero external dependencies. Fair warning: the local deployment docs are pretty sparse compared to the cloud version, so expect some troubleshooting. Once you get it dialed in, though, you’ve got complete control over your data.
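The config tweak is basically just pointing the judge model at Ollama’s OpenAI-compatible endpoint. Rough sketch below - the model name and port are whatever you’ve pulled/configured, and I’m assuming the `langchain_openai` package here since that’s what we wired into our evaluators:

```python
# Sketch only: swap the cloud judge for a local Ollama model. Assumes Ollama is
# running on its default port and a model has been pulled (e.g. `ollama pull mistral`);
# Ollama exposes an OpenAI-compatible API under /v1, so OpenAI-style clients work.
from langchain_openai import ChatOpenAI

judge = ChatOpenAI(
    model="mistral",                       # locally pulled model name (placeholder)
    base_url="http://localhost:11434/v1",  # Ollama's OpenAI-compatible endpoint
    api_key="ollama",                      # dummy value; Ollama ignores the key
    temperature=0,
)

# Quick smoke test before plugging it into your evaluator configs - nothing here
# leaves the box running Ollama.
print(judge.invoke("Reply with OK if you can read this.").content)
```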