Hello everyone! I’m working on a project where I integrated LangSmith for monitoring, but I’m really confused by the token counts it’s reporting. For the same input text, LangSmith shows around 183k tokens while both the OpenAI token counter and Google AI Studio show only about 16k tokens.
I’m using the Gemini Flash model for my tests. This huge difference makes it hard to estimate costs and usage properly. Has anyone else run into similar token-counting issues when using LangSmith with Google’s Gemini models?
The discrepancy is significant - more than a 10x difference in reported token usage. I’m wondering if there’s a configuration issue on my end or if this is a known problem with how different tools calculate tokens for Gemini models.
Had the same issue a few months ago with Gemini models on different platforms. It turned out LangSmith was adding overhead tokens for multimodal content and system prompts that I couldn’t see in my requests. Check your raw request logs in LangSmith - you’ll see exactly what’s getting sent versus what you think you’re sending. I found hidden system instructions and formatting tokens that were bumping up my count big time. Also check whether automatic prompt templates or chain configs are sneaking in extra context. Google AI Studio’s count is usually spot-on since it uses Google’s actual tokenizer, so I’d stick with that for cost estimates until you figure out what’s going wrong with your LangSmith setup.
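As a quick sanity check while you debug, you can flag any run where the monitoring tool’s count diverges wildly from the provider’s own tokenizer. A minimal sketch (the helper name and tolerance are my own, not from any SDK):

```python
# Hypothetical helper: flag a run whose reported token count diverges
# from the provider's own count by more than a tolerance ratio.
def token_discrepancy(reported: int, provider_count: int,
                      tolerance: float = 1.5) -> dict:
    ratio = reported / provider_count if provider_count else float("inf")
    return {
        "reported": reported,
        "provider": provider_count,
        "ratio": round(ratio, 2),
        "suspicious": ratio > tolerance or ratio < 1 / tolerance,
    }

# The numbers from the original post: ~183k vs ~16k.
print(token_discrepancy(183_000, 16_000))
# → ratio 11.44, suspicious=True
```

Anything with a ratio far from 1.0 is worth tracing back to the raw request logs.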
I’ve hit this exact issue in production with monitoring tools that mess up tokenization across different model providers.
The problem? Your monitoring layer estimates tokens using a different tokenizer than what the actual model uses. Gemini tokenizes differently than OpenAI.
Here’s what worked for me: I built an automation that queries the actual model API for token counts before sending requests. You get real numbers instead of wonky estimates from monitoring tools.
My flow intercepts requests, grabs accurate token counts from Google’s API, logs the real usage data, then processes the request. Takes 5 minutes to set up and eliminates billing surprises.
The automation handles token validation, cost estimation, and usage tracking in one pipeline. No more guessing or dealing with inflated numbers from third-party tools.
You can build this easily with Latenode. It handles the API calls and data processing without coding headaches.
Yeah, this threw me off too during a big deployment. The problem is LangSmith counts tokens differently than Gemini’s actual tokenizer. I’ve seen LangSmith mess up the counts, especially with conversation history or wrapper functions that process text multiple times. Here’s what worked for me: check Google’s actual API response - the usage metadata in the response shows real token usage. I found LangSmith was counting tokens from old conversation turns and metadata that Gemini Flash wasn’t even processing. For production, I built custom logging that pulls token counts straight from Google’s API instead of trusting LangSmith’s estimates. Google’s native tokenizer is always more accurate since that’s what they bill you on. I’d compare your setup against the actual API response data to see where the inflated counts are coming from.
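For reference, the provider-reported usage sits on the response object itself. A sketch of reading it off a Gemini-style response (the `usage_metadata` field names follow the google-generativeai SDK as I remember them, so verify against your SDK version; the stub below only demonstrates the shape, no API call is made):

```python
from types import SimpleNamespace

def extract_usage(response) -> dict:
    """Read the provider-reported token usage off a Gemini-style
    response (usage_metadata field names may differ by SDK version)."""
    meta = response.usage_metadata
    return {
        "prompt_tokens": meta.prompt_token_count,
        "output_tokens": meta.candidates_token_count,
        "total_tokens": meta.total_token_count,
    }

# Stub response object mimicking the SDK shape, for illustration only.
fake = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=16_000,
    candidates_token_count=512,
    total_token_count=16_512,
))
print(extract_usage(fake))
```

Logging these numbers alongside what LangSmith reports makes the inflated counts easy to spot per request.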