I’m working with the Gemini API and making direct calls to it instead of using LangChain. I added the @traceable decorator to my function, but when I check LangSmith it shows 0 tokens used, so token tracking isn’t working. I know there’s a wrap_openai helper function available for OpenAI integration. Is there something similar for Gemini? How can I make sure LangSmith properly monitors my token usage when calling Gemini directly? What steps am I missing to get accurate tracing?
Got burned by this exact issue last year on a project with multiple LLM providers. Direct API calls to Gemini are a pain for tracking.
I built a simple middleware layer that grabs both request and response data. Instead of chasing tokens after the fact, I capture everything - input tokens, output tokens, model used, response time.
The key is storing this in a format LangSmith can actually use. I serialize the usage metadata and attach it to the trace as custom fields. Way better than trying to update traces while they’re running.
One gotcha - Gemini returns different metadata structures depending on which model version you hit. Make sure your extraction logic handles both new and legacy response formats or you’ll get random gaps.
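A rough sketch of that shape (assuming the newer google-genai SDK and LangSmith’s @traceable decorator; the helper and model names are placeholders, and the defensive getattr calls are what covers the format differences):

```python
import time

from google import genai
from langsmith import traceable

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment
MODEL = "gemini-2.0-flash"

def extract_usage(response) -> dict:
    # Defensive extraction: fall back to 0 if a field is missing on older/newer formats.
    usage = getattr(response, "usage_metadata", None)
    return {
        "input_tokens": getattr(usage, "prompt_token_count", 0) or 0,
        "output_tokens": getattr(usage, "candidates_token_count", 0) or 0,
        "total_tokens": getattr(usage, "total_token_count", 0) or 0,
    }

@traceable(run_type="llm", metadata={"ls_provider": "google", "ls_model_name": MODEL})
def gemini_call(prompt: str) -> dict:
    start = time.perf_counter()
    response = client.models.generate_content(model=MODEL, contents=prompt)
    # Serialize everything into the run's outputs: text, token usage, model, latency.
    return {
        "output": response.text,
        "usage_metadata": extract_usage(response),
        "model": getattr(response, "model_version", MODEL),
        "response_time_s": round(time.perf_counter() - start, 3),
    }
```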
I’ve used this pattern in 6 different projects now and it’s solid. Takes maybe 30 minutes to set up but saves you from manual counting nightmares later.
Gemini doesn’t have a wrap_openai equivalent yet, so you’ll need to do some manual work. I built a wrapper function that grabs usage stats from the response object after each API call. You can pull the token data from response.usage_metadata and push it to LangSmith using their context manager to update the current trace. Just make sure you’re catching both input and output tokens and feeding them into the trace metadata. Skip this step and LangSmith won’t track tokens like it does automatically with OpenAI.
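Something along these lines, sketched with the google-genai SDK and LangSmith’s trace() context manager (the run name and model are placeholders; whether the token columns populate from usage_metadata in the outputs can depend on your SDK and server version, so double-check against your setup):

```python
from google import genai
from langsmith import trace

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

def call_gemini(prompt: str, model: str = "gemini-2.0-flash") -> str:
    # Open an LLM-type run ourselves so we can end it with the token counts attached.
    with trace(
        name="gemini_generate",
        run_type="llm",
        inputs={"prompt": prompt},
        metadata={"ls_provider": "google", "ls_model_name": model},
    ) as run:
        response = client.models.generate_content(model=model, contents=prompt)
        usage = response.usage_metadata
        run.end(
            outputs={
                "output": response.text,
                "usage_metadata": {
                    "input_tokens": usage.prompt_token_count,
                    "output_tokens": usage.candidates_token_count,
                    "total_tokens": usage.total_token_count,
                },
            }
        )
        return response.text
```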
Yeah, Gemini integration sucks compared to OpenAI. I built a custom decorator that wraps @traceable and pulls token counts straight from Gemini’s response object. Just grab the usage_metadata field and push that data to the LangSmith trace. Extra work, but it’s the only way to get token tracking working right now.
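Roughly like this - a sketch with the legacy google-generativeai package (decorator and model names are placeholders; it assumes LangSmith reads usage_metadata from the outputs of llm-type runs):

```python
import functools
import os

import google.generativeai as genai
from langsmith import traceable

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])

def traceable_gemini(model_name: str):
    """Wrap a function that returns a raw Gemini response so the trace gets token counts."""
    def decorator(fn):
        @traceable(
            run_type="llm",
            name=fn.__name__,
            metadata={"ls_provider": "google", "ls_model_name": model_name},
        )
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            response = fn(*args, **kwargs)
            usage = getattr(response, "usage_metadata", None)
            # Re-shape the output so the token counts land on the trace.
            return {
                "output": response.text,
                "usage_metadata": {
                    "input_tokens": getattr(usage, "prompt_token_count", 0),
                    "output_tokens": getattr(usage, "candidates_token_count", 0),
                    "total_tokens": getattr(usage, "total_token_count", 0),
                },
            }
        return wrapper
    return decorator

@traceable_gemini("gemini-1.5-flash")
def ask_gemini(prompt: str):
    model = genai.GenerativeModel("gemini-1.5-flash")
    return model.generate_content(prompt)
```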
Manual token tracking is a nightmare with multiple API calls and different response formats. Been there, done that with too many AI integrations.
Skip the custom wrappers and decorators - I route all my Gemini API calls through Latenode instead. It tracks tokens automatically and feeds everything to your monitoring system, including LangSmith.
Here’s how it works: API call hits, Latenode grabs the request/response plus usage data, then pushes token counts wherever you need them. Zero custom code to babysit.
Used to waste hours debugging token counting. Now it just works - no decorators, no manual extraction headaches. Bonus: you get retry logic and error handling thrown in.
Check it out: https://latenode.com
The @traceable decorator won’t capture token metrics from direct Gemini API calls - it doesn’t have the built-in instrumentation that wrap_openai provides. You’ll have to manually track and log token usage in your decorated function. When Gemini sends back a response, grab the token counts from the usage metadata and pass them to the trace using the LangSmith client’s update methods, or include them in your function’s return metadata. I ran into the same problem, and manually extracting prompt_token_count and candidates_token_count from the Gemini response worked great. You can also use the LangSmith SDK’s run context to update token info programmatically while the run is still open.
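For the run-context route, a minimal sketch (assuming the google-genai SDK; the model name is a placeholder, and whether LangSmith’s token columns pick the counts up from run metadata rather than outputs can vary by version, so returning them in the outputs is the safer fallback):

```python
from google import genai
from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree

client = genai.Client()  # picks up GOOGLE_API_KEY from the environment

@traceable(run_type="llm", metadata={"ls_provider": "google", "ls_model_name": "gemini-2.0-flash"})
def generate(prompt: str) -> str:
    response = client.models.generate_content(model="gemini-2.0-flash", contents=prompt)
    usage = response.usage_metadata

    # Grab the run that @traceable opened for this call and attach the counts
    # to its metadata while it is still open.
    run = get_current_run_tree()
    if run is not None:
        run.extra = run.extra or {}
        run.extra.setdefault("metadata", {})["usage_metadata"] = {
            "input_tokens": usage.prompt_token_count,
            "output_tokens": usage.candidates_token_count,
            "total_tokens": usage.total_token_count,
        }
    return response.text
```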