How to properly track Gemini API usage with LangSmith tracing

I’m having trouble getting token tracking to work with LangSmith when using the Gemini API.

I’m making direct calls to the Gemini API without using LangChain. Even though I added the @traceable decorator to my function, the LangSmith dashboard shows 0 tokens for all my requests.

from langsmith import traceable
import google.generativeai as genai

@traceable
def chat_with_model(user_message, model_name="gemini-pro"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(user_message)
    return response.text

result = chat_with_model("Tell me about machine learning")

I know there’s a wrap_openai helper function for OpenAI models. Is there something similar available for Gemini? How can I make sure the token usage gets tracked correctly in my traces?

The @traceable decorator won’t capture token usage from direct Gemini API calls, because LangSmith doesn’t have built-in support for Google’s SDK. You’ll need to manually grab the usage data and feed it to the tracer.

Here’s what worked for me. Pull the usage_metadata from the response and pass it to LangSmith yourself:

from langsmith import traceable
import google.generativeai as genai

@traceable
def chat_with_model(user_message, model_name="gemini-pro"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(user_message)
    
    # Extract the token usage Gemini reports for this call
    if hasattr(response, 'usage_metadata'):
        usage = response.usage_metadata
        # usage.prompt_token_count, usage.candidates_token_count and
        # usage.total_token_count are what you log to the current trace
        # yourself (see the sketch below)
        
    return response.text

Grab response.usage_metadata.prompt_token_count and response.usage_metadata.candidates_token_count and log them to your trace context manually. There’s no automatic wrapper like OpenAI has.
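If it helps, here’s a minimal sketch of that step. It grabs the active run with get_current_run_tree() and attaches the counts with add_metadata(), which files them under the run’s metadata; whether your dashboard surfaces them as first-class token counts depends on your LangSmith setup, so treat this as one workable option rather than the official API:

from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree
import google.generativeai as genai

@traceable
def chat_with_model(user_message, model_name="gemini-pro"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(user_message)

    # Attach Gemini's reported token counts to the active LangSmith run
    usage = getattr(response, "usage_metadata", None)
    run = get_current_run_tree()  # the run opened by the @traceable decorator
    if usage and run:
        run.add_metadata({
            "prompt_tokens": usage.prompt_token_count,
            "completion_tokens": usage.candidates_token_count,
            "total_tokens": usage.total_token_count,
        })

    return response.text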

yea, currently there’s no direct Gemini wrapper in LangSmith. you’ll need to get response.usage_metadata from the Gemini API and log that to LangSmith manually. it’s a hassle, but it’s what we gotta do for now.

Hit this exact problem last quarter when we added Gemini to our OpenAI setup. Token tracking gaps completely wrecked our cost monitoring.

You want get_current_run_tree(), but here’s what’ll bite you: Gemini’s usage metadata sometimes returns None on short responses or during rate limiting. Always check it exists before logging.

Here’s what I use now:

from langsmith import traceable
from langsmith.run_helpers import get_current_run_tree
import google.generativeai as genai

@traceable
def chat_with_model(user_message, model_name="gemini-pro"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(user_message)
    
    # Update the current trace with token usage
    if hasattr(response, 'usage_metadata') and response.usage_metadata:
        current_run = get_current_run_tree()
        if current_run:
            # Map Gemini's field names onto the usual prompt/completion/total names
            usage_data = {
                'prompt_tokens': response.usage_metadata.prompt_token_count,
                'completion_tokens': response.usage_metadata.candidates_token_count,
                'total_tokens': response.usage_metadata.total_token_count
            }
            # If update() isn't available in your langsmith version,
            # current_run.add_metadata({'usage': usage_data}) is another
            # way to attach the counts to the run
            current_run.update(usage=usage_data)
    
    return response.text


Key difference from other answers: use total_token_count directly from Gemini instead of calculating it yourself. Saves you from math errors when the API response structure changes.

nobody mentioned this - gemini’s usage_metadata goes empty when the model’s warming up or under heavy load. always add a fallback or you’ll get random gaps in your traces. found out the hard way when our prod dashboard went dark for hours.
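a rough fallback looks like this: count_tokens() re-counts the text so you at least get an estimate when usage_metadata comes back empty. the helper name here is just for illustration, and the output-side number is approximate since it’s recomputed from response.text after the fact:

def usage_or_estimate(model, user_message, response):
    # Prefer the exact counts Gemini reports
    usage = getattr(response, "usage_metadata", None)
    if usage and usage.total_token_count:
        return {
            "prompt_tokens": usage.prompt_token_count,
            "completion_tokens": usage.candidates_token_count,
            "total_tokens": usage.total_token_count,
        }
    # Fallback: estimate with count_tokens() when usage_metadata is missing or empty
    prompt_tokens = model.count_tokens(user_message).total_tokens
    completion_tokens = model.count_tokens(response.text).total_tokens
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "total_tokens": prompt_tokens + completion_tokens,
    }

then log whatever this returns to the run the same way as in the answers above.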

Been dealing with this for months. The manual approach works, but there’s a cleaner way using LangSmith’s trace context management.

Don’t just extract the usage metadata: update the current trace run with the token counts. Use get_current_run_tree() to grab the active trace, then call update() with your usage data. This makes tokens show up in your dashboard instead of getting lost in logs.

Here’s what most people miss: format the usage data exactly like LangSmith wants it. Match their OpenAI trace schema, because field names and structure matter. I spent weeks wondering why my manual logging disappeared until I figured this out.

It’s more work than the OpenAI integration, but it becomes routine once you get it. Just handle the cases where usage_metadata might be None on certain responses.
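Here’s a sketch of that schema idea applied to Gemini. Note this variant reports the counts through the run’s output on an llm-typed run instead of calling update(); the usage_metadata field names (input_tokens / output_tokens / total_tokens) and the ls_provider / ls_model_name metadata keys are my reading of LangSmith’s guidance for custom LLM runs, so double-check them against the current docs before relying on this. The trade-off is that the function now returns a dict instead of a bare string:

from langsmith import traceable
import google.generativeai as genai

# run_type="llm" marks this run as an LLM call; ls_provider / ls_model_name
# tell LangSmith which model the counts belong to
@traceable(run_type="llm", metadata={"ls_provider": "google", "ls_model_name": "gemini-pro"})
def chat_with_model(user_message, model_name="gemini-pro"):
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(user_message)
    usage = getattr(response, "usage_metadata", None)

    # Put the token counts in the output under the field names LangSmith reads
    return {
        "output": response.text,
        "usage_metadata": {
            "input_tokens": usage.prompt_token_count if usage else 0,
            "output_tokens": usage.candidates_token_count if usage else 0,
            "total_tokens": usage.total_token_count if usage else 0,
        },
    }

result = chat_with_model("Tell me about machine learning")
print(result["output"])

If you’d rather keep returning a plain string to callers, you can put the decorated call in a small inner helper and unwrap the dict in your outer function.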

Manual token tracking sucks and breaks constantly. I’ve hit this same wall with multiple APIs at work.

You’re basically duct-taping tools that hate each other. Every new API means writing more custom tracking code.

I ended up using Latenode for this stuff. It handles the API calls, tracks usage automatically, and gives you real observability without decorators or manual logging. Set up your Gemini calls as workflows and get all the tracking data in one dashboard.

When you add more APIs later, everything just works the same way. No more wrapper functions for every single service.