Can anyone help me figure out how to access usage statistics and billing information for OpenAI’s speech recognition and text-to-speech APIs when using LangSmith for monitoring?
I’m working on a project where I need to track costs across different OpenAI services. When I use the chat models like GPT-4 or GPT-3.5, LangSmith shows me token counts and cost breakdowns perfectly fine. But when I switch to audio processing models or TTS functionality, I only get basic execution traces without any usage or cost details.
Has anyone managed to get this working? I’m wondering if there’s a specific configuration or method I’m missing to enable cost tracking for these audio-based models in LangSmith.
Yeah, this LangSmith limitation is super frustrating. I’ve been stuck with the same issue for weeks. What finally worked was hooking into the OpenAI client callbacks and manually pushing cost data to LangSmith through their custom events API. You’ll have to calculate pricing yourself, which is annoying, but at least you get the visibility you need.
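Rough shape of what I ended up with. The run name, metadata keys, and the pricing constant are my own placeholders, and you should double-check `create_run`'s signature against the current LangSmith SDK docs before relying on it:

```python
# Placeholder rate -- verify against OpenAI's current pricing page
WHISPER_USD_PER_MINUTE = 0.006

def push_whisper_cost(audio_seconds: float, transcript: str, client=None) -> float:
    """Estimate the Whisper cost for one call and, if a LangSmith client
    is supplied, push it as its own run with the cost in `extra` metadata."""
    cost = (audio_seconds / 60.0) * WHISPER_USD_PER_MINUTE
    if client is not None:
        client.create_run(
            name="whisper-transcription",   # arbitrary name I chose
            run_type="llm",
            inputs={"audio_seconds": audio_seconds},
            outputs={"text": transcript},
            extra={"metadata": {"estimated_cost_usd": cost}},
        )
    return cost
```

In real use you'd pass `langsmith.Client()` as `client`; leaving it off just returns the estimate, which is handy for testing the math.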
I encountered a similar issue several months ago while working with OpenAI’s audio models. LangSmith does not automatically track usage metrics for the audio APIs the way it does for text-based models. To resolve this, I implemented a custom logger that captures the audio API responses manually. One thing worth knowing: the audio endpoints don’t report usage the way chat completions do. The transcription endpoint only includes a duration when you request the `verbose_json` response format, and the TTS endpoint reports nothing at all, so you count the input characters yourself. Extract that data, then send it to LangSmith as run metadata using their SDK. Alternatively, you can retrieve the data directly from OpenAI’s usage dashboard API and correlate it with your LangSmith logs using timestamps. It’s less straightforward than the automatic tracking for chat models, but this method has proven effective for monitoring costs.
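To make the logger concrete, here’s roughly what mine builds per call. The field names are just my own metadata keys, and the timestamp is there so the same call can be joined against the usage dashboard later if you go that route:

```python
from datetime import datetime, timezone
from typing import Any

def audio_usage_record(transcription: dict[str, Any]) -> dict[str, Any]:
    """Build the record I send to LangSmith: the billable duration from a
    verbose_json transcription body, plus a UTC timestamp for correlating
    this call with OpenAI's usage dashboard afterwards."""
    seconds = float(transcription.get("duration", 0.0))
    return {
        "audio_seconds": seconds,
        "audio_minutes": seconds / 60.0,
        "logged_at": datetime.now(timezone.utc).isoformat(),
    }
```

I attach this dict as metadata on the relevant LangSmith run; the `logged_at` value is what makes the timestamp-correlation fallback workable.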
This situation surprised me when I moved from direct OpenAI calls to LangSmith monitoring. The main issue is that LangSmith wasn’t designed for audio model billing—it only accounts for token-based pricing. I found success by intercepting OpenAI client responses before they reach LangSmith. The audio API responses provide different usage formats than text models; for instance, Whisper returns duration in seconds, while TTS requires character counts for billing. I developed a middleware layer to extract this data and convert it into LangSmith-compatible cost information using OpenAI’s published pricing. It’s crucial to ensure your custom logging shares the same trace context so everything aligns in the dashboard. Just a heads up—OpenAI periodically updates their audio pricing, meaning you’ll need to adjust your calculations manually. Not ideal, but it offers better visibility into audio costs than having none.
Yeah, I’ve hit this wall several times. LangSmith’s auto cost tracking only works with token-based models - it doesn’t handle audio endpoints.
I built a wrapper around OpenAI’s audio calls that manually logs everything. For speech-to-text, track the audio duration (it’s in the response). For TTS, track character count from your input.
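My wrapper looks roughly like this. It assumes the openai v1 Python client shape (`client.audio.transcriptions.create`), and the pricing constant is a placeholder you should re-check:

```python
WHISPER_USD_PER_MIN = 0.006  # placeholder; re-check OpenAI's pricing page

def transcribe_with_usage(client, audio_file) -> dict:
    """Wrap the speech-to-text call so the billable duration comes back
    alongside the transcript instead of being dropped."""
    resp = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        response_format="verbose_json",  # this format includes `duration`
    )
    return summarize_transcription(resp.text, resp.duration)

def summarize_transcription(text: str, duration_s: float) -> dict:
    """Shape the transcript and duration into the dict I log manually."""
    return {
        "text": text,
        "audio_seconds": duration_s,
        "estimated_cost_usd": (duration_s / 60.0) * WHISPER_USD_PER_MIN,
    }

def tts_char_count(input_text: str) -> int:
    """TTS is billed per input character, so capture it from the request
    side -- the speech response itself won't tell you."""
    return len(input_text)
```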
The trick is pushing this data to LangSmith manually. I wrote a function that calculates costs using OpenAI’s current pricing ($0.006/minute for Whisper, $0.015 per 1K characters for standard TTS voices) and logs it as custom metadata.
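The calculation itself is tiny — here it is with the rates quoted above hard-coded (deliberately in one place, since they go stale):

```python
# $0.006/min for whisper-1, $0.015 per 1K input chars for tts-1 (standard
# voices). Hard-coded on purpose -- update when OpenAI changes pricing.
def audio_cost_usd(model: str, *, seconds: float = 0.0, characters: int = 0) -> float:
    if model == "whisper-1":
        return round((seconds / 60.0) * 0.006, 6)
    if model == "tts-1":
        return round((characters / 1000.0) * 0.015, 6)
    raise ValueError(f"no audio pricing known for {model!r}")
```

I call this right after each audio request and stick the result into the run’s metadata so it shows up next to the trace.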
Downside? You’re stuck maintaining those pricing calculations yourself since LangSmith doesn’t handle them. But once it’s set up, it’s solid.
OpenAI covered some integration stuff in their dev stream: