I’m working with OpenAI’s audio processing and speech synthesis models and trying to track usage metrics in Langsmith. When I use regular text models like GPT-4 or GPT-3.5, I can see detailed information about token consumption and associated costs in my Langsmith runs.
However, when I work with OpenAI’s audio transcription (Whisper) and text-to-speech models, the tracking seems incomplete. The runs appear in my traces but I don’t get any usage statistics or cost breakdowns like I do with the chat models.
Is there a way to capture this billing information for audio models in Langsmith? Are there specific configuration settings I need to enable, or is this data simply not available for these model types? Any guidance on how to properly monitor usage and expenses for these audio-related APIs would be really helpful.
Been dealing with this for months. Audio models don’t expose usage metadata through their API responses like other models do.
I built a simple wrapper to calculate costs myself. Whisper charges per minute, so I measure audio duration before sending requests. TTS charges per character, so I count characters first.
I store these costs as custom fields in Langsmith using the extra parameter when creating runs. Not perfect since pricing changes, but gives a decent estimate.
Check your OpenAI usage dashboard too. It breaks down costs by model type, though you lose the detailed trace-level info you get with chat models.
Honestly, OpenAI should fix this. The inconsistency between models is annoying when you’re tracking everything in one system.
Yes, this is a known limitation with Langsmith. Audio models such as Whisper and TTS don't return usage metadata the way text models do, which is why you aren't getting detailed breakdowns. I ran into the same thing and ended up building custom tracking.

One effective approach is to capture audio file sizes and durations before making requests, which lets you compute approximate costs from OpenAI's published pricing. You can log that as custom metadata on your Langsmith traces using the update_run method.

Alternatively, you can pull usage data from OpenAI's dashboard API and correlate it with your trace timestamps. It takes extra effort, but it gives you accurate billing data that integrates with your other metrics.
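For the "capture duration before the request" step, you don't even need ffmpeg if your input is WAV; the stdlib `wave` module is enough. A sketch (other formats like MP3 need a real probe tool):

```python
import io
import wave

def wav_duration_seconds(data: bytes) -> float:
    """Duration of an in-memory WAV payload: total frames / frame rate."""
    with wave.open(io.BytesIO(data)) as w:
        return w.getnframes() / w.getframerate()
```

Feed the returned duration into whatever cost formula you use before logging the run.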
Yeah, this tracking gap is super common. I deal with it constantly building voice workflows at work.
I ditched manual tracking and moved to Latenode for audio model orchestration. It automatically monitors usage across all OpenAI models - Whisper, TTS, everything.
Best part? Set up workflows that grab audio duration and character counts before API calls, then log everything with actual costs in one dashboard. No more bouncing between OpenAI’s billing and your traces.
I’ve got spending alerts and automatic fallbacks when usage hits certain limits. Way cleaner than patching together tracking solutions.
The visual builder lets you see your entire audio pipeline - file processing to cost tracking - in one place.
Check it out: https://latenode.com
Yeah, this caught me off guard too when I started working with audio. OpenAI’s audio endpoints don’t return usage data like chat completions do - no token counts or billing info at all.

Here’s what I did: I built a middleware layer that catches requests before they hit OpenAI’s API. For Whisper, I use ffprobe to get the file duration, then multiply by OpenAI’s per-minute rate. For TTS, I count input characters and use their character pricing. Then I push this data into Langsmith as custom metadata when creating runs.

The annoying part? You’ve got to keep your pricing up to date manually. I check OpenAI’s docs quarterly and update my constants. It’s extra work, but you get the same granular cost tracking you have for chat models.
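The ffprobe part of that middleware can be sketched like this (assumes `ffprobe` from ffmpeg is on your PATH; the `run` parameter is only there so the subprocess call can be stubbed out in tests):

```python
import subprocess

def probe_duration(path: str, run=subprocess.run) -> float:
    """Return media duration in seconds using ffprobe (ffmpeg must be installed)."""
    cmd = [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]
    # check=True raises if ffprobe fails (e.g. unreadable file)
    out = run(cmd, capture_output=True, text=True, check=True)
    return float(out.stdout.strip())
```

Multiply the result by your per-minute constant before creating the run, same as with the character count for TTS.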
Langsmith still lacks billing data for audio models. I’ve got a script that fetches data from OpenAI’s usage API periodically and aligns it with my traces. It’s kinda clunky, but it saves me from manual calculations.