How to track token consumption with OpenAI Whisper speech-to-text API

JollyMusic3 · August 22, 2025, 7:06pm

I’m working with the OpenAI Whisper API for audio transcription. When I make requests to convert audio files to text, the API response only contains the transcribed text in a simple format:

{
  "transcription": "converted audio content here"
}

The response doesn’t include any usage metrics or billing information. I need to monitor my token usage for budget tracking purposes. Is there a way to get token count data from the API response, or do I need to check usage statistics somewhere else in my OpenAI account? Any suggestions on how to track this would be helpful.

DancingButterfly · September 5, 2025, 2:36am

The Whisper API charges by audio duration, not tokens - that’s why you don’t see token counts. It’s $0.006 per minute of audio. I use it a lot for podcast transcription, and the best way to track usage is through OpenAI’s dashboard under “Usage.” You’ll see detailed breakdowns by date and model. For programmatic tracking, I just log each audio file’s duration before hitting the API, then multiply the minutes by the rate. Works great for budget monitoring since the billing’s way simpler than for the text APIs.
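A minimal sketch of that arithmetic, assuming the $0.006 per minute rate quoted in this thread still applies and ignoring any rounding OpenAI does on its side. The name estimate_cost is just an illustrative helper, not part of any OpenAI library.

```python
# Rate quoted in this thread; confirm it against OpenAI's current pricing.
RATE_PER_MINUTE_USD = 0.006


def estimate_cost(duration_seconds: float,
                  rate_per_minute: float = RATE_PER_MINUTE_USD) -> float:
    """Return the estimated cost in USD for one audio file."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Convert seconds to minutes first, then apply the per-minute rate.
    return duration_seconds / 60 * rate_per_minute


# A 45-minute podcast episode is 2700 seconds.
print(f"${estimate_cost(2700):.4f}")
```

The key detail is the division by 60: most tools report duration in seconds, while the rate is per minute.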

Alice45 · September 3, 2025, 1:00pm

yeah, whisper pricing is time-based, not tokens like gpt. check your openai dashboard for usage stats or set up webhooks for real-time monitoring. i just use a simple counter in my app to track minutes processed.
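a simple counter like that could look roughly like this. it's only a sketch: the class name MinuteCounter and its fields are made up for illustration, and the default rate is the $0.006 per minute figure from this thread.

```python
class MinuteCounter:
    """In-memory running total of audio sent for transcription."""

    def __init__(self, rate_per_minute=0.006):
        self.rate_per_minute = rate_per_minute
        self.total_seconds = 0.0
        self.requests = 0

    def add(self, duration_seconds):
        """Record one transcription request of the given length."""
        self.total_seconds += duration_seconds
        self.requests += 1

    @property
    def minutes(self):
        return self.total_seconds / 60

    @property
    def estimated_cost(self):
        return self.minutes * self.rate_per_minute


counter = MinuteCounter()
counter.add(120)  # a 2-minute clip
counter.add(30)   # a 30-second clip
print(counter.requests, counter.minutes)
```

the totals live only in memory, so they reset when the app restarts unless you persist them somewhere.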

jade_journey · September 2, 2025, 8:50pm

You need automated tracking running alongside your Whisper calls for real budget control.

I built a system that logs audio file duration before sending to Whisper, then stores cost calculations in a database. Managing this manually gets messy when you scale though.
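For anyone who wants to try the database approach, here is a rough sketch using Python's built-in sqlite3 module. The table name, column names, and helper functions are assumptions made for illustration, not a description of the system mentioned above.

```python
import sqlite3
from datetime import datetime, timezone

RATE_PER_MINUTE_USD = 0.006  # rate quoted in this thread


def open_usage_db(path=":memory:"):
    """Open (or create) the usage database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS whisper_usage (
               id INTEGER PRIMARY KEY,
               logged_at TEXT NOT NULL,
               filename TEXT NOT NULL,
               duration_seconds REAL NOT NULL,
               estimated_cost_usd REAL NOT NULL
           )"""
    )
    return conn


def log_usage(conn, filename, duration_seconds, rate=RATE_PER_MINUTE_USD):
    """Store one file's duration and estimated cost; return the cost."""
    cost = duration_seconds / 60 * rate
    conn.execute(
        "INSERT INTO whisper_usage "
        "(logged_at, filename, duration_seconds, estimated_cost_usd) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), filename,
         duration_seconds, cost),
    )
    conn.commit()
    return cost


def total_cost(conn):
    """Sum the estimated cost of everything logged so far."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(estimated_cost_usd), 0) FROM whisper_usage"
    ).fetchone()
    return total
```

Passing a file path instead of ":memory:" keeps the log across restarts, and the logged_at column makes monthly totals a single GROUP BY query.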

Automated workflows handle tracking much better. Each audio job automatically calculates its cost, logs usage to a spreadsheet, sends alerts at budget thresholds, and feeds a monthly report.

I use Latenode for this automation. It connects directly to OpenAI APIs and tracks usage in real time without custom logging code. Set up workflows that monitor spending, alert before hitting limits, and export data anywhere.

Dashboard stats work for casual checking, but automated tracking prevents surprise bills and gives proper budget control.

CreatingStone · September 2, 2025, 3:41pm

whisper billing confused me at first too lol. since it’s duration-based, i just calculate costs upfront - grab the audio length with librosa in python (it comes back in seconds), divide by 60 to get minutes, then multiply by 0.006. way easier than waiting for the dashboard to update.
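rough sketch of the librosa version. assumptions: librosa 0.10 or newer (older releases used filename= instead of path=), and the helper names here are made up for the example.

```python
RATE_PER_MINUTE_USD = 0.006  # rate quoted in this thread


def audio_duration_seconds(path):
    """Return the length of an audio file in seconds using librosa."""
    # Imported lazily so the cost helper below still works where
    # librosa is not installed.
    import librosa

    # `path=` is the keyword in librosa 0.10+; older releases used `filename=`.
    return librosa.get_duration(path=path)


def estimate_file_cost(path,
                       rate_per_minute=RATE_PER_MINUTE_USD,
                       get_duration=audio_duration_seconds):
    """Estimate the transcription cost in USD for one file."""
    # Duration is in seconds, so convert to minutes before applying the rate.
    return get_duration(path) / 60 * rate_per_minute
```

get_duration is a parameter so you can swap in another duration source (or a stub in tests) without touching the cost math.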

Jack81 · September 2, 2025, 12:54pm

Yeah, I get why people expect token tracking - that’s how most OpenAI APIs work. But Whisper’s different. It just charges $0.006 per minute of audio, no tokens involved. I’ve been transcribing meetings with it for months and learned it’s way better to calculate costs upfront rather than wait for the dashboard. I grab the audio duration with ffprobe before hitting the API, then log both the duration and cost right in my app. Gives me instant cost tracking without waiting around for OpenAI’s stats to update. The dashboard’s fine for monthly checks, but if you want real-time numbers, you gotta track duration yourself.
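Here is roughly what the ffprobe step can look like from Python. This is a sketch that assumes ffprobe is installed and on your PATH; the function names are illustrative and not taken from any particular app.

```python
import subprocess


def parse_duration(ffprobe_output: str) -> float:
    """Convert ffprobe's plain-text duration output to seconds."""
    return float(ffprobe_output.strip())


def ffprobe_duration_seconds(path: str) -> float:
    """Ask ffprobe for the container duration of an audio file."""
    result = subprocess.run(
        [
            "ffprobe",
            "-v", "error",                      # suppress banner and warnings
            "-show_entries", "format=duration",  # print only the duration field
            "-of", "default=noprint_wrappers=1:nokey=1",  # bare value, no key
            path,
        ],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ffprobe fails
    )
    return parse_duration(result.stdout)
```

The value comes back in seconds, so divide by 60 before multiplying by the per-minute rate.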

mythicMuse · September 1, 2025, 12:53pm

Manual calculations work for small projects, but become a nightmare when you’re processing hundreds of audio files daily.

I automated this - my setup grabs audio duration, calls Whisper, then logs everything automatically. No more ffprobe commands or updating spreadsheets by hand.

The real game changer? Workflows that monitor spending in real time. Mine calculates costs on the fly, alerts me when I hit budget limits, and exports data straight to accounting.
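If you want to prototype the budget-alert idea in plain code first, a minimal sketch might look like this. The class name, the 80% warning threshold, and the status strings are all assumptions chosen for illustration, and the rate is the $0.006 per minute quoted in this thread.

```python
class BudgetMonitor:
    """Tracks estimated spend and reports when a threshold is crossed."""

    def __init__(self, budget_usd, warn_fraction=0.8, rate_per_minute=0.006):
        self.budget_usd = budget_usd
        self.warn_fraction = warn_fraction
        self.rate_per_minute = rate_per_minute
        self.spent_usd = 0.0

    def record(self, duration_seconds):
        """Add one file's estimated cost and return the current status."""
        self.spent_usd += duration_seconds / 60 * self.rate_per_minute
        if self.spent_usd >= self.budget_usd:
            return "over_budget"
        if self.spent_usd >= self.warn_fraction * self.budget_usd:
            return "warning"
        return "ok"


monitor = BudgetMonitor(budget_usd=1.0)
print(monitor.record(6000))  # 100 minutes of audio
```

The caller decides what a "warning" or "over_budget" status triggers, such as an email, a chat message, or refusing further uploads.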

Latenode handles this perfectly. Build workflows that process audio, track duration, calculate costs, and log to databases or sheets. Plus it alerts you before hitting spending limits.

Beats manually checking OpenAI’s dashboard or writing custom code that breaks when you scale.

SilentSailing34 · August 31, 2025, 5:20pm

Yeah, the missing token count threw me off when I first started with Whisper too. Here’s the thing - Whisper doesn’t work like ChatGPT or other text models. It charges per minute of audio, not tokens. You pay $0.006 per minute no matter how much text comes out of it. That’s why your API response doesn’t show usage metrics - OpenAI handles the billing on their end based on audio length. What I do is track duration in my app before hitting the API. Just grab the audio file length in seconds, convert to minutes, and multiply by the rate. The OpenAI dashboard updates eventually but it’s not instant, so tracking it yourself gives you better visibility into what you’re spending.
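Putting that together, a sketch of tracking duration right before the API call could look like the following. It assumes the official openai Python package (version 1.x) and the whisper-1 model name; transcribe_with_tracking and usage_log are hypothetical names for this example.

```python
def transcribe_with_tracking(path, get_duration, transcribe, usage_log,
                             rate_per_minute=0.006):
    """Log duration and estimated cost, then run the transcription."""
    duration = get_duration(path)  # seconds
    cost = duration / 60 * rate_per_minute
    # Logged before the call, as described above; a failed request will
    # still appear in the log, so treat the total as an upper bound.
    usage_log.append(
        {"file": path, "seconds": duration, "estimated_cost_usd": cost}
    )
    return transcribe(path)


def openai_transcribe(path):
    """Send one file to the transcription endpoint and return its text."""
    # Imported lazily so the tracking helper works without the package.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    with open(path, "rb") as audio_file:
        result = client.audio.transcriptions.create(
            model="whisper-1", file=audio_file
        )
    return result.text
```

Pass openai_transcribe as the transcribe argument and any duration function (ffprobe, librosa, or similar) as get_duration; usage_log can be a plain list.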