How to properly compute OpenAI API costs when using prompt caching

I’m working with OpenAI’s API and need help understanding how to calculate billing when prompt caching is involved. When I get the usage data back from the API, I see separate values for regular input tokens and cached input tokens.

Here’s what I’m seeing in my response:

  • Regular input tokens: 1204
  • Cached input tokens: 1024
  • Output tokens: 12

The current pricing structure is:

  • Regular input: $0.150 per 1M tokens
  • Cached input: $0.075 per 1M tokens
  • Output: $0.600 per 1M tokens

My main question is about how these numbers work together. Do I need to manually subtract the cached token count from the total input tokens before doing my cost calculation? Or does the API already give me the correct breakdown where I can just multiply each category by its respective price?

I expected the regular input count to be around 180 tokens (1204 minus 1024), but that’s not what I’m getting. The documentation isn’t clear about this calculation method.

From what I’ve seen with cached prompts, those token counts are exactly what you get billed for. People get confused thinking cached tokens should be subtracted from regular tokens, but that’s not how it works. OpenAI processes your request, figures out what can use cached data, and counts those tokens separately at the discount rate. Everything else becomes your regular input tokens at full price.

So you just add it up: regular tokens at full price + cached tokens at half price + output tokens.

I’ve noticed caching efficiency is all over the place depending on your prompt structure and timing between requests. Sometimes hardly anything gets cached, other times you get a decent chunk. The API docs could be way clearer about this, but once you realize these are separate counts instead of overlapping ones, billing makes way more sense.
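If it helps to see that additive formula spelled out, here’s a minimal Python sketch. The function name and rate constants are mine, not from any SDK; the prices are the per-1M figures quoted in the question:

```python
# Per-1M-token prices from the question, converted to per-token rates.
RATE_REGULAR = 0.150 / 1_000_000   # regular input
RATE_CACHED = 0.075 / 1_000_000    # cached input (50% discount)
RATE_OUTPUT = 0.600 / 1_000_000    # output

def request_cost(regular: int, cached: int, output: int) -> float:
    """Treat the three counts as disjoint buckets and sum them,
    as described above."""
    return regular * RATE_REGULAR + cached * RATE_CACHED + output * RATE_OUTPUT

# The numbers from the question come out to a fraction of a cent:
print(f"${request_cost(1204, 1024, 12):.7f}")  # → $0.0002646
```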

I deal with this billing confusion constantly when building automated workflows with OpenAI’s API.

Just multiply each token type by its price and add them up. The API already broke it down for you - those numbers don’t overlap.

Setting up automated cost tracking really helped me get this. I built something that grabs these token breakdowns from every API call and calculates costs in real time.

For your numbers: (1204 × $0.150/1M) + (1024 × $0.075/1M) + (12 × $0.600/1M) = $0.0001806 + $0.0000768 + $0.0000072 ≈ $0.000265
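Spelled out in Python, with the per-1M prices converted to per-token rates before multiplying:

```python
# Each bucket billed at its own rate, then summed (counts treated as
# disjoint, per the answer above). Prices are USD per 1M tokens.
cost = (1204 * 0.150 / 1e6) + (1024 * 0.075 / 1e6) + (12 * 0.600 / 1e6)
print(round(cost, 7))  # → 0.0002646
```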

Your regular tokens aren’t 180 because caching doesn’t work like a simple subtraction. OpenAI caches the matching prefix of your prompt (with a minimum of about 1,024 tokens, which is probably why your cached count is exactly 1024), and whatever doesn’t hit the cache gets billed as regular input.

I handle all this token tracking and cost calculation automatically now with Latenode. It connects to OpenAI’s API, grabs all the usage data, does the math, and logs everything to my database. Way better than calculating costs manually every time.

yeah, the api handles all the math for u - just use those numbers. cached tokens are parts openai recognizes from earlier requests, so they’re cheaper. don’t overthink it. multiply each token type by its rate and add them up. that 1204 + 1024 confused me at first too, but they’re separate counts, not overlapping.

The API response already gives you the correct breakdown for billing - no manual subtraction needed. OpenAI returns those separate values because they’re distinct token categories that get calculated independently. For your example, just calculate: (1204 × $0.150/1M) + (1024 × $0.075/1M) + (12 × $0.600/1M) ≈ $0.000265 total.

The regular input tokens (1204) and cached input tokens (1024) aren’t overlapping - they’re separate buckets the API already sorted. Your regular input count isn’t around 180 because the API handles caching differently than simple subtraction: the cached tokens are the parts of your prompt that matched previously cached content, while regular tokens are the unique parts plus anything that couldn’t be cached for technical reasons. That sorting happens on OpenAI’s end based on their caching logic, not through math on your side.
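A small sketch of that calculation against a usage-style payload. The dict keys here mirror the breakdown in the question, not any specific SDK’s field names:

```python
# Illustrative usage payload with the question's numbers.
usage = {
    "regular_input_tokens": 1204,
    "cached_input_tokens": 1024,
    "output_tokens": 12,
}

# USD-per-token rates derived from the per-1M prices above.
rates = {
    "regular_input_tokens": 0.150 / 1e6,
    "cached_input_tokens": 0.075 / 1e6,
    "output_tokens": 0.600 / 1e6,
}

# Disjoint buckets, each billed at its own rate, then summed.
# Caveat: if your payload instead reports cached tokens as a subset of
# one total-input count (as OpenAI's prompt_tokens_details does), bill
# (total - cached) at the full rate before adding the cached portion.
cost = sum(usage[k] * rates[k] for k in usage)
print(f"${cost:.7f}")  # → $0.0002646
```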

I work with OpenAI billing daily, and those token counts are already separated correctly - you can calculate from them directly. You’re getting 1204 regular tokens instead of 180 because caching doesn’t work the way you’re assuming. OpenAI’s prompt caching is prefix-based: it reuses the longest initial stretch of your prompt that exactly matches a recent request, in 128-token steps with a 1,024-token minimum. So your 1024 cached tokens are that matched prefix, and the regular tokens are everything the cache didn’t cover. The billing’s straightforward once you get this - just apply the rates to each token type, no manual adjustments needed. I’ve noticed caching works way better when your prompts keep a stable prefix and you don’t wait too long between similar requests, since cache entries expire after a few minutes of inactivity.

the token breakdown from openai is already set up for billing - you don’t need to do any math. those 1204 regular tokens and 1024 cached tokens aren’t supposed to be subtracted from each other. they’re separate billing categories that both count toward your total cost.

I’ve been fighting this same confusion for months while building cost tracking for our API usage.

Those token counts are right. Don’t subtract anything. The 1204 regular tokens and 1024 cached tokens are separate buckets - you get billed for both.

Your regular tokens aren’t 180 because the two counts are separate categories, not a total and a subset. OpenAI’s caching is prefix-based: it reuses the identical leading portion of a recent prompt as cached tokens and leaves the rest as regular tokens.

Automated cost monitoring solved this for me. I don’t manually calculate anything anymore.

I use Latenode to capture every API response, extract token breakdowns, calculate costs instantly, and dump everything into a spreadsheet. It handles the math and gives me real-time cost tracking across all my OpenAI workflows.

Way better than manual calculations every time you want to check spending.
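For anyone who’d rather not wire up a third-party tool, the same kind of tracking can be sketched in plain Python. The function name, CSV layout, and file path are all my own choices, not from Latenode or OpenAI:

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# USD per 1M tokens, per the prices quoted in the question.
RATES = {"regular": 0.150, "cached": 0.075, "output": 0.600}

def log_usage(regular: int, cached: int, output: int,
              path: str = "openai_costs.csv") -> float:
    """Compute one call's cost (disjoint buckets, as described in this
    thread) and append a row to a running CSV log."""
    cost = (regular * RATES["regular"]
            + cached * RATES["cached"]
            + output * RATES["output"]) / 1_000_000
    write_header = not Path(path).exists()
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(["timestamp_utc", "regular", "cached",
                             "output", "cost_usd"])
        writer.writerow([datetime.now(timezone.utc).isoformat(),
                         regular, cached, output, f"{cost:.7f}"])
    return cost

# Log the example call from the question:
log_usage(1204, 1024, 12)
```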