How to compute OpenAI API costs when dealing with cached tokens

I’m working with OpenAI’s API and getting confused about how to handle cached tokens when calculating my costs. When I look at the usage data that comes back from the API, I see separate counts for regular input tokens and cached input tokens.

Here’s what I’m dealing with:

Token counts from my API response:

  • Regular input tokens: 1204
  • Cached input tokens: 1024
  • Output tokens: 12

Current OpenAI pricing structure:

  • Regular input: $0.150 per 1M tokens
  • Cached input: $0.075 per 1M tokens
  • Output: $0.600 per 1M tokens

My question is about the math here. Do I need to manually subtract the cached token count from the total input tokens before doing my cost calculation? Or does the API already give me the correct numbers where input tokens and cached tokens are separate values that I just multiply by their respective rates?

I thought maybe the input token count would automatically exclude the cached ones, but that doesn’t seem to be happening. The documentation wasn’t super clear on this part.

OpenAI overcomplicated this for no reason. The separate counts are already right - don’t overthink it with math tricks. I wasted hours looking for overlaps that don’t exist. Just treat them as different line items. Your 1204 regular + 1024 cached tokens? The API already sorted that correctly.

Had the same billing confusion and can confirm OpenAI already separates the token counts for you - no math needed. Cached tokens are stuff they’ve already stored from recent requests, while regular input tokens are fresh content. They don’t overlap, so there’s no double counting. I learned this the hard way when I tried subtracting cached from total input tokens and my billing went sideways. Just treat each token type as its own cost and you’re good. Three separate line items, multiply by their rates, add them up - that’s it.
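A minimal sketch of that "three separate line items" approach in Python. The rates are the per-1M figures quoted in the question; always verify them against the current pricing page before relying on this:

```python
# Rates in dollars per 1M tokens (as quoted in the question above;
# check OpenAI's official pricing page for current values).
REGULAR_INPUT_RATE = 0.150
CACHED_INPUT_RATE = 0.075
OUTPUT_RATE = 0.600

def line_item_cost(tokens: int, rate_per_million: float) -> float:
    """Cost of one token category: token count times its per-1M rate."""
    return tokens * rate_per_million / 1_000_000

# The three counts are treated as non-overlapping, as described above.
regular_cost = line_item_cost(1204, REGULAR_INPUT_RATE)  # 0.0001806
cached_cost = line_item_cost(1024, CACHED_INPUT_RATE)    # 0.0000768
output_cost = line_item_cost(12, OUTPUT_RATE)            # 0.0000072

total = regular_cost + cached_cost + output_cost
print(f"${total:.6f}")  # → $0.000265
```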

Been tracking API costs for years - everyone gets confused by this. Those numbers are already separated correctly, no subtraction needed.

Manual calculations are a huge pain though. I used to build spreadsheets until I got tired of the hassle.

Now Latenode handles everything automatically. Connects to OpenAI’s API, pulls usage data, and calculates cached vs regular tokens without any work from me.

Set up a workflow that grabs stats, applies pricing tiers, and sends daily summaries. No more manual math or second-guessing.

Your total: $0.000181 (regular input) + $0.000077 (cached input) + $0.000007 (output) = $0.000265. But automation beats doing this by hand every time.

The token counts you’re seeing are already separated correctly by OpenAI’s billing system. I had the same confusion when I was setting up cost tracking for our company’s API usage. Here’s what I figured out: cached tokens are content that was already processed in previous requests within the cache window. Regular input tokens are everything else being processed fresh. You don’t need to subtract anything manually - the API handles this separation automatically. Just multiply each token type by its rate. I’ve checked this against our actual billing statements for months, and the numbers always match perfectly. The confusion usually comes from thinking there’s overlap between categories, but OpenAI designed it to be transparent once you get the distinction.

The Problem: You’re unsure how to calculate your OpenAI API costs when dealing with cached and regular input tokens. The API response shows separate counts for these token types, and you’re questioning whether you need to manually adjust these values before applying the pricing.

:thinking: Understanding the “Why” (The Root Cause):

The confusion arises from a misunderstanding of how OpenAI handles cached tokens in its billing. The API response already separates the counts for regular input tokens and cached input tokens. These are distinct categories, and there is no overlap between them. Therefore, you do not need to perform any manual subtraction or adjustment to the token counts provided. OpenAI’s billing system automatically accounts for the difference in pricing between cached and regular tokens. The cached tokens represent content already processed and stored, hence the lower cost. Regular input tokens represent new content requiring full processing.

:gear: Step-by-Step Guide:

  1. Obtain Your API Token Counts: Retrieve your OpenAI API usage data, either from your account dashboard or via an API call. The response should clearly delineate the number of regular input tokens, cached input tokens, and output tokens.

  2. Identify the Pricing Tiers: Consult the official OpenAI pricing page to determine the current cost per 1,000,000 tokens for each token type (regular input, cached input, and output). These prices can vary depending on the model used.

  3. Calculate Costs Separately for Each Token Type: Use the token counts from Step 1 and the pricing tiers from Step 2 to calculate the cost for each token type independently. Do not try to combine or subtract them. The formula is: (Number of Tokens / 1,000,000) * (Cost per 1,000,000 tokens).

    Example: Let’s assume the following:

    • Regular Input Tokens: 1204
    • Cached Input Tokens: 1024
    • Output Tokens: 12
    • Pricing Tiers:
      • Regular Input: $0.150 per 1M tokens
      • Cached Input: $0.075 per 1M tokens
      • Output: $0.600 per 1M tokens

    Then the calculations would be:
    * Regular Input Cost: 1204 / 1,000,000 * $0.150 = $0.0001806
    * Cached Input Cost: 1024 / 1,000,000 * $0.075 = $0.0000768
    * Output Cost: 12 / 1,000,000 * $0.600 = $0.0000072

  4. Sum the Individual Costs: Add together the costs calculated for each token type in Step 3 to obtain your total cost. In our example: $0.0001806 + $0.0000768 + $0.0000072 = $0.0002646

  5. Verify Against Your Billing Statement: After a period of API usage, compare your manually calculated costs against your actual OpenAI billing statement. They should match closely; minor discrepancies due to rounding are possible. If you see larger differences, review your calculations, the pricing structure on the OpenAI website, and the specific model used.
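The steps above can be sketched as a short Python script. The counts and rates below are the example figures from this thread, and would need updating whenever OpenAI revises its pricing:

```python
def token_cost(tokens: int, rate_per_million: float) -> float:
    """Step 3: cost of one token category; no subtraction across categories."""
    return tokens / 1_000_000 * rate_per_million

# Step 1: counts from your API usage data (example values from this thread).
counts = {"regular_input": 1204, "cached_input": 1024, "output": 12}

# Step 2: rates in dollars per 1M tokens (verify on the official pricing page).
rates = {"regular_input": 0.150, "cached_input": 0.075, "output": 0.600}

# Steps 3-4: price each line item independently, then sum.
line_items = {name: token_cost(counts[name], rates[name]) for name in counts}
total = sum(line_items.values())

for name, cost in line_items.items():
    print(f"{name}: ${cost:.7f}")
print(f"total: ${total:.7f}")
```

Step 5 is then just comparing `total` (accumulated over a billing period) against your statement.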

:mag: Common Pitfalls & What to Check Next:

  • Pricing Changes: OpenAI’s pricing may change over time. Always refer to their official pricing page for the most up-to-date rates.
  • Model-Specific Pricing: Pricing may differ depending on the specific OpenAI model you are using (e.g., gpt-3.5-turbo, gpt-4).
  • Rounding Errors: Minor discrepancies due to rounding may occur. Focus on whether the overall cost is reasonably close to your calculated total. If you encounter large discrepancies, re-check the API response and pricing.
  • Hidden Costs: Be aware of any potential additional costs associated with your API usage, such as data storage or egress fees.
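To guard against the model-specific-pricing and pricing-change pitfalls, one option is to key the rate table by model name so a lookup fails loudly for an unpriced model. The model name and rates below are illustrative placeholders, not authoritative current prices:

```python
# Illustrative per-model rate table (dollars per 1M tokens).
# These figures are placeholders -- refresh them from the official
# pricing page, since rates vary by model and change over time.
PRICING = {
    "example-model": {"input": 0.150, "cached_input": 0.075, "output": 0.600},
    # add each model you actually use here
}

def usage_cost(model: str, regular: int, cached: int, output: int) -> float:
    """Look up the model's rates and price the three categories separately."""
    rates = PRICING[model]  # raises KeyError for models you haven't priced yet
    return (
        regular * rates["input"]
        + cached * rates["cached_input"]
        + output * rates["output"]
    ) / 1_000_000

print(usage_cost("example-model", 1204, 1024, 12))
```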

:speech_balloon: Still running into issues? Share your (sanitized) API response, your calculations, and any other relevant details. The community is here to help!

You’re right to be confused - token counting seems weird at first. Cached tokens and regular input tokens are totally separate buckets in the API response. When I first ran into this, I tried doing math between them, but you don’t need to. The API already sorts tokens into their billing tiers for you. Your cached tokens are content that was already processed and stored in OpenAI’s cache, so they get the discount. Regular input tokens are everything else that needs full processing. Just multiply each category by its rate and add them up. This works consistently across different models and pricing changes.

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.