Understanding Token Costs in OpenAI's Assistants API: Does Previous Conversation History Matter?

I’m looking for clarification on how token consumption works with the OpenAI Assistants API as opposed to the Chat Completions API.

From what I know, when using the Chat Completions API, it's necessary to resend the entire chat history with each request, and you're charged for all of those tokens every time.
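To make that concrete, here's a minimal sketch of the pattern (plain Python, no API calls; the word count is a naive stand-in for real tokenization, not what OpenAI actually bills):

```python
# Sketch: with Chat Completions you resend the whole history each turn,
# so the prompt-token count grows with every exchange.
# NOTE: counting words is a crude stand-in for real tokenization.

def count_tokens(messages):
    """Very rough estimate: one 'token' per whitespace-separated word."""
    return sum(len(m["content"].split()) for m in messages)

history = []
billed_prompt_tokens = []

for user_turn in ["Hello there", "Tell me more", "Thanks, one last question"]:
    history.append({"role": "user", "content": user_turn})
    # Each request carries the ENTIRE history, so you pay for all of it again.
    billed_prompt_tokens.append(count_tokens(history))
    # Pretend the model replied; the reply joins the history for the next turn.
    history.append({"role": "assistant", "content": "A reply of five words here"})

print(billed_prompt_tokens)  # → [2, 11, 21]: strictly increasing
```

Even though each new user message is short, the billed prompt size keeps growing because it always includes everything that came before.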

On the other hand, the Assistants API automatically persists the conversation history in a thread, which is quite handy. However, I'm curious about the billing side.

Does the token count for the Assistants API include all earlier messages in the conversation when I make a new request? Or am I only charged for the latest message I'm sending? I want to make sure I fully understand the cost structure before engaging in longer conversations.

I've used both APIs: with the Assistants API you are still charged for the full conversation context, same as with Chat Completions. The difference is that OpenAI manages the context automatically instead of you sending it manually each time. When the assistant processes your message, the model still needs the entire thread history to stay coherent, so all of those tokens hit your bill.

The upside is that you don't manage message arrays yourself or stress about context limits, since the API handles truncation internally. You can also verify this on your end: each completed run exposes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, and on longer threads you'll see `prompt_tokens` climb even when your new messages are short.

The automatic memory is great for UX, but the cost is basically the same as including the full context yourself.
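A back-of-envelope model of why long threads get expensive: if every turn adds roughly the same number of tokens to the thread, the k-th run pays for about k turns' worth, so total prompt tokens grow quadratically with conversation length. (The price constant below is a placeholder for illustration, not current OpenAI pricing.)

```python
# Cumulative prompt tokens over N turns when each turn adds roughly
# `tokens_per_turn` tokens to the thread. Turn k re-reads ~k turns of history.
# PRICE below is a PLACEHOLDER rate, not real OpenAI pricing.

PRICE_PER_1K_PROMPT_TOKENS = 0.01  # placeholder USD rate


def cumulative_prompt_tokens(turns, tokens_per_turn):
    """Each run re-reads the whole thread, so turn k costs ~k * tokens_per_turn."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


tokens = cumulative_prompt_tokens(turns=20, tokens_per_turn=100)
cost = tokens / 1000 * PRICE_PER_1K_PROMPT_TOKENS
print(tokens, round(cost, 2))  # → 21000 0.21
```

Twenty turns of ~100 tokens each means ~21,000 billed prompt tokens, not 2,000, which is exactly the "piles up even with short new messages" effect described above.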