Understanding Token Costs in OpenAI's Assistants API: Does Previous Conversation History Matter?

I’m looking for clarification on how token consumption works with the OpenAI Assistants API as opposed to the Chat Completions API.

From what I know, when using the Chat Completions API, it's necessary to resend the entire chat history with each request, and you're charged for all of those tokens every time.
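To make that concrete, here's a minimal sketch of the pattern (plain Python, no API calls; the word count is a naive stand-in for real tokenization, not what OpenAI actually bills):

```python
# Sketch: with Chat Completions you resend the whole history each turn,
# so the prompt-token count grows with every exchange.
# NOTE: counting words is a crude stand-in for real tokenization.

def count_tokens(messages):
    """Very rough estimate: one 'token' per whitespace-separated word."""
    return sum(len(m["content"].split()) for m in messages)

history = []
billed_prompt_tokens = []

for user_turn in ["Hello there", "Tell me more", "Thanks, one last question"]:
    history.append({"role": "user", "content": user_turn})
    # Each request carries the ENTIRE history, so you pay for all of it again.
    billed_prompt_tokens.append(count_tokens(history))
    # Pretend the model replied; the reply joins the history for the next turn.
    history.append({"role": "assistant", "content": "A reply of five words here"})

print(billed_prompt_tokens)  # → [2, 11, 21]: strictly increasing
```

Even though each new user message is short, the billed prompt size keeps growing because it always includes everything that came before.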

On the other hand, the Assistants API automatically persists the conversation history in a thread, which is quite handy. However, I'm curious about the billing side.

Does the token count for the Assistants API include all earlier messages in the conversation when I make a new request? Or am I only charged for the latest message I'm sending? I want to make sure I fully understand the cost structure before engaging in longer conversations.

I've used both APIs: with the Assistants API you are still charged for the full conversation context, same as with Chat Completions. The difference is that OpenAI manages the context automatically instead of you sending it manually each time. When the assistant processes your message, the model still needs the entire thread history to stay coherent, so all of those tokens hit your bill.

The upside is that you don't manage message arrays yourself or stress about context limits, since the API handles truncation internally. You can also verify this on your end: each completed run exposes a `usage` object with `prompt_tokens`, `completion_tokens`, and `total_tokens`, and on longer threads you'll see `prompt_tokens` climb even when your new messages are short.

The automatic memory is great for UX, but the cost is basically the same as including the full context yourself.
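A back-of-envelope model of why long threads get expensive: if every turn adds roughly the same number of tokens to the thread, the k-th run pays for about k turns' worth, so total prompt tokens grow quadratically with conversation length. (The price constant below is a placeholder for illustration, not current OpenAI pricing.)

```python
# Cumulative prompt tokens over N turns when each turn adds roughly
# `tokens_per_turn` tokens to the thread. Turn k re-reads ~k turns of history.
# PRICE below is a PLACEHOLDER rate, not real OpenAI pricing.

PRICE_PER_1K_PROMPT_TOKENS = 0.01  # placeholder USD rate


def cumulative_prompt_tokens(turns, tokens_per_turn):
    """Each run re-reads the whole thread, so turn k costs ~k * tokens_per_turn."""
    return sum(k * tokens_per_turn for k in range(1, turns + 1))


tokens = cumulative_prompt_tokens(turns=20, tokens_per_turn=100)
cost = tokens / 1000 * PRICE_PER_1K_PROMPT_TOKENS
print(tokens, round(cost, 2))  # → 21000 0.21
```

Twenty turns of ~100 tokens each means ~21,000 billed prompt tokens, not 2,000, which is exactly the "piles up even with short new messages" effect described above.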