Understanding token calculation discrepancy in OpenAI API requests

I’m confused about how OpenAI calculates tokens in my API calls. When I send a basic message like “hello” to the GPT-4 model, I expect it to use just 1 token based on what I see in token counting tools. Since GPT-4 has a context limit of 8192 tokens, I figured I could set my max_tokens parameter to 8191 to leave room for that single input token.

But the API keeps telling me my message uses 8 tokens instead of 1. This doesn’t make sense to me. Here’s what I’m sending:

import requests

api_key = "your-api-key-here"
headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4",
    "max_tokens": 8191,
    "messages": [
        {
            "role": "user",
            "content": "hello"
        }
    ]
}

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers=headers,
    json=payload,
)

print(response.json())

The error I get back says my request needs 8199 tokens total (8 for messages plus 8191 for completion) but the limit is 8192. Why does “hello” count as 8 tokens when it should be much less? Am I missing something about how the API counts tokens?

The discrepancy comes from the chat format itself, not from your text. The Chat Completions API wraps every message in special tokens that mark the role (user/assistant/system) and the message boundaries, and every reply is primed with a few tokens of its own. For gpt-4 the overhead works out to roughly 3 tokens per message, plus the tokens of the role name, plus 3 tokens that prime the assistant's reply. Your single message is therefore 3 (message framing) + 1 (‘user’) + 1 (‘hello’) + 3 (reply priming) = 8 tokens, which matches the error exactly. This is different from the older completions endpoint, where only the plain prompt text was counted. To stay within the 8192-token context window, count the prompt's tokens first and set max_tokens to the remainder (8192 − 8 = 8184 here), or simply leave a small buffer below that.
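The arithmetic above can be sketched in plain Python. This is a minimal estimate, assuming the per-message constants OpenAI has published for counting gpt-4 chat tokens (3 per message plus 3 for reply priming); these constants can change between model versions, so treat it as an approximation, not a guarantee:

```python
# Assumed constants for gpt-4 chat formatting (from OpenAI's
# published token-counting guidance; subject to change per model).
TOKENS_PER_MESSAGE = 3   # start/boundary tokens around each message
REPLY_PRIMING = 3        # every reply is primed with assistant-start tokens

def estimate_chat_tokens(messages, count_tokens):
    """Estimate prompt tokens for a chat request.

    count_tokens: any tokenizer callable returning a token count for a
    string, e.g. lambda s: len(tiktoken_encoding.encode(s)).
    """
    total = REPLY_PRIMING
    for msg in messages:
        total += TOKENS_PER_MESSAGE
        total += count_tokens(msg["role"])     # "user" -> 1 token
        total += count_tokens(msg["content"])  # "hello" -> 1 token
    return total

# With a real tokenizer, both "user" and "hello" are single tokens,
# so a one-message "hello" prompt comes to 3 + 1 + 1 + 3 = 8.
fake_counts = {"user": 1, "hello": 1}
messages = [{"role": "user", "content": "hello"}]
print(estimate_chat_tokens(messages, lambda s: fake_counts[s]))  # 8
```

With an estimate like this you can set `max_tokens` to `8192 - estimate` instead of hard-coding a value.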