GPT API returns word fragments instead of complete words as highest probability tokens

I’m new to working with language models and the OpenAI API. I want to predict the most likely words that could finish an incomplete sentence and get their probability scores.

For example, when I give it “My least favourite food is…” I expect complete words like “broccoli” or “spinach”. But instead I get partial tokens like “bro”, “an”, and “spi” which seem to be the beginning parts of longer words.

Here’s my setup function:

import numpy as np
from openai import OpenAI

# Instantiate the client once; it reads OPENAI_API_KEY from the environment.
openai_client = OpenAI()

def fetch_predictions(
    prompt_messages,
    model_name="gpt-4",
    max_response_tokens=500,
    temp=0,
    stop_sequence=None,
    random_seed=456,
    include_logprobs=None,
    num_top_logprobs=None,
):
    request_params = {
        "model": model_name,
        "messages": prompt_messages,
        "max_tokens": max_response_tokens,
        "temperature": temp,
        "stop": stop_sequence,
        "seed": random_seed,
        "logprobs": include_logprobs,
        "top_logprobs": num_top_logprobs,
    }
    
    response = openai_client.chat.completions.create(**request_params)
    return response

And here’s how I’m calling it:

test_sentences = [
    "Yesterday I went to the",
    "My least favorite food is"
]

all_results = []

for phrase in test_sentences:
    user_prompt = f"Complete this sentence with one word: {phrase}"
    api_result = fetch_predictions(
        [{"role": "user", "content": user_prompt}],
        model_name="gpt-4",
        include_logprobs=True,
        num_top_logprobs=3,
    )
    
    for top_token in api_result.choices[0].logprobs.content[0].top_logprobs:
        entry = [phrase, top_token.token, top_token.logprob, np.exp(top_token.logprob) * 100]
        all_results.append(entry)

Why am I getting word fragments instead of complete words? Is there a way to make it return full words with their probabilities? I’ve tried different prompts but keep getting the same issue.

The problem you’re hitting comes from how GPT tokenizes text. The model’s vocabulary is built from subword pieces, not whole words, so the logprobs on the first generated token are probabilities over those pieces: “broccoli” starts with a token like “bro”, which is why the fragment outranks any complete word. A few things to try: raise the temperature, generate multiple completions, and count how often each first word appears; if you still have access to the text-davinci models, give those a shot, since the completions endpoint exposes logprobs a bit differently; or, as a workaround, generate the full one-word response many times, take the first word each time, and build your own probability estimates from how often each word shows up.
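A minimal sketch of that sampling-and-counting idea (assumes the `openai` Python package v1+ with `OPENAI_API_KEY` set; the model name comes from the question, and the sample count of 20 is arbitrary):

```python
from collections import Counter


def words_to_probs(words):
    """Turn a list of sampled first words into a {word: probability} dict."""
    counts = Counter(words)
    total = len(words)
    return {word: count / total for word, count in counts.items()}


def sample_first_words(phrase, n_samples=20, model="gpt-4"):
    """Sample n_samples one-word completions in a single API call."""
    from openai import OpenAI  # requires the openai package v1+

    client = OpenAI()
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Complete this sentence with one word: {phrase}"}],
        temperature=1.0,  # nonzero temperature so the samples actually vary
        max_tokens=5,
        n=n_samples,      # request all samples in one call
    )
    # Keep only the first word of each completion, stripping punctuation.
    words = []
    for choice in response.choices:
        text = (choice.message.content or "").strip()
        if text:
            words.append(text.split()[0].strip(".,!\"'"))
    return words


# probs = words_to_probs(sample_first_words("My least favorite food is"))
```

The estimates get more stable as `n_samples` grows, at the cost of more tokens billed per call.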

There’s a way cleaner approach that skips the tokenization headaches and the API spam entirely: make a single request with logprobs enabled, take the token fragments it returns, and group them back into complete words using the probability distribution.

The key fact is that log probabilities add where raw probabilities multiply, so once you know which consecutive tokens belong to the same word, summing their logprobs gives you the logprob of the whole word. Fragments that are already complete words (“an”) need no extra work, and a fragment that is the prefix of more than one high-probability word needs to be extended before its probability can be attributed to a single word.

Bonus: this normalizes the scores across complete words instead of token chunks, so there’s no manual counting or re-sampling involved.
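A minimal sketch of the regrouping step, assuming you have already pulled `(token, logprob)` pairs out of `response.choices[0].logprobs.content` for one generated completion. The leading-space heuristic is how GPT tokenizers typically mark word boundaries:

```python
import math


def group_tokens_into_words(token_logprobs):
    """Group per-token logprobs of one completion into whole words.

    token_logprobs: list of (token_string, logprob) pairs in generation
    order. A token that starts with a space (or is the first token)
    starts a new word; any other token extends the current word. The
    logprobs of a word's tokens are summed, because the underlying
    probabilities multiply.
    Returns a list of (word, probability) pairs.
    """
    words = []
    for token, logprob in token_logprobs:
        if words and not token.startswith(" "):
            prev_word, prev_lp = words[-1]
            words[-1] = (prev_word + token, prev_lp + logprob)
        else:
            words.append((token.lstrip(), logprob))
    return [(word, math.exp(lp)) for word, lp in words]
```

For example, tokens `" bro"`, `"cc"`, `"oli"` collapse into a single `"broccoli"` entry whose probability is the product of the three token probabilities.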

yeah, that’s just how gpt splits words into smaller bits (tokens). it doesn’t always give you full words. to get full words, maybe try using different sampling settings or look into the completion models, they might suit your needs better.

Yes, this is a common occurrence with GPT models: they tokenize text into subword segments, and logprobs are reported per token rather than per word. Raising max_tokens alone won’t change what appears in top_logprobs, since those entries are always single tokens. What you can do is let the model generate the whole word (a small max_tokens of around 5 is enough for one word), then reconstruct complete words by assembling the consecutive tokens in the response and combining their probabilities. Keep in mind that the tokenizer fragments rarer and longer words more heavily, so those are the ones most likely to show up as partial tokens.