I’m new to working with language models and the OpenAI API. I want to predict the most likely words that could finish an incomplete sentence and get their probability scores.
For example, when I give it “My least favourite food is…” I expect complete words like “broccoli” or “spinach”. But instead I get partial tokens like “bro”, “an”, and “spi” which seem to be the beginning parts of longer words.
Here’s my setup function:
from openai import OpenAI
import numpy as np

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
def fetch_predictions(
    prompt_messages,
    model_name="gpt-4",
    max_response_tokens=500,
    temp=0,
    stop_sequence=None,
    random_seed=456,
    include_logprobs=None,
    num_top_logprobs=None,
):
    request_params = {
        "model": model_name,
        "messages": prompt_messages,
        "max_tokens": max_response_tokens,
        "temperature": temp,
        "stop": stop_sequence,
        "seed": random_seed,
        "logprobs": include_logprobs,
        "top_logprobs": num_top_logprobs,
    }
    response = openai_client.chat.completions.create(**request_params)
    return response
And here’s how I’m calling it:
test_sentences = [
    "Yesterday I went to the",
    "My least favorite food is",
]

all_results = []
for phrase in test_sentences:
    user_prompt = f"Complete this sentence with one word: {phrase}"
    api_result = fetch_predictions(
        [{"role": "user", "content": user_prompt}],
        model_name="gpt-4",
        include_logprobs=True,
        num_top_logprobs=3,
    )
    for top_token in api_result.choices[0].logprobs.content[0].top_logprobs:
        entry = [phrase, top_token.token, top_token.logprob, np.exp(top_token.logprob) * 100]
        all_results.append(entry)
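For reference, this is how I'm converting a logprob into a percentage, plus my (possibly wrong) understanding that fragment logprobs should add up to the whole word's log probability. This is a minimal sketch with made-up numbers, not real API output:

```python
import numpy as np

# Hypothetical per-token logprobs for a word split into "bro" + "cc" + "oli"
fragment_logprobs = [-0.02, -0.01, -0.03]

# Log probabilities add across consecutive tokens, so the joint
# probability of the whole word is exp(sum of the fragment logprobs)
word_logprob = sum(fragment_logprobs)
word_percent = np.exp(word_logprob) * 100
print(f"full-word probability: {word_percent:.2f}%")  # prints "full-word probability: 94.18%"
```

If that reasoning is right, then in principle I could stitch fragments back into words myself, but I'd much rather get whole words from the API directly if that's possible.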
Why am I getting word fragments instead of complete words? Is there a way to make it return full words with their probabilities? I’ve tried different prompts but keep getting the same issue.