I’m trying to run evaluations on a dataset using LangSmith but keep running into problems. Every time I try to evaluate my dataset, I get this error message:
ValueError: Evaluation with the <class 'langchain.evaluation.qa.eval_chain.QAEvalChain'> requires a language model to function. Failed to create the default 'gpt-4' model. Please manually provide an evaluation LLM or check your openai credentials.
Here’s my evaluation code:
evaluate_dataset(
    langsmith_client=my_client,
    dataset_id="Test Dataset",
    model_factory=my_model,
    eval_settings=config_settings,
)
I’ve tried different API keys including personal OpenAI keys and Azure OpenAI keys from working projects. Before running the evaluation, I always test that the connection works:
from langsmith import Client
from langchain.chat_models import ChatOpenAI

# Initialize client and verify connection
my_client = Client()
my_model = ChatOpenAI(openai_api_key='(actual key here)')
my_model.predict("Test message!")
The test always passes but the evaluation still fails. What could be causing this issue?
This is a common LangSmith evaluation issue. QAEvalChain creates its own internal language model for evaluation - it’s separate from the model you’re testing. Your ChatOpenAI model works fine, but the evaluator can’t access your credentials when it tries to spin up gpt-4 internally.
I hit this exact error when I started using LangSmith for dataset evals. You need to set your OpenAI API key as an environment variable before running the evaluation, not just in your model setup. Add os.environ['OPENAI_API_KEY'] = 'your-key-here' at the top of your script or export it in your shell. That way when QAEvalChain creates its internal gpt-4 instance, it’ll find the credentials automatically.
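A minimal sketch of that, with the key value as a placeholder:

import os

# Set the key before anything builds an OpenAI client, so QAEvalChain's
# internal gpt-4 evaluator can find the credentials on its own.
os.environ['OPENAI_API_KEY'] = 'your-key-here'  # placeholder, use your real key

# ...then create my_client / my_model and run the evaluation as before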
sounds like langsmith’s trying to create its own gpt-4 instance instead of using yours. maybe try passing the llm directly in eval_settings or check if there’s an evaluator param that takes your model. i had that same issue last month.
The error occurs because QAEvalChain attempts to create its own GPT-4 instance for evaluation, separate from your ChatOpenAI model. Your connection test succeeds, but the evaluator cannot access your credentials when it builds that default model. I encountered a similar issue with custom evaluation chains. To resolve it, configure the evaluator to use your working model instead of letting it create a default one:
from langchain.evaluation.qa import QAEvalChain

# Build the QA evaluator from the model that already has valid credentials
custom_evaluator = QAEvalChain.from_llm(llm=my_model)
Then reference this evaluator in your evaluation setup so the run grades with your authenticated model rather than a default one.
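A sketch of how that wiring might look, assuming config_settings is built on langchain.smith.RunEvalConfig (evaluate_dataset looks like your own wrapper, so adapt this to however it constructs its config):

from langchain.smith import RunEvalConfig

# Hand the run your pre-built evaluator so grading happens with the
# already-authenticated model instead of a default gpt-4 instance.
config_settings = RunEvalConfig(
    custom_evaluators=[custom_evaluator],
)

If you prefer to keep the built-in QA evaluator, RunEvalConfig also accepts an eval_llm argument (or RunEvalConfig.QA(llm=my_model) in its evaluators list) to set the grading model without building a separate chain.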