Dataset Evaluation Error in LangSmith - Language Model Issue

I’m trying to run evaluations on a dataset using LangSmith but keep running into problems. Every time I try to evaluate my dataset, I get this error message:

ValueError: Evaluation with the <class 'langchain.evaluation.qa.eval_chain.QAEvalChain'> requires a language model to function. Failed to create the default 'gpt-4' model. Please manually provide an evaluation LLM or check your openai credentials.

Here’s my evaluation code:

evaluate_dataset(
    langsmith_client=my_client,
    dataset_id="Test Dataset",
    model_factory=my_model,
    eval_settings=config_settings,
)

I’ve tried different API keys including personal OpenAI keys and Azure OpenAI keys from working projects. Before running the evaluation, I always test that the connection works:

from langsmith import Client
from langchain.chat_models import ChatOpenAI

# Initialize client and verify connection
my_client = Client()

my_model = ChatOpenAI(openai_api_key='(actual key here)')
my_model.predict("Test message!")

The test always passes but the evaluation still fails. What could be causing this issue?

This is a common LangSmith evaluation issue. QAEvalChain creates its own internal language model for evaluation - it’s separate from the model you’re testing. Your ChatOpenAI model works fine, but the evaluator can’t access your credentials when it tries to spin up gpt-4 internally.

I hit this exact error when I started using LangSmith for dataset evals. You need to set your OpenAI API key as an environment variable before running the evaluation, not just in your model setup. Add os.environ['OPENAI_API_KEY'] = 'your-key-here' at the top of your script or export it in your shell. That way when QAEvalChain creates its internal gpt-4 instance, it’ll find the credentials automatically.
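Here's a minimal sketch of that fix (the key value is a placeholder):

import os

# Set the key before any evaluation code runs so QAEvalChain's
# internally created gpt-4 model can find it in the environment.
os.environ['OPENAI_API_KEY'] = 'your-key-here'

# ...then build your model and run the evaluation as before

One caveat: the default evaluator spins up a standard OpenAI gpt-4 client, so an Azure OpenAI key alone typically won't satisfy it; in that case passing an evaluation LLM explicitly (see the other answers) is the more reliable route.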

Sounds like LangSmith is trying to create its own gpt-4 instance instead of using yours. Maybe try passing the LLM directly in eval_settings, or check whether there's an evaluator param that takes your model. I had the same issue last month.
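If config_settings happens to be a langchain.smith.RunEvalConfig (an assumption about your setup, since evaluate_dataset isn't shown), passing the LLM directly might look like this:

from langchain.smith import RunEvalConfig

# eval_llm tells the built-in evaluators (like the "qa" one) to use
# your authenticated model instead of creating a default gpt-4.
config_settings = RunEvalConfig(
    evaluators=["qa"],
    eval_llm=my_model,
)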

The error occurs because QAEvalChain attempts to create its own GPT-4 instance for evaluation, which is separate from your ChatOpenAI model. Although your connection test succeeds, the evaluator cannot access your credentials when it builds that internal model. I encountered a similar issue with custom evaluation chains. To resolve it, configure the evaluator to reuse your working model instead of letting it create a default one:

from langchain.evaluation.qa import QAEvalChain

# Build the QA evaluator from the model you already authenticated,
# so it never needs to construct its own gpt-4 instance.
custom_evaluator = QAEvalChain.from_llm(llm=my_model)

Then wire this evaluator into your evaluation setup so the run uses your authenticated model rather than trying to build a default gpt-4; a sketch of one way to do that is below.
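Assuming your eval_settings/config_settings maps onto langchain.smith.RunEvalConfig (an assumption, since evaluate_dataset isn't shown), it could look like this:

from langchain.smith import RunEvalConfig

# QAEvalChain implements the string-evaluator interface, so it can be
# registered as a custom evaluator; no default gpt-4 gets created this way.
config_settings = RunEvalConfig(
    custom_evaluators=[custom_evaluator],
)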