I’m working with LangSmith and trying to find out what evaluation criteria are available by default. The documentation mentions there are several built-in options, but I can’t locate a complete list anywhere.
from langchain.smith import EvaluationConfig, evaluate_dataset
my_config = EvaluationConfig(
    criteria=["accuracy"]  # I know this one works
)

evaluate_dataset(
    dataset_id=my_dataset,
    model=my_model,
    eval_config=my_config
)
I’ve seen examples using “accuracy” and similar criteria names, but I need to know what other options exist. Is there a programmatic way to retrieve all the available evaluation criteria that come pre-built with LangSmith?
Here’s how I check all available criteria at once:
from langsmith.evaluation import evaluators
import inspect
# Get all evaluator classes
all_evaluators = [name for name, obj in inspect.getmembers(evaluators)
                  if inspect.isclass(obj) and name.endswith('Evaluator')]
print(all_evaluators)
This skips digging through the docs. I’ve used most of these in production: “embedding_distance” works well for semantic similarity, and “regex_match” is great for structured outputs.
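If you want to wire those two into the config style from the question, here’s a minimal sketch (the EvaluationConfig class and the criteria names are taken from this thread, so double-check them against your installed version):
from langchain.smith import EvaluationConfig  # import path copied from the question

# hypothetical config reusing two evaluator names mentioned above;
# confirm both are accepted as criteria in your LangSmith/LangChain version
similarity_config = EvaluationConfig(
    criteria=["embedding_distance", "regex_match"]
)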
Something others missed - you can call client.list_evaluation_criteria() on the LangSmith client directly. It shows which criteria your specific model supports and prevents runtime errors when some of them aren’t compatible.
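A minimal sketch of that approach, assuming the list_evaluation_criteria() method described above exists in your installed langsmith client:
from langsmith import Client

client = Client()  # assumes LANGCHAIN_API_KEY is set in your environment
# list_evaluation_criteria() is the method named above; confirm it exists
# on your client version before relying on it
for criterion in client.list_evaluation_criteria():
    print(criterion)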
I start with basics like accuracy and relevance, then add domain-specific ones based on what I’m measuring. Don’t bother loading criteria you won’t actually use.
Just run EvaluationConfig.__dict__ or check the GitHub source. There’s way more than what’s mentioned here - fluency, creativity, groundedness for RAG apps. Some criteria are model-specific though, so check compatibility first.
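For reference, that inspection could look like this (again assuming the EvaluationConfig import from the question; which attributes show up depends on your version):
from langchain.smith import EvaluationConfig  # import path from the question

# peek at the class attributes; any built-in criteria constants should appear here
print([attr for attr in EvaluationConfig.__dict__ if not attr.startswith('_')])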
I just import the evaluators module directly and check what’s available. Run from langsmith import evaluation and look at the classes - you’ll see all the default criteria. RunningTiger covered the basics, but there’s also “bias”, “toxicity”, “sentiment”, and “factuality”. You can use “custom_criteria” to write your own evaluation logic too. Heads up: some evaluators need specific model capabilities; the bias detector won’t work unless your model can handle classification tasks. Check the evaluation module docs for examples of how each criterion works. That really helped me figure out which ones I actually needed.
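A rough sketch of that inspection (langsmith does ship an evaluation module, but which names appear depends on the version you have installed):
from langsmith import evaluation

# list the public names exported by the module; evaluator classes and helpers show up here
print([name for name in dir(evaluation) if not name.startswith('_')])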
LangSmith’s built-in evaluation criteria go way beyond just accuracy. You’ve got options like “helpfulness”, “harmlessness”, “honesty”, “relevance”, and “coherence”. There’s also “conciseness” and “correctness”. Want the full list programmatically? Check the EvaluationConfig class or hit up the LangSmith client docs. You can also try dir() on the evaluators module or dig into the langsmith.evaluation module to see all the default evaluators. Just a heads up - some criteria only work with specific models or configs.