I’m working with LangSmith and need to find out what default evaluation criteria are available. The documentation mentions there are multiple built-in options but doesn’t show a complete list anywhere.
Here’s what I’m currently using to test my model outputs:
This works fine for “accuracy” but I want to know what other options I can use instead. Is there a way to programmatically get all the available criteria names? I’ve checked the official docs but can’t find a comprehensive list.
Just use vars(EvaluationConfig.DefaultCriteria) instead of dir(). It returns the class's own __dict__, so you see the attributes defined directly on the class instead of everything inherited plus all the dunder noise. Worked for me when I needed to quickly see what's available without digging through docs.
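To make the difference concrete, here's a minimal sketch. Note the `DefaultCriteria` class below is a hypothetical stand-in, not the real LangSmith class, just to show how dir() and vars() behave differently:

```python
# Hypothetical stand-in for EvaluationConfig.DefaultCriteria,
# only here to demonstrate dir() vs vars() on a class.
class DefaultCriteria:
    accuracy = "accuracy"
    relevance = "relevance"
    coherence = "coherence"

# dir() includes inherited names and dunders, so it's noisy:
print(len(dir(DefaultCriteria)))  # dozens of entries

# vars() is the class's own __dict__: only what's defined on the
# class body itself (plus a few class-level dunders like __module__).
own = {k: v for k, v in vars(DefaultCriteria).items()
       if not k.startswith('_')}
print(own)  # {'accuracy': 'accuracy', 'relevance': 'relevance', 'coherence': 'coherence'}
```

You still want the `startswith('_')` filter either way, since vars() on a class includes a handful of dunder entries like `__module__` and `__doc__`.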
Check the LangSmith Python client docs on GitHub releases - they've got a changelog showing when new default criteria get added. Found this while debugging version compatibility issues.

Better approach: use the client's introspection features. The smith_client.get_evaluation_templates() method gives you metadata about available evaluators, including defaults. Way more reliable than parsing class attributes, since it shows what your specific instance actually supports.

From running evals on production datasets: core criteria like "accuracy", "relevance", "coherence", and "conciseness" stay pretty stable across versions. Safety ones like "harmfulness" and "controversiality" change more often as the underlying models improve. Always test on a small dataset first - some criteria have input format requirements that aren't obvious from the names.
I ran into the same issue on a project. The docs don’t list all the default evaluation criteria, but there’s a workaround. Just run dir(EvaluationConfig.DefaultCriteria) in Python to see what’s available. You’ll usually find stuff like “accuracy”, “relevance”, and “coherence”. Also check the LangSmith UI when you’re setting up evaluations manually - it shows the full list and it’s way more reliable than the docs.
I just check the source code directly - the docs are incomplete anyway. Go to the LangSmith GitHub repo and browse the evaluation module files. You’ll find default criteria like “helpfulness”, “harmfulness”, “controversiality”, “misogyny”, “criminality”, and “insensitivity” plus the usual accuracy and relevance stuff. You can also use Python’s inspect module on the DefaultCriteria class for more details. I found several criteria this way that weren’t documented anywhere, saved me tons of debugging time when I was setting up evals for my chatbot.
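The inspect-module approach can be sketched like this. Again, `DefaultCriteria` below is a mock stand-in, since the real class path varies by version - swap in whatever you find in the LangSmith source:

```python
import inspect

# Hypothetical stand-in; substitute the real DefaultCriteria class
# from the LangSmith evaluation module once you've located it.
class DefaultCriteria:
    accuracy = "accuracy"
    helpfulness = "helpfulness"
    harmfulness = "harmfulness"

# inspect.getmembers returns sorted (name, value) pairs;
# filter out dunders to keep just the criteria attributes.
members = [(name, value)
           for name, value in inspect.getmembers(DefaultCriteria)
           if not name.startswith('_')]
for name, value in members:
    print(f"{name}: {value!r}")
```

Compared to a bare dir() loop, getmembers hands you the values alongside the names, which helps when criteria are stored as strings or prompt templates.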
Had this exact problem last month building evaluations for our RAG system. Skip digging through source code - there’s an easier way.
Try the LangSmith client’s list_evaluation_criteria() method if you’ve got access. But honestly, I just made a quick test script:
from langchain.smith import EvaluationConfig

# Walk the class attributes and print the non-private ones
for attr in dir(EvaluationConfig.DefaultCriteria):
    if not attr.startswith('_'):
        try:
            criteria = getattr(EvaluationConfig.DefaultCriteria, attr)
            print(f"Available: {attr}")
        except AttributeError:
            pass
You’ll get the usual stuff - accuracy, relevance, coherence - plus safety checks like harmfulness and toxicity detection. Exact list varies by your LangSmith version.
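Since the list shifts between versions, it's worth probing your own install rather than trusting any hardcoded list. One assumption worth checking: on the langchain versions I've seen, the criteria names also live in an enum at langchain.evaluation.Criteria, but that import path may have moved in your release, so a version-tolerant sketch looks like:

```python
# Assumption: your installed langchain exposes a Criteria enum at
# langchain.evaluation.Criteria; the path may differ between releases.
try:
    from langchain.evaluation import Criteria
    names = sorted(c.value for c in Criteria)
except ImportError:
    # langchain missing or the enum moved - fall back to an empty list
    names = []

print(names)
```

If the import succeeds you get the criteria names your version actually ships with, which beats screenshots that go stale.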
Pro tip: the LangSmith web interface has the most current list when you create evaluation runs manually. I screenshot it and stick it in my code comments since the API docs are always behind.