How to Access Similarity Scores from Documents Retrieved Using SelfQueryRetriever in LangChain

Isaac_Cosmos · August 7, 2025, 12:35am

I’m working with a SelfQueryRetriever setup and need help accessing the similarity scores for the documents it returns.

Here’s my current retriever configuration:

query_retriever = SelfQueryRetriever.from_llm(
    llm=my_llm,
    vectorstore=my_vectorstore,
    document_contents=content_description,
    metadata_field_info=field_info,
    enable_limit=True,
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.75, "k": 10},
    verbose=True,
)

The retrieval works fine and I get back Document objects containing page_content and metadata. However, I need to see the actual similarity scores for each retrieved document. Is there a way to extract or display these scores? The search type is set to use similarity thresholds but I can’t figure out how to access the actual score values.

avaw · August 16, 2025, 6:16am

Had this exact problem last year building a document search feature. SelfQueryRetriever won’t pass through scores - it’s just how it’s built.

Here’s what fixed it for me - skip the retriever and hit the vectorstore directly:

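# Instead of query_retriever.get_relevant_documents(query)
# similarity_search_with_relevance_scores accepts score_threshold and returns
# scores normalized to [0, 1]; similarity_search_with_score returns the store's
# raw score, which for some stores is a distance where lower is better.
results = my_vectorstore.similarity_search_with_relevance_scores(
    query,
    k=10,
    score_threshold=0.75,
)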

for doc, score in results:
    print(f"Score: {score}, Content: {doc.page_content[:100]}...")

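Downside is you lose the self-querying, meaning the metadata filters the LLM builds from the question. If you need both, I built a custom retriever class that inherits from SelfQueryRetriever and overrides the _get_relevant_documents method to attach each score to the document’s metadata, since a retriever still has to return plain Document objects.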

You could also monkey patch the retriever after creating it, but that’s messy for production. Custom class is way cleaner if you’ll use this often.
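For anyone who wants a starting point for the custom class, here is a minimal sketch of the part that attaches scores. It makes several assumptions: Doc and FakeStore are stand-ins so the snippet runs without LangChain, "score" is just a metadata key picked for illustration, and the commented-out subclass assumes a LangChain release where SelfQueryRetriever routes its vector store call through a _get_docs_with_query hook. If your version lacks that hook, the same helper can be called from an overridden _get_relevant_documents instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Stand-in for LangChain's Document so the sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeStore:
    """Stand-in vector store that returns canned (document, score) pairs."""

    def similarity_search_with_relevance_scores(self, query: str, k: int = 4, **kwargs: Any):
        hits = [(Doc("alpha"), 0.91), (Doc("beta"), 0.80), (Doc("gamma"), 0.42)]
        threshold = kwargs.get("score_threshold", 0.0)
        return [(doc, score) for doc, score in hits if score >= threshold][:k]


def docs_with_scores(vectorstore, query: str, search_kwargs: dict) -> list:
    """Run a score-returning search and copy each score into the doc's metadata."""
    pairs = vectorstore.similarity_search_with_relevance_scores(query, **search_kwargs)
    docs = []
    for doc, score in pairs:
        doc.metadata["score"] = score  # hypothetical key name
        docs.append(doc)
    return docs


# Assumed wiring into a real retriever (not executed here):
#
# class ScoredSelfQueryRetriever(SelfQueryRetriever):
#     def _get_docs_with_query(self, query, search_kwargs):
#         return docs_with_scores(self.vectorstore, query, search_kwargs)

scored = docs_with_scores(FakeStore(), "greek letters", {"k": 10, "score_threshold": 0.75})
for doc in scored:
    print(doc.metadata["score"], doc.page_content)
```

Putting the score in metadata keeps the return type a plain list of documents, so anything downstream that expects retriever output should keep working.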

Pete_Magic · August 14, 2025, 9:54am

Yeah, this is a pretty common issue with SelfQueryRetriever. Here’s a workaround that’s worked well for me: create a wrapper around your vectorstore that intercepts the search calls SelfQueryRetriever makes before they reach the real store. The wrapper runs a score-returning search, stashes the scores somewhere - could be an attribute on the wrapper or external storage - and hands back only the documents, which is all SelfQueryRetriever expects. After retrieval, just match the returned docs with their stored scores using content or metadata as keys. It’s not the prettiest fix, but it works without having to restructure your whole codebase.
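A rough sketch of such a wrapper, under stated assumptions: ScoreRecordingStore, Doc and FakeStore are all hypothetical names, and the stand-ins exist only so the snippet runs without LangChain. The wrapper assumes the retriever reaches the store through a search(query, search_type, **kwargs) call. Whether SelfQueryRetriever accepts a plain wrapper in place of a real store isn’t confirmed here; its type validation may require subclassing your concrete vector store class and recording the scores there instead.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Doc:
    """Stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeStore:
    """Stand-in vector store that returns canned (document, score) pairs."""

    def similarity_search_with_relevance_scores(self, query: str, k: int = 4, **kwargs: Any):
        hits = [(Doc("alpha"), 0.91), (Doc("beta"), 0.80), (Doc("gamma"), 0.42)]
        threshold = kwargs.get("score_threshold", 0.0)
        return [(doc, score) for doc, score in hits if score >= threshold][:k]


class ScoreRecordingStore:
    """Hypothetical wrapper that records scores as a side effect of search()."""

    def __init__(self, inner):
        self._inner = inner
        self.last_scores = {}  # page_content -> score from the most recent search

    def search(self, query: str, search_type: str, **kwargs: Any) -> list:
        # search_type is accepted for signature compatibility but ignored here:
        # this sketch always runs the score-returning search.
        pairs = self._inner.similarity_search_with_relevance_scores(query, **kwargs)
        self.last_scores = {doc.page_content: score for doc, score in pairs}
        return [doc for doc, _ in pairs]

    def __getattr__(self, name: str):
        # Only called for attributes the wrapper lacks: delegate to the real store.
        return getattr(self._inner, name)


store = ScoreRecordingStore(FakeStore())
docs = store.search("greek letters", "similarity_score_threshold", k=10, score_threshold=0.75)
for doc in docs:
    print(store.last_scores[doc.page_content], doc.page_content)
```

Keying on page_content is the simplest option but collides when two chunks have identical text; a unique ID from the metadata is safer if you have one. The side channel also holds only the most recent search, so it isn’t safe to share across concurrent requests.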

Ethan_19Chess · August 12, 2025, 3:55pm

SelfQueryRetriever wraps the vectorstore and doesn’t expose similarity scores through its standard methods. Here’s how to get them: first retrieve documents with SelfQueryRetriever, then query the vectorstore again using similarity_search_with_score() with the same query text, and match the results to the retrieved documents by content. Alternatively, you could skip SelfQueryRetriever entirely - implement your own filtering logic and use the vectorstore’s native similarity search methods that return scores. This approach provides more control over retrieval and direct access to the similarity metrics.
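The two-pass idea could look roughly like this. The second search uses the original query text so that each score reflects query-to-document similarity, and results are matched back by page_content. lookup_scores, Doc and FakeStore are hypothetical names for illustration, and the stand-ins only exist so the snippet runs without LangChain. Because the self-query step may rewrite the query and apply filters, the scores from this second pass are an approximation of what the retriever saw internally.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class FakeStore:
    """Stand-in vector store that returns canned (document, score) pairs."""

    def similarity_search_with_score(self, query: str, k: int = 4):
        hits = [(Doc("alpha"), 0.91), (Doc("beta"), 0.80), (Doc("gamma"), 0.42)]
        return hits[:k]


def lookup_scores(vectorstore, query: str, docs: list, k: int = 50) -> list:
    """Second pass: score the query again and match results by page_content."""
    pairs = vectorstore.similarity_search_with_score(query, k=k)
    by_content = {doc.page_content: score for doc, score in pairs}
    # None marks a document the second pass did not return (for example, k too small).
    return [(doc, by_content.get(doc.page_content)) for doc in docs]


# Pretend these came back from the SelfQueryRetriever.
retrieved = [Doc("beta"), Doc("delta")]
results = lookup_scores(FakeStore(), "greek letters", retrieved)
for doc, score in results:
    print(score, doc.page_content)
```

Set k in the second pass larger than the retriever’s own limit, otherwise filtered results that rank low in an unfiltered search can come back with no score. Keep in mind that what similarity_search_with_score returns depends on the store, and for some stores it is a distance where lower means closer.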

emmat83 · August 12, 2025, 10:47am

Try calling retriever.vectorstore.similarity_search_with_score() in place of get_relevant_documents(). It returns (document, score) tuples. Keep in mind this goes straight to the vectorstore, so the self-query filters aren’t applied; SelfQueryRetriever itself doesn’t expose the scores.