How can I retrieve similarity scores from vector search in RAG implementation using LangChain and OpenAI?

I’m building a retrieval-augmented generation (RAG) system with LangChain, OpenAI models, and a Chainlit web interface. My setup uses the “map_rerank” chain type for document processing.

I can successfully get answers and source documents, but I’m struggling to capture the similarity scores from my vector search. The scores show up in my console logs but I can’t figure out how to access them programmatically to display in my UI.

Here’s my setup code:

```python
import chainlit as cl
from chainlit import on_chat_start, on_message

# Import paths for classic LangChain; newer releases moved these around
from langchain.chains import ConversationalRetrievalChain, LLMChain
from langchain.chains.qa_with_sources.loading import load_qa_with_sources_chain
from langchain.chat_models import AzureChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS


@on_chat_start
def setup():
    model = AzureChatOpenAI(
        deployment_name=config.MODEL_DEPLOYMENT,
        model_name=config.MODEL_NAME,
        openai_api_base=config.ENDPOINT_URL,
        openai_api_version=config.API_VERSION,
        openai_api_key=config.API_KEY,
        temperature=0.3,
        streaming=True
    )

    embed_model = OpenAIEmbeddings(
        deployment=config.EMBEDDING_DEPLOYMENT,
        model=config.EMBEDDING_MODEL,
        openai_api_base=config.ENDPOINT_URL,
        openai_api_key=config.API_KEY,
        chunk_size=1000
    )

    vector_store = FAISS.load_local(
        config.VECTOR_DB_PATH,
        embed_model
    )

    document_retriever = vector_store.as_retriever(
        search_type="similarity_score_threshold",
        search_kwargs={"score_threshold": 0.4, "k": 4}
    )

    query_chain = LLMChain(
        llm=model,
        prompt=QUESTION_PROMPT
    )

    answer_chain = load_qa_with_sources_chain(
        model,
        chain_type="map_rerank",
        return_intermediate_steps=False
    )

    chat_memory = ConversationBufferMemory(
        llm=model,
        memory_key="history",
        return_messages=True,
        input_key="query",
        output_key="response"
    )

    rag_chain = ConversationalRetrievalChain(
        retriever=document_retriever,
        question_generator=query_chain,
        combine_docs_chain=answer_chain,
        return_source_documents=True,
        memory=chat_memory
    )

    cl.user_session.set("rag_chain", rag_chain)
```

And here’s how I process messages:

```python
@on_message
async def handle_message(user_input: str):
    history = []

    rag_chain = cl.user_session.get("rag_chain")

    result = rag_chain({"query": user_input, "history": history})

    doc_sources = [doc.metadata.get("source") for doc in result["source_documents"]]

    await cl.Message(content=f'Response: {result["response"]}, Sources: {set(doc_sources)}').send()
```

I need to extract the actual similarity scores that are being calculated during the search process. The scores appear in terminal output but I want to capture them and show them to users. How can I access these score values from the retrieval chain?

Skip the retriever entirely. I had this exact issue with a RAG system last month - the retriever abstraction hides the scores you need.

Build a custom retrieval function:

```python
def get_docs_with_scores(vector_store, query, k=4, threshold=0.4):
    # Heads up: for a default FAISS index these scores are raw distances
    # (lower = more similar). If yours is L2-based, flip the comparison to
    # `score <= threshold`, or call similarity_search_with_relevance_scores()
    # instead to get normalized 0-1 scores where higher is better.
    docs_and_scores = vector_store.similarity_search_with_score(query, k=k)
    return [(doc, score) for doc, score in docs_and_scores if score >= threshold]
```
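To see why the comparison direction matters, here's a dependency-free sketch contrasting raw L2 distances (what `similarity_search_with_score` returns for a default FAISS index) with 0-1 relevance scores. The `1 - d/√2` conversion mirrors LangChain's default euclidean relevance function, which assumes unit-normalized embeddings; the vectors below are toy values, not real embeddings:

```python
import math

def l2_distance(a, b):
    # Euclidean distance between two vectors (lower = more similar)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def relevance_score(distance):
    # Mirrors LangChain's default euclidean relevance conversion: 1 - d / sqrt(2).
    # Maps distance 0 -> score 1.0 and sqrt(2) (orthogonal unit vectors) -> 0.0.
    return 1.0 - distance / math.sqrt(2)

query = (1.0, 0.0)
docs = {"close": (0.8, 0.6), "orthogonal": (0.0, 1.0)}

for name, vec in docs.items():
    d = l2_distance(query, vec)
    print(name, round(d, 3), round(relevance_score(d), 3))
```

So a `score >= 0.4` filter keeps the *worst* matches if the scores are distances, but the *best* matches if they are relevance scores — worth printing a few values from your own index before trusting the threshold.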

Then modify your message handler:

```python
@on_message
async def handle_message(user_input: str):
    vector_store = cl.user_session.get("vector_store")  # Store this in setup

    docs_with_scores = get_docs_with_scores(vector_store, user_input)
    docs = [doc for doc, score in docs_with_scores]
    scores = [score for doc, score in docs_with_scores]

    # Use docs directly with your LLM chain
    response = your_llm_chain.run(input_documents=docs, question=user_input)

    # Now you have both response and scores
    await cl.Message(content=f'Response: {response}, Scores: {scores}').send()
```

You get full control over retrieval and score access. Way cleaner than extracting scores from nested chain abstractions.


Had this exact same issue with FAISS. ConversationalRetrievalChain doesn’t expose similarity scores by default, which sucks. Here’s what worked for me: create a custom retriever that inherits from VectorStoreRetriever and overrides the get_relevant_documents method, stashing the scores on the retriever after each search. One gotcha: VectorStoreRetriever is a pydantic model, so declare the attribute as a field instead of assigning it in `__init__`:

```python
from langchain.vectorstores.base import VectorStoreRetriever


class ScoreRetriever(VectorStoreRetriever):
    # Declared as a pydantic field so attribute assignment works on the model
    last_scores: list = []

    def get_relevant_documents(self, query):
        docs_and_scores = self.vectorstore.similarity_search_with_score(
            query, k=self.search_kwargs.get("k", 4)
        )
        self.last_scores = [score for _, score in docs_and_scores]
        return [doc for doc, _ in docs_and_scores]
```

Swap out your document_retriever with this custom class, e.g. `ScoreRetriever(vectorstore=vector_store, search_kwargs={"k": 4})`. Then you can grab the scores with `document_retriever.last_scores` after each query. Keeps your existing chain intact but gives you the similarity scores for your UI.
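The side-channel pattern above, stripped to its essentials (a toy store and retriever with no LangChain dependency — all names and scores here are illustrative):

```python
class ToyStore:
    """Stands in for a vector store; scores are fake precomputed values."""
    def __init__(self, scored_docs):
        self._scored_docs = scored_docs  # list of (doc, score), best first

    def search_with_score(self, query, k):
        return self._scored_docs[:k]


class ScoreCapturingRetriever:
    """Returns bare docs (what the chain consumes) but stashes the scores."""
    def __init__(self, store, k=4):
        self.store = store
        self.k = k
        self.last_scores = []

    def retrieve(self, query):
        pairs = self.store.search_with_score(query, self.k)
        self.last_scores = [score for _, score in pairs]  # side channel
        return [doc for doc, _ in pairs]


store = ToyStore([("doc-a", 0.91), ("doc-b", 0.72), ("doc-c", 0.40)])
retriever = ScoreCapturingRetriever(store, k=2)
docs = retriever.retrieve("anything")
print(docs, retriever.last_scores)
```

One caveat with any side channel: `last_scores` reflects the most recent query only, so read it immediately after the chain call, before anything else triggers a retrieval.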

Use similarity_search_with_score() directly on your vector store instead of going through the retriever. Try docs_and_scores = vector_store.similarity_search_with_score(query, k=4) - you get (document, score) tuples back. You’ll probably need to tweak your chain setup, though.
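What those tuples look like, sketched with a tiny in-memory store (pure Python, cosine similarity where higher = better; real FAISS returns LangChain Document objects and, by default, distances where lower = better — everything below is toy data):

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors, in [-1, 1]
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def similarity_search_with_score(corpus, embed, query, k=4):
    """corpus: list of (text, vector); returns [(text, score), ...] best first."""
    query_vec = embed(query)
    scored = [(text, cosine(query_vec, vec)) for text, vec in corpus]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hand-made 2-d "embeddings" standing in for a real embedding model
corpus = [("apples", (0.9, 0.1)), ("oranges", (0.7, 0.3)), ("trains", (0.0, 1.0))]
embed = lambda q: (1.0, 0.0) if "fruit" in q else (0.0, 1.0)

results = similarity_search_with_score(corpus, embed, "fruit prices", k=2)
print(results)
```

The key point is just the shape: each result is a `(document, score)` pair, so a single unpacking loop gives you both the docs for the chain and the scores for the UI.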

The map_rerank chain doesn’t preserve retrieval metadata - ran into this exact issue. Using return_intermediate_steps=True helps but won’t give you direct score access. What worked for me: modify the retrieval step before it hits the chain. Store your vector store separately in the session during setup, then intercept the query in your message handler:

```python
@on_message
async def handle_message(user_input: str):
    rag_chain = cl.user_session.get("rag_chain")
    vector_store = cl.user_session.get("vector_store")  # Add this to setup

    # Get scores before chain execution
    scored_docs = vector_store.similarity_search_with_score(
        user_input,
        k=4,
        score_threshold=0.4
    )

    result = rag_chain({"query": user_input, "history": []})

    scores = [score for _, score in scored_docs]
    doc_sources = [doc.metadata.get("source") for doc in result["source_documents"]]

    await cl.Message(content=f'Response: {result["response"]}, Sources: {set(doc_sources)}, Scores: {scores}').send()
```

This keeps your existing chain setup while grabbing the similarity scores you need for the UI.