I’m working with LangChain and FAISS vector store but I’m having trouble configuring the similarity threshold. Right now I can only set the k parameter to limit the number of results returned, but what I really need is to filter results based on their similarity score. I want to exclude documents that don’t meet a minimum similarity threshold. Does anyone know how to configure this properly?
from langchain.document_loaders import PyPDFLoader
from langchain.vectorstores import FAISS
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
def create_qa_chain(vector_db):
    model = ChatOpenAI(temperature=0, model_name='gpt-3.5-turbo')
    retrieval_chain = ConversationalRetrievalChain.from_llm(
        llm=model,
        retriever=vector_db.as_retriever(search_kwargs={'k': 3}),
        return_source_documents=True,
        verbose=True
    )
    return retrieval_chain
doc_loader = PyPDFLoader("document.pdf")
doc_pages = doc_loader.load_and_split()
vector_store = FAISS.from_documents(doc_pages, OpenAIEmbeddings())
conversation_history = []
qa_chain = create_qa_chain(vector_store)
user_query = "Tell me about roses"
response = qa_chain({"question": user_query, "chat_history": conversation_history})
Heads up - similarity thresholds with FAISS can bite you. Score normalization gets wonky depending on how you built your vector store.
Hit this building a document search system. Same query, same threshold, totally different results after rebuilding the index with new docs. FAISS distance calculations drift based on your vector distribution.
Fixed it by adding score validation after retrieval:
def validate_retrieval_scores(docs_with_scores, min_threshold=0.7):
    if not docs_with_scores:
        return []
    # Check if scores look reasonable
    scores = [score for _, score in docs_with_scores]
    if max(scores) < 0.3:  # Probably using distance instead of similarity
        print("Warning: Scores look like distances, not similarities")
    return [doc for doc, score in docs_with_scores if score >= min_threshold]
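For a quick sanity check, here's the same filter run against made-up (doc, score) pairs (the function is inlined again so the snippet runs standalone; the chunk names and scores are hypothetical):

```python
def validate_retrieval_scores(docs_with_scores, min_threshold=0.7):
    """Same filter as above, repeated so this demo is self-contained."""
    if not docs_with_scores:
        return []
    scores = [score for _, score in docs_with_scores]
    if max(scores) < 0.3:  # Probably distances, not similarities
        print("Warning: Scores look like distances, not similarities")
    return [doc for doc, score in docs_with_scores if score >= min_threshold]

# Hypothetical retrieval results: (document, similarity score)
results = [("chunk_a", 0.91), ("chunk_b", 0.72), ("chunk_c", 0.55)]
print(validate_retrieval_scores(results))  # ['chunk_a', 'chunk_b']
```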
Test your threshold with real queries before launching. My “perfect” testing threshold was way too strict - users couldn’t find obvious matches.
If you’re updating your vector store regularly, keep benchmark queries handy. Use them to check if your thresholds still work after each update.
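That regression check can be sketched like this - `search_fn` and the benchmark pairs are hypothetical stand-ins for your own retrieval call and known-good queries:

```python
def check_benchmarks(search_fn, benchmarks, threshold=0.7):
    """benchmarks: list of (query, expected_doc) pairs that should pass.
    Returns the queries whose expected doc no longer clears the threshold."""
    failures = []
    for query, expected_doc in benchmarks:
        passing = [doc for doc, score in search_fn(query) if score >= threshold]
        if expected_doc not in passing:
            failures.append(query)
    return failures

# Fake search function standing in for similarity_search_with_score
def search_fn(query):
    return [("rose care guide", 0.82), ("tulip guide", 0.41)]

print(check_benchmarks(search_fn, [("rose pruning", "rose care guide"),
                                   ("tulip bulbs", "tulip guide")]))
# ['tulip bulbs'] - the tulip guide fell below 0.7 after a rebuild
```

Run it in CI or a cron job after each index update and alert on a non-empty failure list.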
The similarity_score_threshold works well, but try max_marginal_relevance instead. It balances similarity with diversity and gives you better control over results.
This is great for documents with repetitive content. MMR stops you from getting multiple nearly-identical chunks while keeping relevance high.
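The core trade-off behind MMR, sketched in plain numpy - this is an illustration of the idea, not LangChain's internal implementation, and the vectors are toy values:

```python
import numpy as np

def mmr_select(query_vec, doc_vecs, k=3, lambda_mult=0.5):
    """Greedily pick k docs, trading relevance to the query against
    redundancy with docs already picked. Higher lambda_mult = more relevance."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    candidates, selected = list(range(len(doc_vecs))), []
    while candidates and len(selected) < k:
        def mmr_score(i):
            relevance = cos(query_vec, doc_vecs[i])
            redundancy = max((cos(doc_vecs[i], doc_vecs[j]) for j in selected),
                             default=0.0)
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

query = np.array([1.0, 0.0])
docs = [np.array([1.0, 0.0]),   # exact match
        np.array([1.0, 0.0]),   # near-identical duplicate
        np.array([0.5, 0.5])]   # related but different
print(mmr_select(query, docs, k=2, lambda_mult=0.3))  # [0, 2] - skips the dup
```

In LangChain itself you don't implement this by hand - you pass `search_type="mmr"` to `as_retriever`, with `fetch_k` and `lambda_mult` in `search_kwargs` to control the candidate pool and the relevance/diversity balance.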
One thing to watch: similarity scores change a lot between embedding models. When I switched from OpenAI to sentence-transformers, I had to completely recalibrate my thresholds. Always test with real queries from your domain or you’ll get weird gaps in retrieval.
Had the exact same problem last month. You can use similarity_score_threshold with FAISS by calling the vectorstore’s similarity_search_with_score method directly before it hits the retriever.
def filtered_retriever_func(query):
    docs_with_scores = vector_store.similarity_search_with_score(query, k=10)
    filtered_docs = [doc for doc, score in docs_with_scores if score >= 0.75]
    return filtered_docs
Wrap this in a custom retriever class. You get full control over filtering and can add complex rules like different thresholds for different document types.
Watch out though - by default FAISS’s similarity_search_with_score returns raw L2 distances, not similarities. Lower distance = higher similarity, so with distances you’d filter with score <= threshold instead. Run a test query first to see what values you’re actually getting.
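A small helper makes the convention explicit - the function name and data here are hypothetical, but the flipped comparison is the whole point:

```python
def filter_by_score(docs_with_scores, threshold, scores_are_distances=False):
    """Filter (doc, score) pairs under either convention:
    distances (lower is better) or similarities (higher is better)."""
    if scores_are_distances:
        return [doc for doc, s in docs_with_scores if s <= threshold]
    return [doc for doc, s in docs_with_scores if s >= threshold]

pairs = [("a", 0.2), ("b", 0.9)]
print(filter_by_score(pairs, 0.5, scores_are_distances=True))  # ['a']
print(filter_by_score(pairs, 0.5))                             # ['b']
```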
Honestly though, managing thresholds manually gets messy fast. You’re constantly tweaking scores for different document types, handling edge cases when nothing meets the threshold, and adjusting based on user feedback.
I’ve hit this problem multiple times. Now I just automate the whole thing with Latenode. Built a workflow that monitors query performance, tracks which similarity scores actually work, and auto-adjusts thresholds based on document types and user satisfaction.
The workflow pulls analytics from my LangChain app, calculates optimal thresholds from historical data, and updates the retriever config automatically. It even handles fallbacks when similarity scores are too low everywhere.
Saves me hours of manual tuning and works way better than static thresholds. You can set up something similar pretty quick.
Yeah, you need to set search_type to similarity_score_threshold and put score_threshold in your search kwargs instead of relying on k alone. FAISS supports this but the docs don’t make it obvious.
Here’s what worked for me:
retriever = vector_db.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={
        "score_threshold": 0.8,  # Only return docs with similarity >= 0.8
        "k": 10  # Max number to consider
    }
)
Threshold value depends on your embedding model. With OpenAI embeddings, I start around 0.7-0.8. Lower = less strict filtering.
Heads up - if no docs meet your threshold, you get empty results. Learned this during a demo when my system suddenly stopped returning answers.
Consider adding a fallback for regular similarity search if threshold search returns nothing:
if not response['source_documents']:
    # Fallback to regular similarity search
    fallback_retriever = vector_db.as_retriever(search_kwargs={'k': 3})
This setup’s been solid in production where answer quality beats always having an answer.
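The fallback logic can be folded into one helper. Here `search_fn` is a hypothetical stand-in for whatever returns (doc, similarity) pairs sorted best-first, e.g. a wrapper around similarity_search_with_score:

```python
def retrieve_with_fallback(search_fn, query, threshold=0.8, k=3):
    """Prefer docs that clear the threshold; if none do, fall back to
    the plain top-k so the chain never runs with an empty context."""
    docs_with_scores = search_fn(query)
    passing = [doc for doc, score in docs_with_scores if score >= threshold]
    return passing if passing else [doc for doc, _ in docs_with_scores[:k]]

def search_fn(query):  # fake results sorted best-first
    return [("doc1", 0.65), ("doc2", 0.51), ("doc3", 0.40), ("doc4", 0.22)]

print(retrieve_with_fallback(search_fn, "roses"))
# ['doc1', 'doc2', 'doc3'] - nothing cleared 0.8, so top-3 came back instead
```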
You could also use score_threshold with the similarity_search_with_relevance_scores() method. It returns normalized scores (0-1) that are way easier to handle than raw FAISS distances. Just swap out your retriever with a custom function that calls this method and filters the results before feeding them to your chain.
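The normalization is roughly of this shape - an L2 distance of 0 maps to relevance 1.0. Treat this as an illustrative formula for unit-normalized embeddings, not necessarily the exact function your LangChain version applies:

```python
import math

def euclidean_to_relevance(distance):
    # For unit vectors, L2 distance runs from 0 (identical) to sqrt(2)
    # (orthogonal), so this rescales it into a 0-1 relevance score
    return 1.0 - distance / math.sqrt(2)

print(euclidean_to_relevance(0.0))                      # 1.0
print(round(euclidean_to_relevance(math.sqrt(2)), 6))   # 0.0
```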