How to apply filtering in LangChain retrievers for combined searches

I’m working with multiple retrievers in LangChain and need help with filtering. Can someone tell me which retrievers actually support filter parameters?

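# Import paths vary by LangChain version; these match recent langchain / langchain-community releases
from langchain.retrievers import EnsembleRetriever
from langchain_community.retrievers import BM25Retriever
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document

data = [Document(page_content='Manchester United is the best club ever.', metadata={"category": "football"}),
        Document(page_content='United scored 3 goals in the match yesterday', metadata={"category": "football"}),
        Document(page_content='Random content here for testing.', metadata={"category": "misc"})]

# embedding_model is a placeholder for any LangChain embedding model
vector_db = FAISS.from_documents(data, embedding_model)
query = "Which team do I like most?"
# BM25 index is built over all documents
bm25_ret = BM25Retriever.from_documents(data)
# FAISS side: metadata filter passed through search_kwargs
vector_ret = vector_db.as_retriever(search_kwargs={'filter': {"category": "football"}, 'k': 3, 'fetch_k': 6})
combined = EnsembleRetriever(retrievers=[bm25_ret, vector_ret], weights=[0.4, 0.6])
result = combined.get_relevant_documents(query)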

The problem is that when I combine a FAISS retriever with BM25 using EnsembleRetriever, the filter only takes effect on the FAISS side; the BM25 side still returns documents from other categories. Is there a way to ensure both retrievers respect the same filter conditions?

BM25Retriever doesn’t support filter parameters like FAISS does. That’s why filtering breaks when you combine them in EnsembleRetriever.

You could filter documents before creating BM25Retriever, but it’s messy with dynamic filtering. You’d have to recreate retrievers for each filter condition.
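
One way to make the pre-filtering approach less messy is to hide the rebuild behind a small helper. The following is a minimal sketch, not an official LangChain pattern: `matches_filter` and `build_filtered_ensemble` are hypothetical names, the import paths assume a recent `langchain` / `langchain-community` install (they have moved between versions), and the filter is assumed to be a flat dict of exact-match metadata values.

```python
from typing import Any, Dict, Sequence


def matches_filter(doc: Any, metadata_filter: Dict[str, Any]) -> bool:
    """Return True if every key/value in metadata_filter matches doc.metadata."""
    return all(doc.metadata.get(key) == value
               for key, value in metadata_filter.items())


def build_filtered_ensemble(docs: Sequence[Any], vector_db: Any,
                            metadata_filter: Dict[str, Any],
                            k: int = 3, weights=(0.4, 0.6)):
    """Build an ensemble whose BM25 and vector sides see the same filter."""
    # Local imports so the pure-Python helper above works without LangChain.
    # Assumed import paths; adjust them to your installed version.
    from langchain.retrievers import EnsembleRetriever
    from langchain_community.retrievers import BM25Retriever

    # BM25 has no runtime filter, so restrict its corpus before indexing.
    filtered = [doc for doc in docs if matches_filter(doc, metadata_filter)]
    if not filtered:
        raise ValueError(f"No documents match filter {metadata_filter!r}")
    bm25 = BM25Retriever.from_documents(filtered)
    bm25.k = k

    # The vector store receives the same condition through search_kwargs.
    vector_ret = vector_db.as_retriever(
        search_kwargs={"filter": metadata_filter, "k": k})
    return EnsembleRetriever(retrievers=[bm25, vector_ret],
                             weights=list(weights))
```

Calling `build_filtered_ensemble(data, vector_db, {"category": "football"})` per request keeps both sides consistent. The cost is that the BM25 index is rebuilt on every call, so caching by filter may be worth it for larger corpora.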

I’ve hit this exact problem. Instead of fighting LangChain’s limitations, I built a workflow in Latenode that handles it cleanly.

The workflow takes your query and filters, processes each retriever separately with proper filtering, then merges results with weighted scoring. No hacking around retriever limits.

Latenode’s visual builder makes it easy to add retrievers or tweak scoring without code changes. You can expose it as an API too.

Check it out: https://latenode.com

yeah, same issue here. quick fix: pre-filter your docs by category before you initialize bm25. it’s not pretty but it works. wrapping it with MultiQueryRetriever won’t fix this though - it only generates variations of the query and passes them to the underlying retriever, so the bm25 side still ignores the filter.

Had this exact headache on a project last year. The problem is BM25Retriever and vector retrievers handle filtering completely differently.

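BM25Retriever builds its index over a static document collection and has no runtime filter. Vector stores like FAISS apply the filter at query time (LangChain’s FAISS wrapper fetches fetch_k candidates and drops the ones whose metadata doesn’t match). When you combine them in EnsembleRetriever, only the FAISS side respects your filter.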

Here’s what worked for me: create filtered BM25 retrievers upfront for each category you need.

# Build the BM25 index from football documents only, so it can never return other categories
football_docs = [doc for doc in data if doc.metadata.get("category") == "football"]
bm25_football = BM25Retriever.from_documents(football_docs)

# Apply the same condition on the FAISS side through search_kwargs
vector_ret = vector_db.as_retriever(search_kwargs={'filter': {"category": "football"}, 'k': 3})
combined = EnsembleRetriever(retrievers=[bm25_football, vector_ret], weights=[0.4, 0.6])

Yeah, you need separate BM25 instances per filter condition. Not elegant but it works.
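
If the set of categories is known in advance, the per-category BM25 instances can be built once and looked up by key. This is a rough sketch under assumptions: documents expose a `.metadata` dict, `group_by_metadata` and `build_bm25_per_category` are illustrative names rather than LangChain functions, and the import path may differ in your version.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List


def group_by_metadata(docs: Iterable[Any], key: str) -> Dict[Any, List[Any]]:
    """Bucket documents by the value of one metadata key (missing key -> None)."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for doc in docs:
        groups[doc.metadata.get(key)].append(doc)
    return dict(groups)


def build_bm25_per_category(docs: Iterable[Any],
                            key: str = "category") -> Dict[Any, Any]:
    """Build one BM25Retriever per distinct metadata value, once, up front."""
    # Assumed import path; adjust it to your installed version.
    from langchain_community.retrievers import BM25Retriever

    return {value: BM25Retriever.from_documents(group)
            for value, group in group_by_metadata(docs, key).items()}
```

With `bm25_by_category = build_bm25_per_category(data)`, the ensemble for a request would use `bm25_by_category["football"]` next to a vector retriever carrying the same filter. Memory use grows with the number of distinct values, which is part of why this gets unwieldy for many filter combinations.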

For production with lots of filter combinations, I’d honestly just use retrievers that both support native filtering. Saves you from this whole mess.

Hit this exact problem last month. BM25Retriever works with a fixed document set and can’t filter natively, while FAISS can filter at retrieval time. I solved it by building a custom retriever wrapper that filters documents by metadata before BM25 sees them. In practice that means filtering your docs first, then creating separate BM25 instances for the different filter conditions.

You could also post-process the combined results, but that messes with ensemble scoring: BM25 still computes relevance and ranks against your entire unfiltered corpus, not just the filtered subset, so the fused rankings get weird.

If you’re filtering a lot, just switch to backends that both support native metadata filtering. Chroma and Pinecone, for example, filter natively on the vector side, so you’d pair one of them with a keyword retriever that can also filter instead of sticking with the BM25/FAISS combination.
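
For comparison, the post-processing option could look like the sketch below. `post_filter` is a hypothetical helper, not part of LangChain, and it assumes exact-match filtering on a flat metadata dict. It only drops non-matching documents from the final list; it does not change how BM25 scored them, and it can return fewer results than requested.

```python
from typing import Any, Dict, List, Optional, Sequence


def post_filter(results: Sequence[Any], metadata_filter: Dict[str, Any],
                limit: Optional[int] = None) -> List[Any]:
    """Keep results whose metadata matches every filter entry, preserving order."""
    kept = [doc for doc in results
            if all(doc.metadata.get(key) == value
                   for key, value in metadata_filter.items())]
    return kept if limit is None else kept[:limit]
```

A call such as `post_filter(combined.get_relevant_documents(query), {"category": "football"})` would be applied to the ensemble output. Because off-category BM25 hits occupy slots before being removed, raising `k` on the BM25 side may be needed to end up with enough matches.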