I built a simple LangChain application that processes PDF documents using OpenAI’s language model. The system works great for answering questions about the PDF content, but I noticed something weird.
import PyPDF2
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import OpenAI
# Load and process document
pdf_file = PyPDF2.PdfReader('./research_paper.pdf')
document_content = ''
for page in pdf_file.pages:
    page_text = page.extract_text()
    if page_text:
        document_content += page_text
# Split text into chunks
splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=150,
    separators=["\n\n", "\n", " "]
)
text_chunks = splitter.split_text(document_content)
# Create vector store
vector_embeddings = OpenAIEmbeddings()
vector_db = FAISS.from_texts(text_chunks, vector_embeddings)
# Setup QA chain
qa_chain = load_qa_chain(OpenAI(temperature=0), chain_type="stuff")
# Ask questions
user_query = "What are the main findings?"
relevant_docs = vector_db.similarity_search(user_query)
response = qa_chain.run(input_documents=relevant_docs, question=user_query)
print(response)
The issue is that when I ask basic questions like “what is 5+3” or general knowledge questions, the model acts like it doesn’t know anything. It seems like the model only has access to the PDF content now. How can I keep the model’s original training knowledge while still being able to query my documents?
I encountered a similar problem while building a document query tool. The QA chain restricts the model’s context to the retrieved chunks, so it stops drawing on its pre-training knowledge. A few ways to fix it: route each question by first deciding whether it concerns the document or general knowledge; rewrite your prompt so the model is told to use both sources, e.g. “Refer to these documents AND your pre-trained knowledge when responding”; or add a fallback so that when the similarity search returns nothing relevant, the query goes straight to the language model. That last approach has worked well for me.
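The fallback mechanism can be sketched in a few lines. The three callables here are hypothetical stand-ins for the real calls in the question’s code (vector_db.similarity_search, qa_chain.run, and a bare OpenAI(temperature=0).predict), so the routing logic itself stays testable without API keys:

```python
def answer_with_fallback(query, retrieve, run_rag, run_llm):
    """Try retrieval first; fall through to the bare LLM on no hits.

    retrieve(query) -> list of docs (e.g. vector_db.similarity_search)
    run_rag(docs, query) -> str   (e.g. qa_chain.run(...))
    run_llm(query) -> str         (e.g. OpenAI(temperature=0).predict)
    """
    docs = retrieve(query)
    if docs:
        # Retrieval found something -- answer from the documents
        return run_rag(docs, query)
    # Nothing relevant retrieved -- use the model's general knowledge
    return run_llm(query)
```

Wiring the real LangChain calls into those three slots is then a one-line change each.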
I’ve hit this exact problem on multiple projects. Your QA chain creates tunnel vision - it only sees retrieved chunks and ignores what the model already knows.
Skip the routing logic and prompt hacks. I automated this with Latenode - built a workflow that decides whether to use RAG or go direct based on the query.
It handles query classification, similarity scoring, and response routing without coding. Someone asks “what’s 5+3” and it skips vector search completely. They ask about your docs and it runs through RAG.
You can also set up fallback chains. RAG gives a weak answer? It automatically tries hybrid - combines doc context with general knowledge.
Way cleaner than writing custom routing for every project. The workflow makes all the decisions.
Same exact frustration here when I started with RAG systems. Your setup basically blinds the model to everything except those retrieved chunks. Ask “what’s 5+3” and similarity search still pulls random PDF chunks - now the model’s trying to find math answers in research papers. Confusing mess.

I fixed this with a hybrid approach. Changed my prompt template to explicitly tell the model it can use both the provided context AND its general knowledge. Instead of feeding raw chunks with no framing, structure it like: “Use this context if it’s relevant, but also use your general knowledge for the best answer.” That way the model won’t feel stuck with document content when it’s clearly useless.

Adjusting the similarity threshold also helped. If your search returns chunks with terrible relevance scores, skip RAG entirely for that query. I set minimum similarity at 0.7 and route general questions straight to the base model when nothing relevant comes up.
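A minimal version of that 0.7 cutoff could look like the sketch below. One caveat worth flagging: FAISS’s similarity_search_with_score returns L2 distances by default, where lower means more similar, so the code treats the threshold as a distance ceiling rather than a similarity floor (the 0.7 value is illustrative; tune it for your embedding model):

```python
def route_query(scores, max_distance=0.7):
    """Return 'rag' when any retrieved chunk is close enough, else 'direct'.

    scores: distances from vector_db.similarity_search_with_score
    (FAISS default metric is L2 distance: lower = more similar).
    """
    if scores and min(scores) <= max_distance:
        return "rag"
    return "direct"
```

A query like “what is 5+3” will typically produce only large distances against research-paper chunks, so it gets routed straight to the base model.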
This happens because your QA chain creates a closed system - the model can only use the retrieved document chunks. When you ask “what is 5+3”, the similarity search still grabs some PDF chunks (with awful relevance scores), but the model thinks it has to work only with that context.
I fixed this by tweaking the chain config instead of adding routing logic. Use a custom prompt template that tells the model it’s okay to go beyond the provided context. Something like “Answer based on this context, but if it doesn’t contain relevant info, use your general knowledge instead.”
Also check your similarity search threshold settings. Most vector stores let you filter out results below a certain score. If nothing relevant gets retrieved, your chain defaults to general knowledge mode automatically. Way simpler than building separate pipelines for different query types.
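A template along those lines, sketched with plain str.format so it runs standalone; for real use you would wrap the same string in LangChain’s PromptTemplate(input_variables=["context", "question"], template=...) and pass it to load_qa_chain via its prompt argument (the exact wording below is an example, not canonical):

```python
# Illustrative hybrid prompt: permits the model to go beyond the context.
HYBRID_TEMPLATE = (
    "Answer based on the context below. If the context doesn't contain "
    "relevant information, use your general knowledge instead.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

# Plain formatting shown here; swap in PromptTemplate for the chain.
prompt_text = HYBRID_TEMPLATE.format(
    context="(retrieved chunks would go here)",
    question="What is 5+3?",
)
```

With this template, irrelevant retrieved chunks stop being a straitjacket: the model is explicitly licensed to ignore them.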
Yeah this is a classic RAG gotcha that bit me hard on a project last year.
Your chain only sees retrieved chunks - nothing else. Ask “what is 5+3” and similarity search grabs random PDF chunks about research papers, then the model tries answering math with irrelevant content.
I fixed it with a relevance check before hitting the vector store:
# Quick relevance check
relevance_prompt = f"Is this question about the document content? Answer yes or no. Question: {user_query}"
relevance_check = OpenAI(temperature=0).predict(relevance_prompt)

# startswith avoids false positives from words like "know" or "not sure"
if relevance_check.strip().lower().startswith("no"):
    # Direct to base model
    response = OpenAI(temperature=0).predict(user_query)
else:
    # Use your existing RAG pipeline
    relevant_docs = vector_db.similarity_search(user_query)
    response = qa_chain.run(input_documents=relevant_docs, question=user_query)
You could also try a conversational chain that keeps context but allows general knowledge. RetrievalQA with return_source_documents=True helps too - you can check what chunks got pulled and see if they make sense.
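To check programmatically whether the pulled chunks make sense, one crude heuristic (my own sketch, not a LangChain feature) is keyword overlap between the query and each chunk; near-zero overlap usually means retrieval pulled noise and a plain LLM call is the better route:

```python
def chunk_overlap(query, chunk):
    """Fraction of query words that also appear in the chunk (0.0-1.0)."""
    q_words = {w.lower() for w in query.split()}
    c_words = {w.lower() for w in chunk.split()}
    return len(q_words & c_words) / len(q_words) if q_words else 0.0
```

Run it over whatever return_source_documents gives you back; if every chunk scores near zero for a query like “what is 5+3”, skip the RAG path.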
Bottom line: you need routing logic to decide RAG vs plain LLM calls.
Yeah, I ran into this too. Your QA chain is basically forcing the model to only use those retrieved chunks - it’s ignoring everything else it knows. Try switching from “stuff” to “map_reduce” chain type; that sometimes helps. Or just update your prompt template to tell the model it’s okay to use general knowledge when the docs don’t have what you need.
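For context, here is roughly what map_reduce does conceptually - answer the question against each chunk separately, then combine the partial answers - sketched with a stub in place of the LLM call (an illustration of the idea, not LangChain’s actual internals):

```python
def map_reduce_qa(chunks, question, ask):
    """ask(prompt) -> str stands in for a real LLM call."""
    # Map step: answer the question against each chunk independently
    partials = [ask(f"Context: {c}\nQuestion: {question}") for c in chunks]
    # Reduce step: combine the partial answers into one final answer
    joined = "\n".join(partials)
    return ask(f"Combine these partial answers into one:\n{joined}\nQuestion: {question}")
```

Because each chunk is queried in isolation, a bad chunk can’t drown out the others the way it can in a single stuffed prompt - though it won’t by itself let the model fall back on general knowledge, so the prompt fix still matters.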