Building a hybrid chatbot with Langchain that combines RAG and general AI knowledge

I’m working on developing a social media bot that analyzes user posts and creates similar content based on their style. I want to combine RAG functionality with the general knowledge that LLMs already have built in. Right now my bot can answer questions about stored posts from my vector database, but when I ask basic questions like “How tall is the Eiffel Tower?” it says it doesn’t know the answer.

from dotenv import load_dotenv
import streamlit as st
from langchain_openai import ChatOpenAI
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.schema import HumanMessage, AIMessage

load_dotenv()

# Setup OpenAI model
ai_model = ChatOpenAI(
    model="gpt-3.5-turbo",
    temperature=0.7
)

if 'current_user' not in st.session_state:
    st.session_state['current_user'] = ''

st.text_input("Please enter your username:", key='current_user')

if not st.session_state['current_user']:
    st.error("Username is required to continue")
    st.stop()

st.info(f"Hello {st.session_state['current_user']}!")

# Setup embeddings
embedding_model = OpenAIEmbeddings()

# Vector database configuration
db_directory = "vector_storage"
user_collection = "social_posts"

# Initialize vector database
post_vectorstore = Chroma(
    embedding_function=embedding_model,
    persist_directory=db_directory,
    collection_name=user_collection
)

def create_user_retriever(username):
    user_retriever = post_vectorstore.as_retriever(
        search_type="similarity",
        search_kwargs={
            "k": 3,
            "filter": {"username": {"$eq": username}}
        }
    )
    return user_retriever

current_retriever = create_user_retriever(st.session_state['current_user'])

if 'conversation_memory' not in st.session_state:
    st.session_state.conversation_memory = ConversationBufferMemory(
        memory_key="history",
        return_messages=True,
        output_key='answer'
    )

bot_chain = ConversationalRetrievalChain.from_llm(
    llm=ai_model,
    retriever=current_retriever,
    memory=st.session_state.conversation_memory,
    return_source_documents=True
)

st.title("Social Media Content Bot")

for msg in st.session_state.conversation_memory.chat_memory.messages:
    if isinstance(msg, HumanMessage):
        with st.chat_message("user"):
            st.write(msg.content)
    elif isinstance(msg, AIMessage):
        with st.chat_message("bot"):
            st.write(msg.content)

if user_query := st.chat_input("Ask me anything..."):
    with st.chat_message("user"):
        st.write(user_query)

    bot_response = bot_chain({"question": user_query})
    reply_text = bot_response.get('answer', "Sorry, I couldn't process that.")
    retrieved_docs = bot_response.get('source_documents', [])

    with st.chat_message("bot"):
        st.write(reply_text)

    if retrieved_docs:
        st.subheader("Related Posts Found:")
        for i, document in enumerate(retrieved_docs, 1):
            with st.expander(f"Reference {i}"):
                st.write(f"**Text:** {document.page_content}")
                st.write(f"**Details:** {document.metadata}")

What would be the best way to make this work for both types of questions? I’m currently using ConversationalRetrievalChain but maybe there’s a better approach.

ConversationalRetrievalChain forces everything through the retriever first - that’s your problem. I ran into this exact issue building a document Q&A system. Here’s what fixed it for me: add a routing layer before the chain kicks in. Build a simple classifier that figures out if someone’s asking about user posts or just general stuff. You can check for domain keywords or use a lightweight model to catch the intent. Route general questions straight to the LLM and skip the retriever entirely. Only fire up RAG for post-specific queries. Best of both worlds - no more retriever blocking your general knowledge questions.
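A minimal sketch of that router - the keyword list and function names here are my own assumptions, and a lightweight classifier model could replace the keyword check:

```python
# Hypothetical router: decide whether a query needs the post retriever
# or can go straight to the LLM's general knowledge.
POST_KEYWORDS = {"post", "posts", "style", "wrote", "content", "published"}

def needs_retrieval(query: str) -> bool:
    """Crude intent check: route to RAG only for post-related queries."""
    words = set(query.lower().split())
    return bool(words & POST_KEYWORDS)

def answer(query, bot_chain, ai_model):
    if needs_retrieval(query):
        # Post-specific question: use the existing RAG chain
        return bot_chain({"question": query})["answer"]
    # General knowledge: skip the retriever entirely
    return ai_model.invoke(query).content
```

You can swap `needs_retrieval` for an LLM-based classifier later without touching the rest of the flow.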

I built something similar last year for a customer support bot. Skip the routing - just modify your prompt template to tell the model it can use general knowledge when retrieval comes up empty. Update your ConversationalRetrievalChain's answer prompt with something like "Use the provided context if it's relevant, otherwise answer from what you know." You can pass your own prompt template through the combine_docs_chain_kwargs parameter of from_llm. The model will naturally fall back to training data when retrieved docs don't help. Way cleaner than separate logic paths and keeps conversations flowing smoothly.
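A rough sketch of what that prompt override could look like - the exact wording of the template is my own, but `combine_docs_chain_kwargs` is the hook LangChain's `from_llm` provides for replacing the answer prompt:

```python
# Hypothetical hybrid prompt: use retrieved context when relevant,
# otherwise fall back to the model's general knowledge.
HYBRID_PROMPT = (
    "Use the following context if it is relevant to the question. "
    "If the context is empty or unrelated, answer from your own "
    "general knowledge instead of saying you don't know.\n\n"
    "Context:\n{context}\n\n"
    "Question: {question}\n"
    "Answer:"
)

# Wiring it into the existing chain would look roughly like:
# from langchain.prompts import PromptTemplate
# bot_chain = ConversationalRetrievalChain.from_llm(
#     llm=ai_model,
#     retriever=current_retriever,
#     memory=st.session_state.conversation_memory,
#     return_source_documents=True,
#     combine_docs_chain_kwargs={
#         "prompt": PromptTemplate(
#             input_variables=["context", "question"],
#             template=HYBRID_PROMPT,
#         )
#     },
# )
```

The `{context}` and `{question}` placeholders are the variables the combine-docs step fills in at answer time.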

You need to add a fallback to your chain for when the retriever comes up empty. Check whether source_documents is empty, and if it is, hit the model directly with the question: try if not retrieved_docs: fallback_response = ai_model.invoke([HumanMessage(content=user_query)]) - works great for general knowledge questions.
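One way to package that fallback so it's easy to test - the function and parameter names are illustrative, and the callables stand in for the asker's `bot_chain` and `ai_model.invoke`:

```python
def answer_with_fallback(query, rag_chain, general_llm_call):
    """Try RAG first; fall back to the plain LLM when nothing is retrieved.

    rag_chain: callable taking {"question": ...} and returning a dict
               with "answer" and "source_documents" keys
    general_llm_call: callable taking the query string, returning a string
    """
    result = rag_chain({"question": query})
    if result.get("source_documents"):
        return result["answer"]
    # Nothing retrieved: answer from general knowledge instead.
    # e.g. general_llm_call = lambda q: ai_model.invoke(
    #     [HumanMessage(content=q)]).content
    return general_llm_call(query)
```

One caveat: similarity search usually returns the k nearest posts even for unrelated questions, so in practice you may also want to check relevance scores, not just emptiness.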

Your ConversationalRetrievalChain works exactly as designed - it only uses retrieved documents. You’re just forcing every question through RAG when you shouldn’t.

I’ve hit this same wall multiple times. Cleanest fix? Conditional routing instead of hacking the existing chain - LangGraph is the natural home for this once the logic grows, but a simple router gets you started.

Make two separate chains: keep your ConversationalRetrievalChain for post analysis, and add a basic ConversationChain for general questions. Then write a decision function that routes based on the query.

This works well:

from langchain.chains import ConversationChain

def should_use_rag(query):
    post_keywords = ['post', 'content', 'style', 'wrote', 'published']
    return any(keyword in query.lower() for keyword in post_keywords)

# Plain conversation chain with no retriever for general questions
general_chain = ConversationChain(llm=ai_model)

if should_use_rag(user_query):
    response = bot_chain({"question": user_query})["answer"]
else:
    response = general_chain.run(user_query)

This keeps your existing code mostly untouched while giving you proper hybrid functionality. You can make the routing logic fancier later, but keyword matching handles most cases surprisingly well.

Your current chain architecture is solid for RAG - just route around it for general knowledge questions.