Chat history not functioning in my RAG implementation

I’m building a RAG application that can chat with documents stored in Pinecone, but I’m having trouble with conversation memory. The chat history gets saved to a list correctly, but when I ask about previous conversations, the model acts like it’s the first question ever asked.

import streamlit as st
from langchain_community.vectorstores.pinecone import Pinecone as LCPinecone
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_openai.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from pinecone import Pinecone
from langchain.chains import ConversationalRetrievalChain

VECTOR_INDEX = "my-docs-index"
QUERY_PREFIX = "User Question: "
DOCS_TO_FETCH = 3
LLM_TEMP = 0.2
MEMORY_LIMIT = 10

def setup_pinecone(key: str) -> Pinecone:
    return Pinecone(api_key=key)

def setup_session():
    if "conversation_log" not in st.session_state:
        st.session_state.conversation_log = []

def process_question(query: str, openai_key: str):
    try:
        setup_session()
        
        formatted_query = QUERY_PREFIX + query
        pinecone_client = setup_pinecone(st.secrets["pinecone_key"])
        
        embeddings = OpenAIEmbeddings(model="text-embedding-3-large", openai_api_key=openai_key)
        vector_index = pinecone_client.Index(VECTOR_INDEX)
        
        memory = ConversationBufferMemory(memory_key="conversation_log", return_messages=True)
        vector_db = LCPinecone(index=vector_index, embedding=embeddings, text_key="content")
        language_model = OpenAI(temperature=LLM_TEMP, openai_api_key=openai_key)
        doc_retriever = vector_db.as_retriever(search_type="similarity", search_kwargs={"k": DOCS_TO_FETCH})
        
        chat_chain = ConversationalRetrievalChain.from_llm(
            llm=language_model,
            retriever=doc_retriever,
            memory=memory
        )
        
        conversation_log = st.session_state.conversation_log
        
        response = chat_chain({'question': formatted_query, 'chat_history': conversation_log})
        answer = response['answer']
        
        conversation_log.append((formatted_query, answer))
        st.session_state.conversation_log = conversation_log
        
        return {'response': response, 'conversation_log': conversation_log}
        
    except Exception as error:
        # Return the same shape as the success path so the caller's
        # result['response'] lookup doesn't raise a KeyError
        return {'response': {'answer': 'Something went wrong. Please try again.'},
                'conversation_log': st.session_state.get('conversation_log', [])}

user_input = st.text_area("Ask your question:")

if st.button("Submit"):
    if user_input:
        result = process_question(user_input, st.secrets["openai_key"])
        st.write("Response:", result['response'].get("answer", "No response found."))

Here’s what happens:

First exchange:
Q: User Question: What is this document about?
A: This document discusses a research partnership agreement between two organizations.

Second exchange:
Q: User Question: What was my previous question?
A: I don’t have access to previous questions or conversations.

The conversation history is being stored but the model doesn’t seem to use it. What am I missing here?

Your memory isn't getting passed to the chain properly. Add verbose=True to ConversationalRetrievalChain.from_llm() so you can see the actual prompts being sent. Also check whether your memory object is actually getting filled: you might be storing to session_state but not loading it back when you recreate the memory object on each run.

You’re creating a new ConversationBufferMemory object every time process_question runs, so it starts empty each time. Store the memory object itself in st.session_state, not just the conversation log. Also, your memory_key says “conversation_log” but ConversationalRetrievalChain expects “chat_history” by default. I hit this same issue - fixed it by initializing the memory once in session state and reusing it. Just make sure chat_history loads properly into the memory object before each query.