How to maintain conversation history with OpenAI API and Langchain for PDF-based chatbot in Python?

I’m pretty new to coding and working on my first big project using OpenAI’s API with Langchain.

I’m building a chatbot that can read PDF files and answer questions about them. The basic functionality works fine, but I’m stuck on one major issue. I want the bot to remember previous messages in our conversation.

For example, if I ask “What is a butterfly?” and then follow up with “What size are they?”, the bot should know I’m still talking about butterflies. Right now it acts like each question is completely separate.

I’ve been trying to use ConversationBufferMemory but something isn’t working right. The bot gives weird responses when I ask follow-up questions that need context from earlier in the chat.

import os
import gradio as gr
from langchain.chat_models import ChatOpenAI
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.text_splitter import TokenTextSplitter
from langchain.document_loaders import PyPDFLoader
from langchain.prompts.prompt import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

os.environ["OPENAI_API_KEY"] = 'your-key-here'

def load_pdf_data():
    pdf_loader = PyPDFLoader('files/insects.pdf')
    documents = pdf_loader.load()
    return documents

pdf_data = load_pdf_data()

splitter = TokenTextSplitter(chunk_size=800, chunk_overlap=100)
split_docs = splitter.split_documents(pdf_data)

embedding_model = OpenAIEmbeddings()
vector_store = Chroma.from_documents(split_docs, embedding_model)
doc_retriever = vector_store.as_retriever(search_type="similarity")

query_template = """{question}"""
PROMPT_TEMPLATE = PromptTemplate(template=query_template, input_variables=["question"])

conversation_memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

def create_response(user_query, chat_log):
    if user_query:
        language_model = ChatOpenAI(temperature=0.3, model_name="gpt-3.5-turbo")
        qa_chain = ConversationalRetrievalChain.from_llm(language_model, doc_retriever, PROMPT_TEMPLATE, verbose=True, memory=conversation_memory)
        response = qa_chain({"question": user_query, "chat_history": chat_log})
    return response["answer"]

def chat_interface(user_input, conversation_history):
    conversation_history = conversation_history or []
    flattened_history = list(sum(conversation_history, ()))
    flattened_history.append(user_input)
    combined_input = ' '.join(flattened_history)
    bot_response = create_response(user_input, conversation_history)
    conversation_history.append((user_input, bot_response))
    return conversation_history, conversation_history

with gr.Blocks() as app:
    gr.Markdown("""<h1><center>PDF Question Bot</center></h1>""")
    chat_display = gr.Chatbot()
    session_state = gr.State()
    input_field = gr.Textbox(placeholder="Ask me about the document...")
    send_button = gr.Button("Send Message")
    send_button.click(chat_interface, inputs=[input_field, session_state], outputs=[chat_display, session_state])

app.launch(share=True)

I’ve tried different approaches but keep breaking something else. Any ideas what I’m doing wrong with the memory setup?

You’re creating a new qa_chain instance every time chat_interface runs - that’s why your ConversationBufferMemory keeps resetting. Move the chain creation outside the function so it only initializes once after you set up your retriever and memory. Also, you’ve got the chain’s internal memory fighting with the manual chat_history you’re passing in. Pick one approach - either let the ConversationalRetrievalChain handle memory on its own, or do it manually. I’d just use the chain’s built-in memory and ditch the manual chat_history parameter completely.
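Something along these lines - a rough sketch of that restructuring, assuming the rest of the setup above (loader, splitter, vector store, retriever, memory) stays exactly as you have it:

# Sketch only: build the model and chain once at module level, right after the
# retriever and memory setup, so the same memory object persists across calls.
language_model = ChatOpenAI(temperature=0.3, model_name="gpt-3.5-turbo")
qa_chain = ConversationalRetrievalChain.from_llm(
    language_model,
    doc_retriever,
    memory=conversation_memory,  # the chain reads and writes chat history itself
    verbose=True,
)

def create_response(user_query, chat_log=None):
    # chat_log is kept only so the existing Gradio handler's call still works; it's ignored.
    # No manual chat_history here - the chain pulls it from conversation_memory.
    response = qa_chain({"question": user_query})
    return response["answer"]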

I hit the same problem with retrieval chatbots. You’re recreating the ChatOpenAI model and ConversationalRetrievalChain every single time create_response runs, which wipes out the memory. Move everything except the actual query call outside that function. Set up your language_model and qa_chain at module level, right after your vector store and memory setup. Then create_response just becomes a simple wrapper that passes the query to qa_chain. Drop the custom prompt template from ConversationalRetrievalChain.from_llm - it’s got its own conversation handling built in. Let it use the default. Make sure your ConversationBufferMemory has output_key="answer" configured. That’s what ConversationalRetrievalChain returns, so without it your bot’s responses won’t get saved to memory.
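In code, the memory tweak from this answer would look roughly like this (a sketch, not tested against your exact library versions):

# Sketch: same module-level setup as above, with output_key set so the bot's
# responses actually get written back into the buffer.
conversation_memory = ConversationBufferMemory(
    memory_key="chat_history",
    return_messages=True,
    output_key="answer",  # ConversationalRetrievalChain returns its result under "answer"
)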

the problem’s in your create_response function - you’re not using the conversation_memory you set up. just call qa_chain({"question": user_query}) without manually passing chat_history. and move the chain creation outside the function so it sticks around between calls, otherwise you’re wiping the memory every time.
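for example (sketch, assuming qa_chain is built once at module level as described above):

# before: qa_chain({"question": user_query, "chat_history": chat_log})
# after:  the chain's own memory supplies chat_history
response = qa_chain({"question": user_query})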

This is way simpler than you’re making it. You’re overengineering the memory stuff when you could just automate the whole flow.

I hit this exact issue last year building a document chatbot for our knowledge base. Wasted weeks fighting LangChain’s memory configs and manual chat history. Total waste of time.

You need an automation platform that handles conversation state automatically. I switched to Latenode and built the same bot in hours instead of weeks. It connects to OpenAI, processes PDFs, and keeps conversation context without the manual memory hassle.

Latenode’s visual workflow handles PDF processing, embedding storage, and conversation chains with built-in persistent memory. No more recreating chains every function call or wrestling with ConversationBufferMemory.

Best part? You can test the flow step by step and see exactly where context breaks. Way easier than debugging Python with gradio.

Your current setup has too many breaking points. Automate it properly and focus on actual chatbot logic instead of memory headaches.

Check it out: https://latenode.com

You’ve got duplicate memory systems fighting each other - Gradio’s state management and LangChain’s internal memory don’t play nice together. Ditch the manual chat_history tracking in Gradio completely and let ConversationBufferMemory do its job. Set up your qa_chain once outside create_response, then just call it with the question. Your chat_interface should only pass user input to create_response and update the display. ConversationalRetrievalChain handles context automatically between questions. I built a similar legal doc chatbot last month and this fix solved everything.
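A sketch of the slimmed-down Gradio handler this describes, keeping the names from the question and assuming create_response now just wraps a chain built once at module level:

def chat_interface(user_input, conversation_history):
    # Gradio state only drives the chat display; LangChain's memory tracks context
    conversation_history = conversation_history or []
    bot_response = create_response(user_input)
    conversation_history.append((user_input, bot_response))  # display history only
    return conversation_history, conversation_history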

Your prompt template’s too basic - it’s not using chat history properly. You’ve only got {question} but ConversationalRetrievalChain needs both the question and previous conversation context. Try this instead: Given the following conversation and a follow-up question, provide a comprehensive answer based on the document context. Chat History: {chat_history} Follow-up Question: {question}. Also, your chat_interface function is doing way too much work flattening and combining history. ConversationalRetrievalChain handles this stuff internally, so ditch those manual history manipulations and let the chain’s memory system do what it’s designed for. And that 0.3 temperature might be too low for conversations - bump it to 0.7 for more natural responses.
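As a sketch, the template and temperature changes might look like this. Note that the question passes the template as the chain’s third argument (the condense-question prompt), so it needs both chat_history and question as input variables; whether a custom prompt helps at all here is worth testing against the default:

query_template = """Given the following conversation and a follow-up question, provide a
comprehensive answer based on the document context.

Chat History:
{chat_history}

Follow-up Question: {question}"""

PROMPT_TEMPLATE = PromptTemplate(
    template=query_template,
    input_variables=["chat_history", "question"],
)

language_model = ChatOpenAI(temperature=0.7, model_name="gpt-3.5-turbo")  # suggested bump from 0.3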

you’re passing chat_history twice, which confuses the chain. ConversationalRetrievalChain handles memory on its own - you don’t need to manually pass chat_history in your query. just remove the chat_history parameter from your qa_chain call and let ConversationBufferMemory handle it automatically.