How to implement conversation memory in langchain for PDF chat application

I’m building a PDF chat system using langchain and need help with maintaining conversation history. My current setup has two main endpoints - one for uploading PDFs and another for processing questions. The upload part works fine: I extract text from the PDF and store it in a Chroma vector database. However, I’m struggling with the question processing part. I want the system to remember previous questions and answers so users can ask follow-up questions like “what was my previous question” or reference earlier parts of the conversation.

from flask import Flask, request, jsonify
import fitz  # PyMuPDF
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory

app = Flask(__name__)
app.config['MAX_CONTENT_LENGTH'] = 16 * 1024 * 1024  # 16MB max file size

doc_vectorstore = None  # set by /document_upload, read by /ask_question

def create_chat_chain(vectordb):
    chat_memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        output_key='answer'
    )
    
    llm_model = ChatOpenAI(temperature=0.3)
    retriever_config = vectordb.as_retriever(
        search_type="mmr",
        search_kwargs={"k": 3, "fetch_k": 6}
    )
    
    chat_chain = ConversationalRetrievalChain.from_llm(
        llm=llm_model,
        retriever=retriever_config,
        memory=chat_memory,
        return_source_documents=True
    )
    return chat_chain

@app.route('/document_upload', methods=['POST'])
def document_upload():
    global doc_vectorstore
    doc_vectorstore = None
    
    if 'document' not in request.files:
        return jsonify({'error': 'No file uploaded'}), 400
    
    uploaded_file = request.files['document']
    
    try:
        # Extract text using PyMuPDF
        pdf_document = fitz.open(stream=uploaded_file.read(), filetype="pdf")
        extracted_content = ""
        
        for page_num in range(pdf_document.page_count):
            current_page = pdf_document[page_num]
            extracted_content += current_page.get_text()
        
        # Split content into manageable pieces
        content_splitter = RecursiveCharacterTextSplitter(
            chunk_size=800,
            chunk_overlap=150,
            separators=["\n\n", "\n", " ", ""]
        )
        text_chunks = content_splitter.split_text(extracted_content)
        
        # Create vector database
        embedding_model = OpenAIEmbeddings()
        doc_vectorstore = Chroma.from_texts(
            texts=text_chunks,
            embedding=embedding_model
        )
        
        return jsonify({'status': 'Document processed successfully'})
        
    except Exception as error:
        return jsonify({'error': str(error)}), 500

@app.route('/ask_question', methods=['POST'])
def ask_question():
    global doc_vectorstore
    
    try:
        user_query = request.json.get('query')
        if doc_vectorstore is None:
            return jsonify({'error': 'No document has been uploaded yet'}), 400
        
        conversation_handler = create_chat_chain(doc_vectorstore)
        
        answer_response = conversation_handler.invoke({'question': user_query})
        final_answer = answer_response['answer']
        
        return jsonify({'answer': final_answer})
        
    except Exception as error:
        return jsonify({'error': str(error)}), 500

I’m particularly confused about how to properly set up the conversation memory so it maintains context between different API calls. Any guidance would be appreciated.

Your problem is session persistence - the chat chain gets recreated on every request, wiping the conversation history. I hit this same issue building something similar at work. Here’s what fixed it for me:

1. Give each user a unique session ID to track their conversation state.
2. Update your ask_question endpoint to take a session_id parameter, then store the ConversationalRetrievalChain objects in a module-level dictionary (active_sessions = {}) keyed by session ID.
3. The first question creates and stores the chain; later questions from the same session grab the existing chain with all its accumulated memory.

You’ll want session cleanup for inactive users, or old sessions will pile up and bloat memory. Also try ConversationSummaryMemory instead of ConversationBufferMemory for longer chats - it helps avoid token limits.
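
The caching part of that can be sketched framework-free; `get_or_create_chain`, `SESSION_TTL`, and the `factory` callback are illustrative names (in your app, `factory` would be something like `lambda: create_chat_chain(doc_vectorstore)`), not LangChain API:

```python
import time

# Module-level cache: session_id -> (chain, last_used_timestamp).
# In the Flask app this would hold ConversationalRetrievalChain objects.
active_sessions = {}

SESSION_TTL = 30 * 60  # drop sessions idle for more than 30 minutes

def get_or_create_chain(session_id, factory):
    """Return the cached chain for this session, creating it on first use."""
    now = time.time()
    # Evict stale sessions so old conversations don't pile up in memory.
    stale = [sid for sid, (_, ts) in active_sessions.items() if now - ts > SESSION_TTL]
    for sid in stale:
        del active_sessions[sid]
    if session_id not in active_sessions:
        active_sessions[session_id] = (factory(), now)
    chain, _ = active_sessions[session_id]
    active_sessions[session_id] = (chain, now)  # refresh last-used time
    return chain
```

Because the same chain object comes back for repeat calls with the same session_id, its ConversationBufferMemory keeps accumulating turns between requests.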

Yeah, session management is exactly why I ditched building my own memory persistence. Been there too many times - Redis cleanup, timeouts, memory leaks, the whole mess.

I ended up automating the PDF chat flow with Latenode instead. No more Flask sessions or global dictionaries to babysit. Built a workflow where one node handles PDF upload and vectorization, another keeps conversation context with built-in state management, and a third does the LangChain stuff. Memory just persists between calls without me tracking anything.

Best part? Adding features like auto conversation summaries or per-user chat histories is just dragging new nodes around. No code changes.

I’ve watched teams burn weeks on session bugs that Latenode fixes automatically. Plus you get monitoring and scaling without dealing with infrastructure.

Check it out: https://latenode.com

Your problem is simple - you’re creating a new ConversationBufferMemory every API call, so the chat history gets wiped each time. Been there, done that on a project last year. You need to keep the memory object alive between requests. Two ways to handle this:

1. Use a global dict like user_conversations = {} and check if the user already has a chain before making a new one.
2. Save the memory state to a database or Redis between requests, then reload it for each new question.

I’d go with Flask sessions if you’re keeping it simple, or the database route for production. Either way, the memory has to survive beyond that single request. Don’t forget to add conversation timeouts too - old sessions piling up is effectively a memory leak.
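
For option 2, the core move is serializing the chat history to something external and reloading it per request. A minimal sketch, with a plain dict standing in for Redis (in production you’d swap `history_store[...]` for `redis.set`/`redis.get`); `save_history`/`load_history` are illustrative names:

```python
import json

# Stand-in for Redis/DB: session_id -> JSON-encoded chat history.
history_store = {}

def save_history(session_id, messages):
    """Persist the history as JSON: a list of {'human': ..., 'ai': ...} dicts."""
    history_store[session_id] = json.dumps(messages)

def load_history(session_id):
    """Reload the history; a brand-new session starts empty."""
    raw = history_store.get(session_id)
    return json.loads(raw) if raw else []
```

On each request you’d load the list, replay it into a fresh memory object via memory.save_context, answer the question, append the new turn, and save it back.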

you’ve nailed the issue! track sessions properly to keep the chat history alive across requests. just stash the conversation_handler objects in a dict or use redis for more complex setups. add a user_id to check if a handler exists before creating a new one.

Everyone’s right about session management, but watch out for the memory type - it’ll get you later.

Hit this exact problem 6 months ago on a document Q&A system. Started with ConversationBufferMemory like you, but token limits killed me with longer chats. ConversationSummaryBufferMemory fixed it.

Here’s what matters - when storing chat chains in your session dictionary, handle memory serialization properly. Memory objects need to be JSON serializable if you want to scale past one server.

Trick I learned: don’t store the whole ConversationalRetrievalChain object. Store chat history separately and rebuild the chain each time. Sounds backwards but it’s more reliable and uses less memory.

from langchain.memory import ConversationSummaryBufferMemory
from langchain_openai import ChatOpenAI

user_chat_histories = {}  # session_id -> chat_history list

def get_or_create_memory(session_id):
    if session_id not in user_chat_histories:
        user_chat_histories[session_id] = []
    
    # ConversationSummaryBufferMemory needs an LLM to generate the summaries
    memory = ConversationSummaryBufferMemory(
        llm=ChatOpenAI(temperature=0),
        memory_key="chat_history",
        return_messages=True,
        output_key='answer',
        max_token_limit=1000
    )
    
    # Restore previous conversation turns from the stored history
    for message in user_chat_histories[session_id]:
        memory.save_context({"input": message["human"]}, {"output": message["ai"]})
    
    return memory
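
To close the loop, each answered question has to be written back into the history so the next rebuild replays it. A one-function sketch (the `record_turn` name and its arguments are illustrative, assuming your endpoint has the question and answer in scope):

```python
def record_turn(histories, session_id, question, answer):
    """Append one Q/A turn so the next get_or_create_memory() call replays it."""
    histories.setdefault(session_id, []).append({"human": question, "ai": answer})
```

Since the stored history is just a list of plain dicts, it JSON-serializes cleanly, which is what makes this approach scale past one server.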

Set up session cleanup with TTL or you’ll have zombie conversations eating your RAM.