Replacing ChatMessageHistory with ConversationBufferWindowMemory in Python Langchain

I’ve been working on a RAG system with chat history functionality using Langchain. My current setup stores all previous messages, but I want to limit it to only keep the most recent messages (like the last 5 or 10). I’m trying to figure out how to modify my existing code.

Here’s my current implementation:

from langchain_core.chat_history import BaseChatMessageHistory
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.runnables.history import RunnableWithMessageHistory

message_store = {}

def retrieve_chat_history(user_session: str) -> BaseChatMessageHistory:
    if user_session not in message_store:
        message_store[user_session] = ChatMessageHistory()
    return message_store[user_session]

rag_with_history = RunnableWithMessageHistory(
    my_rag_chain,
    retrieve_chat_history,
    input_messages_key="query",
    history_messages_key="previous_messages",
    output_messages_key="response",
)

I know there’s ConversationBufferWindowMemory that should handle message limiting, but I can’t figure out how to integrate it properly. My attempt resulted in type errors because I was mixing runnables with chains incorrectly.

Is there a way to modify the session history function to use windowed memory instead? Or should I take a completely different approach for limiting chat history length?

Hit this same issue last month and found a different solution that worked great. Skip modifying ChatMessageHistory entirely - just add the windowing logic to your RAG chain preprocessing instead. Before messages hit the chain, I throw in a simple filter that slices the message history. Here’s the trick: RunnableWithMessageHistory lets you access the full history object, so you can mess with it when you retrieve it. I wrapped my original retrieve function to do history = retrieve_chat_history(session_id) then immediately run history.messages = history.messages[-window_size:] if needed. You keep ChatMessageHistory working normally but add windowing exactly where you want it. Plugs right into the existing runnable interface without any subclassing headaches.
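A minimal sketch of that wrapper approach (with a tiny stand-in for ChatMessageHistory so it runs on its own; `windowed_history` and `WINDOW_SIZE` are illustrative names, not LangChain APIs):

```python
# Stand-in for langchain's ChatMessageHistory so the sketch is
# self-contained; the real class behaves the same for this purpose.
class ChatMessageHistory:
    def __init__(self):
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)

message_store = {}

def retrieve_chat_history(session_id):
    if session_id not in message_store:
        message_store[session_id] = ChatMessageHistory()
    return message_store[session_id]

WINDOW_SIZE = 5  # illustrative: keep only the 5 most recent messages

def windowed_history(session_id):
    # Wrap the original lookup, then trim the stored object in place
    # so RunnableWithMessageHistory keeps appending to the same object.
    history = retrieve_chat_history(session_id)
    if len(history.messages) > WINDOW_SIZE:
        history.messages = history.messages[-WINDOW_SIZE:]
    return history
```

You'd then pass windowed_history instead of retrieve_chat_history when constructing RunnableWithMessageHistory.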

Had this exact problem a few months ago building a customer support chatbot. The type mixing between runnables and memory objects is annoying.

You can keep your current structure and just add a wrapper that limits messages. Instead of returning the full ChatMessageHistory, slice it to keep only recent messages:

def retrieve_chat_history(user_session: str) -> BaseChatMessageHistory:
    if user_session not in message_store:
        message_store[user_session] = ChatMessageHistory()

    full_history = message_store[user_session]
    if len(full_history.messages) > 10:  # Keep last 10 messages
        recent_messages = full_history.messages[-10:]
        limited_history = ChatMessageHistory()
        for msg in recent_messages:
            limited_history.add_message(msg)
        # Store the trimmed copy back; otherwise new messages the
        # runnable appends would land in a throwaway object
        message_store[user_session] = limited_history
        return limited_history

    return full_history

But honestly, managing chat history and memory limits manually gets messy fast. You’ll end up writing tons of boilerplate for session management, message pruning, and state persistence.

I moved this logic to Latenode after dealing with memory leaks and session cleanup headaches. You can set up the whole conversation flow with automatic message limiting as a visual workflow. No more wrestling with Langchain memory types or writing custom session handlers.

The automation handles everything from message storage to cleanup, and you can adjust the window size without touching code. Way cleaner than doing it manually.

You don't really need to switch to ConversationBufferWindowMemory. Just tweak your retrieve_chat_history function to trim messages from the ChatMessageHistory object directly: add if len(history.messages) > 10: history.messages = history.messages[-10:] before returning it. Way simpler than messing with memory classes.

I encountered a similar issue recently. The problem is that ConversationBufferWindowMemory was designed for the older chain-based API, while you are using the runnable interface. A straightforward fix is a custom class that extends ChatMessageHistory and enforces the window size itself: subclass it and override add_message so the history never grows past the limit. Here’s a sample implementation:

class WindowedChatMessageHistory(ChatMessageHistory):
    # ChatMessageHistory is a pydantic model, so window_size must be
    # declared as a model field rather than assigned inside __init__
    window_size: int = 10

    def add_message(self, message):
        super().add_message(message)
        if len(self.messages) > self.window_size:
            self.messages = self.messages[-self.window_size:]

Just replace ChatMessageHistory() with WindowedChatMessageHistory(window_size=10) in your retrieve_chat_history function. This keeps everything compatible with RunnableWithMessageHistory while enforcing the message limit you need.
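A self-contained sketch of that swap (using a minimal stand-in for the pydantic base class so it runs without langchain installed; with the real ChatMessageHistory, window_size would be a declared model field):

```python
# Minimal stand-in for langchain's ChatMessageHistory so this sketch
# runs standalone; the real class is a pydantic model, where
# window_size would be declared as a field instead of set in __init__.
class ChatMessageHistory:
    def __init__(self):
        self.messages = []

    def add_message(self, message):
        self.messages.append(message)

class WindowedChatMessageHistory(ChatMessageHistory):
    def __init__(self, window_size=10):
        super().__init__()
        self.window_size = window_size

    def add_message(self, message):
        super().add_message(message)
        # Drop the oldest messages once the window overflows
        if len(self.messages) > self.window_size:
            self.messages = self.messages[-self.window_size:]

message_store = {}

def retrieve_chat_history(user_session):
    # Same factory as the question, now producing windowed histories
    if user_session not in message_store:
        message_store[user_session] = WindowedChatMessageHistory(window_size=10)
    return message_store[user_session]
```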

Been dealing with this exact scenario for months in our production RAG systems. ConversationBufferWindowMemory is honestly overkill here and creates more problems than it solves when you’re using the newer runnable interface.

What I’ve found works best is trimming the stored history object in place rather than returning a throwaway copy. The trim happens when you fetch the history, and because it mutates the object in message_store, the stored list itself never grows past the window:

def retrieve_chat_history(user_session: str) -> BaseChatMessageHistory:
    if user_session not in message_store:
        message_store[user_session] = ChatMessageHistory()
    
    history = message_store[user_session]
    # Keep only last 10 messages
    if len(history.messages) > 10:
        history.messages = history.messages[-10:]
    
    return history

This keeps your existing RunnableWithMessageHistory setup intact while giving you the window behavior you want. No custom classes, no type conflicts, just clean message limiting.

One thing to watch out for - make sure you’re not accidentally cutting off mid-conversation pairs. I usually check if the message count is odd and keep one extra message to maintain context flow.
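That pair-preserving check could be sketched like this (an illustrative helper, not a LangChain API, assuming strictly alternating turns with the human message at even indices):

```python
def trim_keeping_pairs(messages, window_size=10):
    # Keep roughly window_size recent messages without splitting a
    # human/AI pair. Assumes alternating turns, human at even indices.
    if len(messages) <= window_size:
        return messages
    start = len(messages) - window_size
    if start % 2 == 1:
        # The window would open on an AI reply; back up one message
        # so its human question is kept too.
        start -= 1
    return messages[start:]
```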