Restricting chat conversation history size in Langchain

I want to control the number of messages stored in my chat history to prevent overwhelming the language model with too much context data.

I’m trying to implement a solution using RunnableWithMessageHistory combined with a custom filtering mechanism. However, I’m running into several issues that I can’t figure out.

The main problems I’m facing are:

  1. My message limiting function doesn’t seem to be working properly.
  2. I can’t get the current user input to be processed correctly by the model.
  3. The filtering logic isn’t being applied to the conversation history as expected.

Specific questions I have:

  • Why isn’t my RunnablePassthrough working with the message history?
  • Is my approach with HumanMessage("{input_key}") and input_messages_key correct?
  • Should I implement the trim_messages function inside get_session_history instead?
from typing import List, Union
from langchain_openai import ChatOpenAI
from langchain_community.chat_message_histories import ChatMessageHistory
from langchain_core.chat_history import BaseChatMessageHistory
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.messages import HumanMessage, AIMessage, SystemMessage
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_core.runnables import RunnablePassthrough

memory_store = {}
llm_model = "gpt-3.5-turbo"
system_prompt = "You are a helpful assistant"
llm = ChatOpenAI(model=llm_model)

def get_chat_history(chat_id: str) -> BaseChatMessageHistory:
    if chat_id not in memory_store:
        memory_store[chat_id] = ChatMessageHistory()
    chat_memory: BaseChatMessageHistory = memory_store[chat_id]
    return chat_memory

def trim_messages(msg_list: List[Union[HumanMessage, AIMessage]]) -> List[Union[HumanMessage, AIMessage]]:
    return msg_list[-3:]

def handle_conversation(user_text: str, chat_memory: BaseChatMessageHistory, chat_id: str) -> AIMessage:
    template = ChatPromptTemplate.from_messages(
        [
            SystemMessage(system_prompt),
            MessagesPlaceholder(variable_name="conversation_history"),
            HumanMessage("{input_key}")
        ]
    )

    chain = RunnablePassthrough.assign(conversation_history=lambda x: trim_messages(x["conversation_history"])) | template | llm

    chat_with_memory = RunnableWithMessageHistory(
        runnable=chain,
        get_session_history=get_chat_history,
        history_messages_key="conversation_history",
        input_messages_key="user_input",
    )

    result: AIMessage = chat_with_memory.invoke(
        {"user_input": [HumanMessage(content=user_text)]},
        config={"configurable": {"session_id": chat_id}},
    )
    print(result.content)
    return result

chat_session = "demo_chat"
test_messages = [
    HumanMessage(content="Blue is my preferred color"),
    AIMessage(content="Interesting choice!"),
    HumanMessage(content="Test message one"),
    HumanMessage(content="Test message two"),
    HumanMessage(content="Test message three"),
    HumanMessage(content="Test message four"),
    HumanMessage(content="My name is Alice"),
    AIMessage(content="Nice to meet you Alice"),
    HumanMessage(content="This weather is terrible"),
]
chat_memory = get_chat_history(chat_session)
for msg in test_messages:
    chat_memory.add_message(msg)

user_query = "Say exactly: FAREWELL"
response = handle_conversation(user_query, chat_memory, chat_session)

Had the exact same issue for weeks! The problem is that RunnableWithMessageHistory expects input_messages_key to match what you’re actually passing in, but you’re double-handling the user input: once in your template and again in the invoke call. Just pick one approach.

I’d drop the manual HumanMessage from your template entirely. Keep the system message and the history placeholder, add a ("human", "{user_input}") tuple (message objects aren’t templated, so HumanMessage("{input_key}") sends that literal string to the model), and make input_messages_key match the key you use when invoking.

For trimming, skip the RunnablePassthrough mess. Create a custom message history class that overrides the messages property to return the last N messages. Way cleaner - trimming happens automatically whenever history gets accessed instead of trying to intercept it mid-chain.

The trimming isn’t failing where you think. RunnableWithMessageHistory injects the history before your chain runs, so your RunnablePassthrough.assign does see conversation_history; the real breakage is the literal HumanMessage("{input_key}") in your template. Either way, spreading trimming logic through the chain is fragile.

I’ve hit this exact issue before. The simpler fix is moving your trimming logic into get_chat_history. Here’s how:

def get_chat_history(chat_id: str) -> BaseChatMessageHistory:
    if chat_id not in memory_store:
        memory_store[chat_id] = ChatMessageHistory()

    chat_memory = memory_store[chat_id]
    # Keep only the last 6 messages (3 exchanges); older ones are discarded permanently
    chat_memory.messages = chat_memory.messages[-6:]
    return chat_memory

This trims whenever the history is retrieved - note it’s destructive, since anything beyond the last 6 messages is permanently dropped from the store. With trimming handled there, you can drop the complex chain setup and just use template | llm.

your template’s clashing with the history wrapper. drop the HumanMessage("{input_key}") - message objects aren’t templated, so the model literally sees the text "{input_key}". use a ("human", "{user_input}") tuple instead, keep input_messages_key="user_input" so the names line up, and invoke with {"user_input": user_text} - don’t wrap it in a HumanMessage list yourself.

Hit this same problem last year building a customer service bot. Your chain config doesn’t match what RunnableWithMessageHistory expects.

You’ve got input_messages_key="user_input" but your template has HumanMessage("{input_key}") - they’re looking for different keys, and message objects aren’t templated anyway, so the model would see the literal string "{input_key}". Pick one key name and stick with it.

This actually works:

def handle_conversation(user_text: str, chat_memory: BaseChatMessageHistory, chat_id: str) -> AIMessage:
    template = ChatPromptTemplate.from_messages([
        SystemMessage(system_prompt),
        MessagesPlaceholder(variable_name="conversation_history"),
        ("human", "{user_input}"),  # tuple form is templated; a HumanMessage object isn't
    ])

    chain = template | llm

    chat_with_memory = RunnableWithMessageHistory(
        runnable=chain,
        get_session_history=get_chat_history,
        history_messages_key="conversation_history",
        input_messages_key="user_input",  # same key as in the invoke dict below
    )

    result = chat_with_memory.invoke(
        {"user_input": user_text},  # plain string under the matching key
        config={"configurable": {"session_id": chat_id}},
    )
    return result

For message limiting, I do it with a custom history class. One gotcha: ChatMessageHistory is a pydantic model, so you can’t set undeclared attributes in __init__ or shadow its messages field with a property - subclass BaseChatMessageHistory instead:

class TrimmedChatHistory(BaseChatMessageHistory):
    def __init__(self, max_messages: int = 6):
        self._all_messages = []
        self.max_messages = max_messages

    @property
    def messages(self):
        # Only expose the most recent max_messages messages
        return self._all_messages[-self.max_messages:]

    def add_message(self, message) -> None:
        self._all_messages.append(message)

    def clear(self) -> None:
        self._all_messages = []

Just use TrimmedChatHistory() instead of ChatMessageHistory() in your store. Way cleaner than messing with the chain.

This complexity is exactly why I ditched Langchain’s message history headaches. You’re wrestling with the framework instead of fixing your real problem.

Think about what you actually need: limit chat history and chain some API calls. That’s an automation workflow, not a coding puzzle.

I built something similar for our support bot. Instead of debugging Langchain’s message mess, I used Latenode for a simple workflow:

  • Database node stores messages
  • Auto-trims to last N messages on retrieval
  • Sends clean data to OpenAI
  • Manages sessions automatically

Took 15 minutes to build visually. No more key mismatches or wondering where trimming logic goes. The workflow limits messages automatically, and I can see what’s happening at every step.

When you need user analytics or other integrations later, just drag and drop new nodes instead of rewriting chain logic.

Why fight framework limits when you can automate the whole conversation flow?