Where to integrate conversation history in a RAG system with a LangChain template

I built a Python application that loads documents such as PDFs and web URLs into a Chroma vector store via LangChain. After storing the data, I can query it and get answers from my local AI model.

I successfully implemented conversation history when working with ChatPromptTemplate.from_messages, but I’m stuck on how to include chat history when using ChatPromptTemplate.from_template instead.

Here’s my current setup:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

system_template = """You are a helpful and trustworthy assistant. Answer user queries \
using only the provided context. If the information isn't available, \
simply state that you don't have that information. Do not make up answers.

Context:
{context}

User Query: {query}
"""
system_prompt = ChatPromptTemplate.from_template(system_template)

processing_chain = (
    {"context": document_retriever, "query": RunnablePassthrough()}
    | system_prompt
    | local_model
    | StrOutputParser()
)
result = processing_chain.invoke(user_input)

Can anyone guide me on the proper way to incorporate chat history into this template-based approach?

I hit this exact problem last year building a document Q&A system for our internal knowledge base.

You need to modify your template to include a history placeholder and adjust your chain inputs. Here’s what worked:

system_template = """You are a helpful and trustworthy assistant. Use the conversation history and context to answer queries.

Conversation History:
{history}

Context:
{context}

User Query: {query}"""

system_prompt = ChatPromptTemplate.from_template(system_template)

# Store your chat history as a list of message pairs
chat_history = []

processing_chain = (
    {
        "context": document_retriever,
        "query": RunnablePassthrough(),
        # The lambda ignores its input and reads chat_history at invoke time,
        # so later appends to the list are picked up automatically.
        "history": lambda _: "\n".join(
            f"Human: {h['human']}\nAssistant: {h['assistant']}"
            for h in chat_history
        ),
    }
    | system_prompt
    | local_model
    | StrOutputParser()
)

result = processing_chain.invoke(user_input)

# After getting the result, update your history
chat_history.append({"human": user_input, "assistant": result})

The main difference from from_messages is that you format the history as a plain string rather than as message objects. I keep only the last 5-6 exchanges to stay under token limits.
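To cap the history at those last few exchanges, you can pull the formatting out of the inline lambda into a small helper. This is a minimal sketch with pure Python (no LangChain calls); `format_history`, `MAX_EXCHANGES`, and the sample data are my own names, not library APIs:

```python
MAX_EXCHANGES = 5  # assumption: 5 recent exchanges fits typical context windows

def format_history(history, max_exchanges=MAX_EXCHANGES):
    """Join the last `max_exchanges` human/assistant pairs into a transcript."""
    recent = history[-max_exchanges:]  # slicing also handles short histories
    return "\n".join(
        f"Human: {h['human']}\nAssistant: {h['assistant']}" for h in recent
    )

# Wire it into the chain in place of the inline lambda:
#   "history": lambda _: format_history(chat_history)

chat_history = [
    {"human": f"question {i}", "assistant": f"answer {i}"} for i in range(8)
]
print(format_history(chat_history))  # only the last 5 pairs (3-7) survive the cut
```

Because the chain's lambda calls the helper at invoke time, trimming happens on every turn without rebuilding the chain.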

This gave me way better context awareness in follow-up questions.