Combining conversation history with source citations in LangChain RAG implementation

Luke_Brilliant · August 6, 2025, 12:17pm

I’m working with a LangChain RAG system in Python and facing an issue. I can get either conversation memory working OR source citations, but not both together. Has anyone figured out how to combine these features?

Conversation History Setup

For keeping track of previous messages, I use this approach:

assistant_instructions = (
    "You help users by answering questions. "
    "Reference the provided context below to respond. "
    "If unsure about something, admit you don't know. "
    "Keep responses brief, maximum three sentences."
    "\n\n"
    "{context}"
)

rewrite_question_instructions = (
    "Looking at the conversation history and current user question, "
    "create a standalone version that doesn't need the chat history "
    "to understand. Don't provide an answer, just rewrite the question "
    "if necessary or keep it unchanged."
)

rewrite_template = ChatPromptTemplate.from_messages(
    [
        ("system", rewrite_question_instructions),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
context_aware_search = create_history_aware_retriever(
    self.model, self.document_retriever, rewrite_template
)

response_template = ChatPromptTemplate.from_messages(
    [
        ("system", assistant_instructions),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)

answer_generator = create_stuff_documents_chain(self.model, response_template)
full_pipeline = create_retrieval_chain(context_aware_search, answer_generator)

Source Citation Method

To show document sources, I use this different setup:

CITATION_TEMPLATE = """
You assist with answering questions using provided context. Reference only the given information to respond. If you lack knowledge on the topic, state this clearly. Limit responses to three sentences maximum.
Include the page references you consulted at the end of your response.

<context>
{context}
</context>

Question to address:

{query}"""

citation_template = ChatPromptTemplate.from_template(CITATION_TEMPLATE)

def prepare_documents(documents):
    return "\n\n".join(f"Page {doc.metadata['page_number'] + 1}:\n{doc.page_content}" for doc in documents)

processing_chain = (
    RunnablePassthrough.assign(context=lambda data: prepare_documents(data["context"]))
    | citation_template
    | self.model
    | StrOutputParser()
)

relevant_docs = self.vector_db.similarity_search(user_question)
final_answer = processing_chain.invoke({"context": relevant_docs, "query": user_question})

I’ve tried merging these approaches but keep running into issues. Maybe I need a custom retriever or modify the existing history-aware one?

jackw · August 16, 2025, 1:04pm

Hit this exact problem three months ago building our document Q&A system. LangChain’s chain composition kills document metadata flow.

What actually worked - override the stuff documents chain completely:

def custom_documents_chain(llm, prompt):
    def format_docs_with_sources(inputs):
        docs = inputs["context"]
        chat_history = inputs.get("chat_history", [])
        user_input = inputs["input"]
        
        # Format docs with preserved citations
        doc_strings = []
        for i, doc in enumerate(docs):
            source_info = f"Document {i+1} (Page {doc.metadata.get('page_number', 0) + 1})"
            doc_strings.append(f"{source_info}: {doc.page_content}")
        
        formatted_context = "\n\n".join(doc_strings)
        
        return {
            "context": formatted_context,
            "chat_history": chat_history,
            "input": user_input
        }
    
    return RunnablePassthrough.assign(context=format_docs_with_sources) | prompt | llm

# Update your system prompt
updated_instructions = (
    "Answer questions using the provided context. "
    "Always cite sources as Document X when referencing information. "
    "Keep responses under three sentences.\n\n{context}"
)

response_template = ChatPromptTemplate.from_messages([
    ("system", updated_instructions),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
])

# Build the complete chain
answer_generator = custom_documents_chain(self.model, response_template)
full_pipeline = create_retrieval_chain(context_aware_search, answer_generator)

This keeps your existing history-aware retrieval while making sure document metadata survives the entire chain. You need to control document formatting at the chain level, not just retrieval.

Tried all the other approaches but they felt hacky. This works with LangChain’s architecture instead of fighting it.

WhisperingWind · August 16, 2025, 9:03am

Hit this exact problem building our internal knowledge base. LangChain’s retrieval chains convert documents to strings too early, which kills the metadata you need for citations.

Here’s what fixed it - I built a custom RunnableLambda that grabs documents after history-aware retrieval but before the final prompt:

def inject_citations(chain_input):
    # Get the retrieved docs (still has metadata at this point)
    docs = chain_input["context"]
    
    # Build citation-aware context
    cited_context = []
    for idx, doc in enumerate(docs, 1):
        page = doc.metadata.get('page_number', 0) + 1
        cited_context.append(f"[Ref {idx}, p.{page}] {doc.page_content}")
    
    chain_input["context"] = "\n\n".join(cited_context)
    return chain_input

# Insert between retrieval and generation
citation_processor = RunnableLambda(inject_citations)

# Modified pipeline
full_pipeline = {
    "context": context_aware_search,
    "input": RunnablePassthrough(),
    "chat_history": RunnablePassthrough()
} | citation_processor | answer_generator

This keeps your history-aware retrieval intact while making sure document metadata actually makes it to response generation. The trick is catching the data right after retrieval but before LangChain formats everything for the generation chain.

danwilson85 · August 16, 2025, 4:25am

had the same nightmare recently. You need to modify the document chain itself, not just retrieval. Override create_stuff_documents_chain with a custom version that keeps metadata throughout the pipeline. Create a custom document formatter and inject it into the chain before the LLM call. much cleaner than intercepting docs after retrieval.

josephk · August 16, 2025, 12:10am

I hit this exact problem in my production RAG system last year. LangChain’s retrieval chains don’t preserve document metadata properly when you add conversation history. I ended up bypassing the standard chains completely. Here’s what works: manually retrieve documents using the history-aware retriever, then inject source info directly into the context string before hitting the final chat chain:

# Get context-aware documents
retrieved_docs = context_aware_search.invoke({
    "input": user_question, 
    "chat_history": chat_history
})

# Build enriched context with source tracking
context_with_sources = []
for i, doc in enumerate(retrieved_docs):
    source_id = f"Source_{i+1}"
    context_with_sources.append(f"{source_id}: {doc.page_content}")

enriched_context = "\n\n".join(context_with_sources)

# Single chain for final response
final_prompt = ChatPromptTemplate.from_messages([
    ("system", "Use the context below to answer. Cite sources as Source_1, Source_2, etc.\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

final_chain = final_prompt | self.model | StrOutputParser()
response = final_chain.invoke({
    "context": enriched_context,
    "chat_history": chat_history,
    "input": user_question
})

You get conversation awareness from retrieval plus reliable source citations in responses. The trick is separating retrieval logic from response generation completely.

nateharris · August 15, 2025, 8:03pm

Been wrestling with this exact problem for months on our enterprise RAG platform. The issue is LangChain’s retrieval chains lose the original document objects when they get processed through the history pipeline.

Here’s how I solved it - intercept the documents before they get transformed:

class CitationAwareRetriever:
    def __init__(self, base_retriever):
        self.base_retriever = base_retriever
        
    def retrieve_with_metadata(self, query_data):
        # Get docs with full metadata intact
        docs = self.base_retriever.invoke(query_data)
        
        # Store metadata separately before it gets lost
        self.last_sources = [
            {"page": doc.metadata.get('page_number', 0) + 1, 
             "content": doc.page_content}
            for doc in docs
        ]
        
        return docs
    
    def format_context_with_citations(self, docs):
        formatted = []
        for i, doc in enumerate(docs):
            page_num = doc.metadata.get('page_number', 0) + 1
            formatted.append(f"[Source {i+1}, Page {page_num}]: {doc.page_content}")
        return "\n\n".join(formatted)

# Wire it up
citation_retriever = CitationAwareRetriever(context_aware_search)

# Your prompt needs to tell the model to use the citation format
response_prompt = ChatPromptTemplate.from_messages([
    ("system", "Answer using context below. Always reference sources as [Source X, Page Y].\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

# Build final chain
final_chain = (
    RunnablePassthrough.assign(
        context=lambda x: citation_retriever.format_context_with_citations(
            citation_retriever.retrieve_with_metadata(x)
        )
    )
    | response_prompt
    | self.model
    | StrOutputParser()
)

This keeps conversation memory working through the standard retrieval chain but captures source info before it disappears.

Took me way too long to figure this out because LangChain docs barely mention this metadata loss issue.

deltaDreamer · August 14, 2025, 7:26pm

Hit this same issue a few months ago building a customer support RAG system. LangChain’s built-in chains just don’t work well together when you need both features.

What actually works: handle retrieval and context formatting yourself, then pass everything to one chain. Here’s my solution:

def build_context_with_citations(documents):
    context_parts = []
    for doc in documents:
        page_ref = f"[Page {doc.metadata['page_number'] + 1}]"
        context_parts.append(f"{page_ref} {doc.page_content}")
    return "\n\n".join(context_parts)

# Use history-aware retriever for document selection
relevant_docs = context_aware_search.invoke({
    "input": user_question,
    "chat_history": conversation_history
})

# Format context with citations
formatted_context = build_context_with_citations(relevant_docs)

# Single chain with both history and citations
final_template = ChatPromptTemplate.from_messages([
    ("system", "Answer using the context below. Include page references in your response.\n\n{context}"),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}")
])

chain = final_template | self.model | StrOutputParser()
response = chain.invoke({
    "context": formatted_context,
    "chat_history": conversation_history,
    "input": user_question
})

You get conversation continuity plus proper source tracking. Let the history-aware retriever handle context-aware document selection - that’s the key.

Honestly, managing all this LangChain complexity gets old fast. I moved most RAG workflows to Latenode since it handles orchestration between different AI services way cleaner. You can set up conversation memory, document retrieval, and citation formatting as separate nodes that actually work together.

SilentSailing34 · August 14, 2025, 12:13pm

Hit this exact problem building our research document system. LangChain’s create_stuff_documents_chain wants plain strings, but you need Document objects to keep the source metadata.

I ditched the standard stuff chain completely. Instead of create_stuff_documents_chain, I made a RunnableSequence that handles conversation context and document formatting:

def create_citation_chain(llm, prompt_template):
    def process_inputs(inputs):
        docs = inputs["context"]
        formatted_docs = []
        
        for doc in docs:
            page_num = doc.metadata.get('page_number', 0) + 1
            source_text = f"(Source: Page {page_num}) {doc.page_content}"
            formatted_docs.append(source_text)
        
        return {
            "context": "\n\n".join(formatted_docs),
            "input": inputs["input"],
            "chat_history": inputs.get("chat_history", [])
        }
    
    return RunnableLambda(process_inputs) | prompt_template | llm

# Replace the standard document chain
answer_generator = create_citation_chain(self.model, response_template)
full_pipeline = create_retrieval_chain(context_aware_search, answer_generator)

You can still use create_retrieval_chain for the outer structure while controlling how documents get formatted for citations. Chat history flows through fine since you’re only replacing the document processing bit.

sofia_scribbles · August 12, 2025, 4:05pm

I’ve hit this exact problem multiple times at work. LangChain overcomplicates simple stuff.

You’re fighting LangChain’s chain architecture. Custom retrievers and document processors work but they’re a pain to maintain. Learned that lesson the hard way.

I skip the complex chain stuff now. Break the RAG workflow into separate steps:

History-aware document retrieval
Citation formatting
Context injection
LLM response generation
Response post-processing

Each step does one thing well instead of cramming everything into retrieval chains. Better error handling, easier debugging, and you can swap components without breaking stuff.

I use Latenode for this. Build conversation memory as one node, document retrieval as another, citation formatting as a third. Data passes cleanly between them without losing metadata.

Best part? You can test each piece separately and actually see what breaks. No more mystery chain failures.

Saved me weeks compared to wrestling with custom LangChain implementations.