I’m building a document retrieval system with LangChain that uses MultiQueryRetriever to fetch documents and generate responses through Ollama. My current setup works fine but I need to debug what’s happening inside the chain.
My current code:
import logging

from langchain.retrievers.multi_query import MultiQueryRetriever
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_core.runnables import RunnablePassthrough

logging.basicConfig()
logging.getLogger("langchain.retrievers.multi_query").setLevel(logging.INFO)
# Template for creating multiple search queries
SEARCH_PROMPT = PromptTemplate(
    input_variables=["user_query"],
    template="""Act as a helpful AI assistant. Create five different
variations of the provided user query to search for relevant documents in
a vector store. These variations should help overcome limitations of
similarity-based search by providing different perspectives.
Separate each variation with a newline.
User query: {user_query}""",
)
# Setup MultiQueryRetriever
doc_retriever = MultiQueryRetriever.from_llm(
    vector_store.as_retriever(),
    llm_model,
    prompt=SEARCH_PROMPT,
)
# Final answer generation template
final_template = """Use ONLY the provided context to answer the question:
Context: {context}
User Question: {user_query}"""
# Build the complete chain
final_prompt = ChatPromptTemplate.from_template(final_template)
from langchain_core.runnables import RunnableLambda
def debug_state(data):
    """Helper function to examine chain state"""
    print(data)
    return data
response_chain = (
    {"context": doc_retriever, "user_query": RunnablePassthrough()}
    | RunnableLambda(debug_state)
    | final_prompt
    | llm_model
    | StrOutputParser()
)
# Test the chain
response_chain.invoke("Show me 5 quotes about friendship from these documents?")
My question: I want to see the exact final prompt that gets sent to the LLM after the context is retrieved and formatted. How can I capture and display this complete prompt before it reaches the Ollama model?
You’re on the right track, but there’s a simpler way: intercept at the model itself instead of at the chain level. Wrap your llm_model in a small delegating class that prints whatever it receives before forwarding the call. I’ve done this many times with retrieval systems that needed prompt inspection.
Create a wrapper around your llm_model that grabs the prompt before it runs:
class DebugLLM:
    def __init__(self, llm):
        self.llm = llm

    def invoke(self, input_data, config=None):
        print("\n=== COMPLETE PROMPT TO LLM ===")
        if hasattr(input_data, 'messages'):
            for msg in input_data.messages:
                print(f"{msg.type}: {msg.content}")
        else:
            print(input_data)
        print("=== END PROMPT ===\n")
        return self.llm.invoke(input_data, config)
# Replace your llm_model with:
debug_llm = DebugLLM(llm_model)
Use debug_llm in your chain instead. This catches everything that actually gets sent to Ollama, including whatever internal formatting LangChain does, so it’s more reliable than intercepting at the prompt template level. One caveat: DebugLLM is a plain class, not a Runnable, so if you compose with the | operator you’ll need to wrap it, e.g. RunnableLambda(debug_llm.invoke).
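The core of the approach is plain delegation: the wrapper exposes the same invoke signature and forwards everything untouched, so downstream steps behave exactly as before. Here’s a dependency-free sketch of the pattern (FakeLLM is a stand-in so it runs without LangChain installed):

```python
class FakeLLM:
    """Stand-in for a real LLM client; just echoes its input."""
    def invoke(self, input_data, config=None):
        return f"echo: {input_data}"


class PromptLogger:
    """Delegating wrapper: print the prompt, then forward the call unchanged."""
    def __init__(self, llm):
        self.llm = llm

    def invoke(self, input_data, config=None):
        print("=== PROMPT ===")
        print(input_data)
        print("=== END PROMPT ===")
        return self.llm.invoke(input_data, config)


result = PromptLogger(FakeLLM()).invoke("Tell me about friendship")
```

Because the wrapper returns whatever the inner model returns, anything after it in the chain (like StrOutputParser) is unaffected.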
A cleaner option: add a debug step between your prompt template and the LLM. Your current debug_state function runs before prompt formatting, so it shows the raw inputs to the template, not the final formatted prompt. Moving the tap after final_prompt catches the complete text your LLM actually processes, with the retrieved document context already inserted into your template. I’ve used this in RAG systems where you need to see exactly how context gets injected.
Try this:
def debug_final_prompt(prompt):
    """Shows the complete formatted prompt"""
    print("=== FINAL PROMPT SENT TO LLM ===")
    print(prompt.to_string())
    print("=== END PROMPT ===")
    return prompt
response_chain = (
    {"context": doc_retriever, "user_query": RunnablePassthrough()}
    | final_prompt
    | RunnableLambda(debug_final_prompt)
    | llm_model
    | StrOutputParser()
)
This captures the prompt object after formatting but before it hits Ollama.
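The reason this works is that the debug step is a pure pass-through: it prints its input and returns it unchanged, so the rest of the chain never notices it. A dependency-free sketch of that tap pattern (the pipeline here is hand-rolled, not LCEL):

```python
def tap(label):
    """Build a pass-through step that prints its input and returns it as-is."""
    def _tap(value):
        print(f"--- {label} ---")
        print(value)
        return value
    return _tap


# Hand-rolled equivalent of prompt | tap | parser:
steps = [str.upper, tap("after formatting"), str.strip]
value = "  hello world  "
for step in steps:
    value = step(value)
```

If the tap forgot to return its input, every step after it would receive None, which is the most common mistake with this technique.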
Honestly, debugging complex LangChain setups gets messy fast. I’ve been there with similar retrieval systems and the logging becomes a nightmare when you scale up.
I ended up moving this whole workflow to Latenode. You can build the same multi-query retrieval flow with visual nodes, and each step shows you exactly what data flows through. No more guessing what the prompt looks like or digging through logs.
You can connect your vector store, set up the query variations, and see the complete prompt at every step. Way cleaner than scattered debug functions.
Quick fix - use LangSmith’s RunCollectorCallbackHandler to catch everything without messing with your chain:
from langchain_core.tracers.run_collector import RunCollectorCallbackHandler

callback = RunCollectorCallbackHandler()
response = response_chain.invoke(
    "Show me 5 quotes about friendship from these documents?",
    config={"callbacks": [callback]},
)
# traced_runs holds root runs; walk child_runs to reach the nested LLM call
def iter_runs(run):
    yield run
    for child in run.child_runs:
        yield from iter_runs(child)

for root in callback.traced_runs:
    for run in iter_runs(root):
        if run.run_type == "llm":
            print("=== LLM INPUT ===")
            print(run.inputs)
            print("=== END ===")
This grabs the actual input to your LLM without any wrapping. I use this when I need to see what MultiQueryRetriever’s doing internally too.
Hit the same problem last month with a similar doc setup. The callback shows you the query variations AND the final prompt that goes to Ollama.
Or just set LANGCHAIN_TRACING_V2=true in your environment - everything gets logged automatically. Less code but way more noise.
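For reference, LangSmith tracing needs an API key alongside that flag; a minimal environment setup might look like this (the key value and project name are placeholders):

```shell
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY="<your-langsmith-api-key>"   # placeholder
export LANGCHAIN_PROJECT="multiquery-debug"           # optional: groups runs in the UI
```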
Note that MultiQueryRetriever doesn’t take an enable_tracing flag. For the same effect without LangSmith, call set_debug(True) from langchain.globals - it prints every chain step’s inputs and outputs, including the formatted prompts. I’ve used this on similar projects - works great for debugging complex chains without adding debug functions everywhere.