I created a chatbot using Streamlit that links to OpenAI through Langchain and records everything with LangSmith. The chatbot functions well, and I can see all user conversations on the LangSmith dashboard. I’ve added thumbs up and down buttons for users to rate the responses. The buttons are visible and can be clicked without issue, but when I go to check LangSmith, there seems to be no feedback data recorded. All chat runs are visible in LangSmith, but none of the user ratings are showing up. Has anyone experienced this? What might be the issue with the feedback integration?
import os
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.memory import ConversationSummaryMemory
from langchain.document_loaders import DirectoryLoader
from langchain.callbacks import collect_runs
from langsmith import Client
from streamlit_feedback import streamlit_feedback
from dotenv import load_dotenv
import uuid
load_dotenv()

@st.cache_resource
def setup_qa_system():
    model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.1, openai_api_key=os.environ['OPENAI_API_KEY'])
    loader = DirectoryLoader('docs/')
    splitter = RecursiveCharacterTextSplitter(chunk_size=2000, chunk_overlap=150)
    chunks = splitter.split_documents(loader.load())
    embeddings = OpenAIEmbeddings(chunk_size=2000)
    vector_db = FAISS.from_documents(documents=chunks, embedding=embeddings)
    chat_memory = ConversationSummaryMemory(memory_key="history", return_messages=True)
    qa_system = RetrievalQA.from_chain_type(
        llm=model,
        chain_type="stuff",
        retriever=vector_db.as_retriever(),
        memory=chat_memory
    )
    return qa_system

def process_user_feedback(feedback_data, emoji_icon=None):
    st.toast(f"Thanks for rating: {feedback_data}", icon=emoji_icon)
    # dict.update() returns None, so update in place and return the dict itself
    feedback_data.update({"extra_data": 456})
    return feedback_data

def save_feedback_to_langsmith():
    user_feedback = st.session_state.get("user_rating")
    current_run_id = st.session_state.trace_id
    st.write(current_run_id)
    st.write(user_feedback)
    rating_values = {
        "👍": 1.0,
        "👌": 0.8,
        "😑": 0.6,
        "👎": 0.2,
        "😡": 0.0,
    }
    rating_score = rating_values.get(user_feedback.get("score"))
    if rating_score is not None:
        feedback_category = f"rating {user_feedback.get('score')}"
        try:
            saved_feedback = langsmith_client.create_feedback(
                run_id=current_run_id,
                feedback_type=feedback_category,
                score=rating_score
            )
            st.session_state.stored_feedback = {
                "id": str(saved_feedback.id),
                "rating": rating_score,
            }
            st.write(f"Rating saved successfully: {saved_feedback.id}")
        except Exception as error:
            st.error(f"Could not save rating: {error}")
    else:
        st.warning("Please select a valid rating.")

langsmith_client = Client()
qa_chain = setup_qa_system()

if "chat_history" not in st.session_state:
    st.session_state["chat_history"] = []

for msg in st.session_state.chat_history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

query = st.chat_input("Ask me anything...")
if query:
    st.session_state.chat_history.append({"role": "user", "content": query})
    with st.chat_message("user"):
        st.markdown(query)
    with collect_runs() as callback:
        result = qa_chain({"query": query})
        if callback.traced_runs:
            trace_id = callback.traced_runs[0].id
            st.session_state.trace_id = trace_id
            st.write(f"Trace ID: {trace_id}")
        else:
            st.error("Failed to capture run")
            trace_id = None
    bot_response = result['result']
    st.session_state.chat_history.append({"role": "assistant", "content": bot_response})
    with st.chat_message("assistant"):
        st.markdown(bot_response)
    if bot_response is not None:
        user_rating = streamlit_feedback(
            feedback_type="thumbs",
            key=f"rating_{trace_id}"
        )
        if user_rating:
            st.session_state["user_rating"] = user_rating
            save_feedback_to_langsmith()
Your issue is with the feedback callback and session state handling. The streamlit_feedback widget is notorious for state management problems.
I’ve hit this same wall before. You’re juggling Streamlit, LangChain, and LangSmith - their callback mechanisms don’t sync well.
Here’s what’s happening: the widget captures the feedback, but save_feedback_to_langsmith() runs at the wrong point in Streamlit’s rerun cycle relative to the session state updates, so the feedback and the run ID are not both available when it fires.
Stop wrestling with all these pieces. Automate the feedback pipeline instead. Set up a webhook that grabs Streamlit feedback events and sends them straight to LangSmith. No session state dependency.
Takes maybe 10 minutes to build. Create a webhook endpoint that receives feedback data, matches the run ID, and hits LangSmith’s API. Session state headaches gone.
I’ve automated chatbot feedback this way - way more reliable than coordinating everything through Streamlit sessions.
Hit this exact bug 6 months ago building a feedback system for our support bot. Your code’s fine - the problem is you’re processing feedback the moment the widget fires.
Streamlit widgets like streamlit_feedback trigger during reruns, but your trace_id might not exist yet or could be stale from the last conversation. Your save_feedback_to_langsmith() function runs every time the widget changes, even without valid context.
Here’s my fix: only process new feedback.
if user_rating and user_rating != st.session_state.get("last_processed_rating"):
    st.session_state["user_rating"] = user_rating
    st.session_state["last_processed_rating"] = user_rating
    save_feedback_to_langsmith()
This stops duplicate processing and only handles fresh feedback.
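One caveat with comparing whole rating dicts: the same thumbs-up on two different answers looks identical, so the second one would be skipped. A minimal sketch of a variant that deduplicates per run, assuming the widget returns a dict with a "score" field as in the question. The helper name is_new_feedback is hypothetical, and a plain dict stands in for st.session_state:

```python
def is_new_feedback(state, run_id, feedback):
    """Return True only the first time a (run_id, score) pair is seen.

    `state` is any dict-like store (st.session_state in the app).
    """
    if not feedback or run_id is None:
        return False  # nothing to save, or no run to attach it to
    fingerprint = (str(run_id), feedback.get("score"))
    if state.get("last_processed_rating") == fingerprint:
        return False  # same rating for the same run: a rerun echo
    state["last_processed_rating"] = fingerprint
    return True


# In the app this would guard the save call, for example:
# if is_new_feedback(st.session_state, st.session_state.get("trace_id"), user_rating):
#     save_feedback_to_langsmith()
```

Keying on the run ID keeps the guard from swallowing a legitimate rating on the next answer.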
Also, add a small delay before creating feedback. LangSmith sometimes needs a moment to index the run:
import time
time.sleep(0.5)
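A fixed sleep is a guess at how long indexing takes. A hedged alternative is to retry the call with a growing delay, so the fast case stays fast and the slow case still succeeds. This is only a sketch: call_with_retry is a hypothetical helper, not part of LangSmith or Streamlit, and the attempt count and delays are arbitrary:

```python
import time


def call_with_retry(action, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call `action()` until it succeeds, doubling the wait after each failure."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as error:  # broad on purpose; narrow this in real code
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise last_error


# Possible use in the app (the feedback name "user_rating" is an assumption):
# call_with_retry(lambda: langsmith_client.create_feedback(
#     current_run_id, key="user_rating", score=rating_score))
```

The sleep function is a parameter only so the helper can be exercised without real waiting.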
Check your LangSmith dashboard under “Feedback”, not “Runs”. Feedback appears in a separate view - you might be looking in the wrong spot.
Delay + deduplication worked perfectly for our production chatbot. Haven’t lost feedback since.
Had this exact same issue when building something similar. Your feedback collection logic is fine - the real problem is how streamlit_feedback handles state across reruns. The widget creates its own internal state that doesn’t sync up with when your save_feedback_to_langsmith() runs. Your trace_id gets captured fine, but the widget might return None or old data when the function fires.
Try using a feedback queue instead. Don’t process feedback right away - store it in a list and handle it on the next rerun when everything’s settled. Check if the feedback key exists before processing.
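To make the queue idea concrete, here is a small sketch. The function names and the "feedback_queue" key are hypothetical, a plain dict stands in for st.session_state, and send is whatever function actually posts one item to LangSmith:

```python
def enqueue_feedback(state, run_id, feedback):
    """Store feedback for later instead of sending it immediately."""
    state.setdefault("feedback_queue", []).append(
        {"run_id": run_id, "feedback": feedback}
    )


def drain_feedback_queue(state, send):
    """Try to send every queued item; keep the ones that fail for the next rerun.

    Returns the number of items sent successfully.
    """
    still_pending = []
    sent = 0
    for item in state.get("feedback_queue", []):
        try:
            send(item["run_id"], item["feedback"])
            sent += 1
        except Exception:  # leave failed items queued rather than losing them
            still_pending.append(item)
    state["feedback_queue"] = still_pending
    return sent
```

In the app, drain_feedback_queue would be called near the top of the script so it runs on every rerun, and enqueue_feedback would replace the direct save call where the widget value is read.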
Also double-check your LangSmith project settings actually allow feedback. The API will accept requests but silently drop them if feedback collection is turned off at the project level.
One more thing - look at the feedback tab in LangSmith’s dashboard, not just the runs tab. Feedback appears in its own section and it’s super easy to miss if you’re only checking conversation traces.
Had this exact problem last year building feedback collection for our internal chatbot. It’s a timing issue - Streamlit reruns mess with your callback sequence.
Your save_feedback_to_langsmith() function runs before session state updates properly. The trace_id gets set, but when the feedback widget triggers, it’s in a different execution context.
Simple fix: move the feedback logic outside the main query block. Create a separate check at the top of your script that runs every rerun:
if "user_rating" in st.session_state and "trace_id" in st.session_state:
    if st.session_state.user_rating and not st.session_state.get("feedback_saved"):
        save_feedback_to_langsmith()
        st.session_state.feedback_saved = True
This processes feedback whenever session state has both pieces ready, regardless of when the widget fires.
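A toy model may help show why the location of the widget matters. It is not Streamlit code: simulate_run is a hypothetical stand-in for one top-to-bottom script run, and it assumes that the rerun triggered by a feedback click has no chat input, and that a widget can only return a value on a run where it is actually rendered:

```python
def simulate_run(state, query=None, click=None, widget_inside_query_block=True):
    """Model one script run; return the feedback value the script would see."""
    if query:
        # The chain ran on this pass, so a run ID gets stored.
        state["trace_id"] = f"run-for-{query}"
    if widget_inside_query_block:
        widget_rendered = bool(query)  # skipped whenever there is no new query
    else:
        widget_rendered = "trace_id" in state  # rendered on every later rerun
    return click if (widget_rendered and click) else None


state = {}
simulate_run(state, query="hello")  # user asks a question, no rating yet

# The click triggers a rerun with no query:
lost = simulate_run(state, click="👍", widget_inside_query_block=True)
kept = simulate_run(state, click="👍", widget_inside_query_block=False)
```

Under those assumptions, lost is None and kept is "👍", which matches the symptom in the question: the buttons are clickable, but the script never reads the rating.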
Also add error handling to your LangSmith client call. Sometimes the run isn’t fully indexed when you try attaching feedback.
I built a whole feedback collection system for our Streamlit apps. Here’s a walkthrough covering the gotchas:
Once you get the timing right, LangSmith feedback integration works great. Just needs that extra session state dance.
Check your LangSmith API keys and project setup first. I had the same issue - feedback was sending but to the wrong project. Your collect_runs() callback looks fine, but check the arguments in your create_feedback() call: it expects the feedback name as key, and LangSmith gets picky about required fields. The Streamlit feedback widget can be finicky too - use st.rerun() after capturing feedback to force a refresh. Also make sure your trace_id isn’t getting overwritten in session state.
Classic LangSmith feedback timing bug - I’ve hit this one tons of times. Your code looks fine, but there’s a timing issue with how the run ID gets linked to feedback.
Here’s what’s happening: collect_runs() grabs the trace ID, but LangSmith needs a sec to fully register the run before it’ll take feedback. You’re probably calling create_feedback() too fast, before the run gets indexed. Toss in a small delay before sending feedback and check if the run exists first. I always run langsmith_client.read_run(current_run_id) before creating feedback. If it errors out, the run isn’t ready yet.
Double-check your LangSmith environment variables too. If you’ve got multiple projects or environments, your feedback might be hitting the wrong project. Make sure your client initialization explicitly sets the project name.
One more thing - watch out for accidentally overwriting the trace_id in session state during reruns. Print both the trace_id and user_rating right before calling create_feedback() to make sure they’re what you expect.
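A rough sketch of the read_run check, assuming only that the client exposes read_run(run_id) and create_feedback(run_id, key=..., score=...). The helper name, attempt count, and delay are made up for illustration. As far as I know, create_feedback takes the feedback name as key, so the feedback_type= keyword in the question is worth double-checking against the installed langsmith version:

```python
import time


def save_feedback_when_run_ready(client, run_id, key, score,
                                 attempts=5, delay=0.5, sleep=time.sleep):
    """Wait until the run is readable, then attach feedback to it."""
    for attempt in range(attempts):
        try:
            client.read_run(run_id)  # raises if the run is not available yet
            break
        except Exception:  # broad on purpose; narrow this in real code
            if attempt == attempts - 1:
                raise  # give up and surface the last error
            sleep(delay)
    return client.create_feedback(run_id, key=key, score=score)


# Possible use in the app (the feedback name "user_rating" is an assumption):
# save_feedback_when_run_ready(langsmith_client, current_run_id,
#                              key="user_rating", score=rating_score)
```

Because the client is passed in, the same function can be tried against a stub before pointing it at the real langsmith_client.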
Been there. Your code’s fine - you’re just manually managing feedback between three systems that hate each other.
Streamlit reruns screw up callback timing. LangSmith wants feedback right after runs finish. Session state goes haywire during widget clicks.
I wasted hours debugging this exact problem. Then I stopped overcomplicating it.
Build simple automation for the feedback flow. Make a webhook that grabs Streamlit feedback and pushes it to LangSmith with proper run mapping. No session state mess.
Webhook receives feedback, checks if the run ID exists in LangSmith, creates the feedback entry. Takes 15 minutes and kills all timing problems.
I’ve done this for multiple chatbot projects. Always works because you’re not fighting Streamlit’s rerun cycle.
Automation handles retries when LangSmith isn’t ready, queues feedback for API limits, and logs errors properly.
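For anyone who wants to try the webhook route, the core of such an endpoint could look roughly like this. It is framework-agnostic on purpose: handle_feedback_event is a hypothetical function that takes the raw request body and a LangSmith-style client, and returns a status code plus a response dict. The payload shape ("run_id" and "score") and the feedback name "user_rating" are assumptions:

```python
import json

# Thumbs subset of the score mapping used in the question.
RATING_VALUES = {"👍": 1.0, "👎": 0.2}


def handle_feedback_event(body, client):
    """Validate one feedback payload and forward it to LangSmith."""
    try:
        payload = json.loads(body)
        run_id = payload["run_id"]
        score = RATING_VALUES[payload["score"]]
    except (ValueError, KeyError, TypeError):
        return 400, {"error": "expected JSON with 'run_id' and a known 'score'"}
    try:
        client.read_run(run_id)  # fail early if the run cannot be found
        feedback = client.create_feedback(run_id, key="user_rating", score=score)
    except Exception as error:  # broad on purpose; narrow this in real code
        return 502, {"error": str(error)}
    return 200, {"feedback_id": str(feedback.id)}
```

A real deployment would wrap this in whatever web framework is already in use and add the retry and queueing behavior described above.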