Streamlit chatbot feedback not showing up in LangSmith tracking

I built a Streamlit application with a chatbot powered by LangChain, OpenAI, and LangSmith. The bot works fine and I can see all the conversation runs in the LangSmith dashboard. I then added a feedback system with thumbs up/down buttons after each bot response, but the feedback scores never show up in LangSmith even though the button clicks work properly. The runs are being tracked correctly; it's just the user feedback that never gets recorded. What am I doing wrong?

import os
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory
from langchain.document_loaders import DirectoryLoader
from langchain.callbacks import collect_runs
from langsmith import Client
from streamlit_feedback import streamlit_feedback
from dotenv import load_dotenv
import uuid

load_dotenv()

@st.cache_resource
def build_qa_system():
    model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.1, openai_api_key=os.environ['OPENAI_API_KEY'])
    
    loader = DirectoryLoader('docs/')
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(loader.load())
    
    vector_db = FAISS.from_documents(documents=chunks, embedding=OpenAIEmbeddings(chunk_size=1000))
    
    memory = ConversationBufferWindowMemory(memory_key="history", return_messages=True, k=5)
    
    retrieval_chain = RetrievalQA.from_chain_type(
        llm=model,
        chain_type="stuff",
        retriever=vector_db.as_retriever(),
        memory=memory
    )
    return retrieval_chain

def process_feedback(user_rating, emoji_icon=None):
    st.toast(f"Rating received: {user_rating}", icon=emoji_icon)
    return user_rating.update({"extra_data": 456})

def save_user_feedback():
    rating_data = st.session_state.get("user_rating")
    current_run_id = st.session_state.current_run
    
    rating_scale = {
        "šŸ‘": 1.0,
        "šŸ‘Ž": 0.0,
        "😊": 0.8,
        "šŸ˜”": 0.2
    }
    
    final_score = rating_scale.get(rating_data.get("score"))
    
    if final_score is not None:
        feedback_category = f"thumbs {rating_data.get('score')}"
        try:
            saved_feedback = langsmith_client.create_feedback(
                run_id=current_run_id,
                feedback_type=feedback_category,
                score=final_score
            )
            st.session_state.saved_rating = {
                "rating_id": str(saved_feedback.id),
                "final_score": final_score
            }
            st.success(f"Feedback saved: {saved_feedback.id}")
        except Exception as error:
            st.error(f"Could not save feedback: {error}")
    else:
        st.warning("Please provide valid feedback")

langsmith_client = Client()
qa_system = build_qa_system()

if "chat_history" not in st.session_state:
    st.session_state["chat_history"] = []

for msg in st.session_state.chat_history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

user_question = st.chat_input("Ask me anything...")

if user_question:
    st.session_state.chat_history.append({"role": "user", "content": user_question})
    
    with st.chat_message("user"):
        st.markdown(user_question)
    
    with collect_runs() as run_collector:
        bot_response = qa_system({"query": user_question})
        if run_collector.traced_runs:
            current_run = run_collector.traced_runs[0].id
            st.session_state.current_run = current_run
        else:
            st.error("Failed to collect run data")
            current_run = None
    
    bot_answer = bot_response['result']
    st.session_state.chat_history.append({"role": "assistant", "content": bot_answer})
    
    with st.chat_message("assistant"):
        st.markdown(bot_answer)
    
    if bot_answer:
        user_rating = streamlit_feedback(
            feedback_type="thumbs",
            key=f"rating_{current_run}"
        )
        if user_rating:
            st.session_state["user_rating"] = user_rating
            save_user_feedback()

I’ve hit this exact problem - it’s usually a timing issue. Your feedback widget generates a new key on every rerun, so the run ID matching gets screwed up.

Honestly, you’re overcomplicating this. You’re building all these moving parts when you could just automate the whole thing.

I switched to Latenode for chatbot feedback tracking. It handles LangSmith integration automatically. Set up one workflow that grabs user interactions, processes feedback, and pushes everything to LangSmith. No more session state headaches or run ID mismatches.

The Latenode automation:

  • Captures chat interactions
  • Stores feedback instantly when users click
  • Syncs to LangSmith in real time
  • Handles errors automatically

No debugging session issues. No wondering why feedback vanishes. It just runs.

Your code logic is fine, but Streamlit’s session management sucks for feedback tracking. It breaks when users refresh or open multiple tabs.

Check Latenode’s LangSmith integrations: https://latenode.com

The problem is you’re calling save_user_feedback() right when the feedback widget returns a value, but the run probably isn’t committed to LangSmith yet. Had the same issue with a RAG chatbot last year.

Decouple the submission instead. Don’t call save_user_feedback() right where the widget returns a value - store the feedback temporarily and process it separately. The streamlit_feedback widget can fire multiple times during rendering, causing race conditions.
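
One way to do that, as a rough sketch (the pending_feedback key, queue_feedback and flush_pending_feedback are names I made up for illustration, not part of your code):

def queue_feedback(rating, run_id):
    # No network call here - just remember what the user clicked and for which run
    if "pending_feedback" not in st.session_state:
        st.session_state["pending_feedback"] = []
    st.session_state["pending_feedback"].append({"rating": rating, "run_id": run_id})

def flush_pending_feedback():
    # Call this near the top of the script so queued items are retried on every rerun,
    # by which point LangSmith has usually finished ingesting the run
    still_pending = []
    for item in st.session_state.get("pending_feedback", []):
        try:
            langsmith_client.create_feedback(
                run_id=item["run_id"],
                key="user_rating",  # the LangSmith client calls the feedback name `key`
                score=1.0 if item["rating"].get("score") == "šŸ‘" else 0.0,
            )
        except Exception:
            still_pending.append(item)  # keep it for the next rerun
    st.session_state["pending_feedback"] = still_pending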

What fixed it for me was checking if the run actually exists in LangSmith before submitting feedback:

def save_user_feedback():
    # Make sure the run has actually landed in LangSmith before attaching feedback
    try:
        langsmith_client.read_run(st.session_state.current_run)
    except Exception:
        st.warning("Run not ready yet, try again")
        return

    # Then proceed with feedback creation

Also check your API key has write permissions for feedback. Some LangSmith setups use read-only keys that can track runs but can’t create feedback entries. Verify your key permissions in LangSmith settings.

Your problem is the feedback widget key generation combined with the session state handling. Using key=f"rating_{current_run}" with a dynamic run ID makes Streamlit recreate the widget on every rerun, which breaks the feedback callback. I hit this exact issue building a document Q&A system and fixed it by switching to stable keys and managing the run-to-feedback mapping separately:

# Use message index instead of run ID for widget key
feedback_key = f"feedback_{len(st.session_state.chat_history)//2}"

user_rating = streamlit_feedback(
    feedback_type="thumbs",
    key=feedback_key,
    on_submit=save_user_feedback  # callback instead of a direct call; on_submit receives the widget's response dict
)

Store the run ID mapping separately:

if "run_mappings" not in st.session_state:
    st.session_state["run_mappings"] = {}

# After getting the run ID
st.session_state.run_mappings[feedback_key] = current_run

This keeps the widget key consistent across reruns, so you can grab the right run ID when feedback gets submitted. Your current approach spawns new widgets constantly, so feedback never links up properly.
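
Putting the two pieces together, the callback could look roughly like this. It's only a sketch: submit_rating is an illustrative name, and it assumes streamlit_feedback forwards extra kwargs to on_submit (as in the component's own examples) and that you pass the feedback name to the LangSmith client as key:

def submit_rating(rating, feedback_key=None):
    # `rating` is the dict the widget produces, e.g. {"type": "thumbs", "score": "šŸ‘"}
    run_id = st.session_state.run_mappings.get(feedback_key)
    if run_id is None:
        st.warning("No run recorded for this message yet")
        return
    score = 1.0 if rating.get("score") == "šŸ‘" else 0.0
    langsmith_client.create_feedback(run_id=run_id, key="thumbs", score=score)

user_rating = streamlit_feedback(
    feedback_type="thumbs",
    key=feedback_key,
    on_submit=submit_rating,
    kwargs={"feedback_key": feedback_key},  # forwarded to submit_rating
)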

Been there. You’re trying to create feedback before LangSmith’s backend finishes processing the run.

I hit this exact issue 6 months ago with a customer support bot. collect_runs() gives you the run ID right away, but the run itself is uploaded to LangSmith in the background, so the backend can need a few seconds before it will accept feedback for that run.

Just add a small delay before calling create_feedback:

import time

def save_user_feedback():
    rating_data = st.session_state.get("user_rating")
    current_run_id = st.session_state.current_run
    
    # Give LangSmith time to process the run
    time.sleep(2)
    
    rating_scale = {
        "šŸ‘": 1.0,
        "šŸ‘Ž": 0.0,
        "😊": 0.8,
        "šŸ˜”": 0.2
    }
    # rest of your code...
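
If a fixed sleep feels brittle, a small retry loop around read_run is a less fragile variant (a sketch; the attempt count and delay are arbitrary):

import time

def wait_for_run(run_id, attempts=5, delay=1.0):
    # Poll LangSmith until the run is readable, or give up after `attempts` tries
    for _ in range(attempts):
        try:
            langsmith_client.read_run(run_id)
            return True
        except Exception:
            time.sleep(delay)
    return False

Call it right before create_feedback and only submit the feedback when it returns True.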

Also double-check your LANGCHAIN_PROJECT environment variable matches what’s in the dashboard. I’ve seen feedback go to the wrong project and vanish.

Throw in some debug logging too:

print(f"Sending feedback for run: {current_run_id}")

Then verify that run ID actually exists in your LangSmith project before making the feedback call.
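
A quick sanity check along those lines (just a sketch that prints to the terminal and changes no behavior):

import os

print("LANGCHAIN_PROJECT:", os.environ.get("LANGCHAIN_PROJECT"))
try:
    run = langsmith_client.read_run(st.session_state.current_run)
    print("Run found:", run.id, run.name)
except Exception as err:
    print("Run not visible yet:", err)

If the project name printed here doesn't match the project you're looking at in the dashboard, the feedback is landing somewhere else.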