LangSmith not recording user feedback from Streamlit chatbot application

I have a Streamlit chatbot that utilizes OpenAI and Langchain with LangSmith for tracking user interactions. I implemented a feedback feature where users can rate the chatbot’s responses using emoji faces. However, the feedback data does not seem to be getting recorded in LangSmith, despite other tracking working correctly. The feedback button is visible and functions well, but nothing is saved to LangSmith. What might be the cause of this issue?

import os
import streamlit as st
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chat_models import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.memory import ConversationBufferWindowMemory
from langchain.document_loaders import DirectoryLoader
from langchain.callbacks import collect_runs
from langsmith import Client
from streamlit_feedback import streamlit_feedback
from dotenv import load_dotenv

# Initialize environment
load_dotenv()

@st.cache_resource
def build_qa_system():
    model = ChatOpenAI(model='gpt-3.5-turbo', temperature=0.1, openai_api_key=os.environ['OPENAI_API_KEY'])
    
    # Load and process documents
    loader = DirectoryLoader('data/')
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(loader.load())
    
    # Create embeddings and vector store
    embeddings = OpenAIEmbeddings(chunk_size=1000)
    vector_db = FAISS.from_documents(documents=chunks, embedding=embeddings)
    
    # Setup conversation memory
    chat_memory = ConversationBufferWindowMemory(k=5, memory_key="history", return_messages=True)
    
    # Build QA chain
    qa_system = RetrievalQA.from_chain_type(
        llm=model,
        chain_type="stuff",
        retriever=vector_db.as_retriever(),
        memory=chat_memory
    )
    return qa_system

def process_user_feedback():
    user_rating = st.session_state.get("rating_data")
    current_run_id = st.session_state.current_run_id
    
    rating_values = {
        "👍": 1.0,
        "👌": 0.8,
        "😑": 0.5,
        "👎": 0.2,
        "😡": 0.0,
    }
    
    rating_score = rating_values.get(user_rating.get("score"))
    
    if rating_score is not None:
        feedback_label = f"user_rating {user_rating.get('score')}"
        try:
            feedback_entry = langsmith_client.create_feedback(
                run_id=current_run_id,
                feedback_type=feedback_label,
                score=rating_score
            )
            st.session_state.user_feedback = {
                "entry_id": str(feedback_entry.id),
                "rating": rating_score,
            }
            st.success(f"Rating saved: {feedback_entry.id}")
        except Exception as error:
            st.error(f"Rating failed to save: {error}")
    else:
        st.warning("Please select a valid rating.")

langsmith_client = Client()
qa_chain = build_qa_system()

# Initialize chat messages
if "chat_history" not in st.session_state:
    st.session_state["chat_history"] = []

# Show previous messages
for msg in st.session_state.chat_history:
    with st.chat_message(msg["role"]):
        st.markdown(msg["content"])

query = st.chat_input("Ask your question...")

if query:
    st.session_state.chat_history.append({"role": "user", "content": query})
    with st.chat_message("user"):
        st.markdown(query)

    # Process query and get response
    with collect_runs() as run_collector:
        result = qa_chain({"query": query})
        if run_collector.traced_runs:
            current_run = run_collector.traced_runs[0].id
            st.session_state.current_run_id = current_run
        else:
            st.error("Failed to collect run data")
            current_run = None

    bot_response = result['result']

    st.session_state.chat_history.append({"role": "assistant", "content": bot_response})
    with st.chat_message("assistant"):
        st.markdown(bot_response)
    
    # Show feedback interface
    if bot_response:
        user_rating = streamlit_feedback(
            feedback_type="faces",
            key=f"rating_{current_run}"
        )
        if user_rating:
            st.session_state["rating_data"] = user_rating
            process_user_feedback()

Been through this pain before. Your code looks fine but you’re missing something crucial - RetrievalQA doesn’t properly pass run IDs through its internal steps.

The problem is collect_runs() captures the outer chain execution, but LangSmith needs the actual LLM call run ID for feedback. RetrievalQA wraps multiple steps and the run collector grabs the wrong one.

Try this instead:

from langchain.callbacks import LangChainTracer

# Replace your collect_runs section
tracer = LangChainTracer(project_name=os.getenv('LANGCHAIN_PROJECT'))
result = qa_chain({"query": query}, callbacks=[tracer])

# The tracer keeps the root run in latest_run once the chain finishes;
# walk its child runs to find the ChatOpenAI call specifically
def find_llm_run(run):
    if run.name == "ChatOpenAI":
        return run
    for child in run.child_runs:
        match = find_llm_run(child)
        if match:
            return match
    return None

if tracer.latest_run:
    llm_run = find_llm_run(tracer.latest_run)
    if llm_run:
        st.session_state.current_run_id = llm_run.id

Also check your environment setup. I’ve seen feedback silently fail when LANGCHAIN_TRACING_V2 isn’t set to “true” or when the project name doesn’t match exactly.
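
If you want to rule that out quickly, print what the process actually sees at startup - something like:

import os

# Show the tracing-related variables the app actually sees at runtime -
# a missing LANGCHAIN_TRACING_V2 or a mismatched LANGCHAIN_PROJECT shows up here
for var in ("LANGCHAIN_TRACING_V2", "LANGCHAIN_PROJECT"):
    print(f"{var} = {os.getenv(var)}")
print("LANGCHAIN_API_KEY set:", bool(os.getenv("LANGCHAIN_API_KEY")))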

One more thing - add debug logging in your feedback function. LangSmith sometimes returns success but the feedback doesn’t show up due to project permissions or API rate limits.
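
For example, a thin wrapper around create_feedback that logs what goes in and what comes back (the wrapper and logger names are mine; it assumes the langsmith client’s standard create_feedback(run_id, key, score=...) call):

import logging

from langsmith import Client

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("feedback")

def create_feedback_logged(client: Client, run_id, key: str, score: float):
    # Log the exact arguments so a missing or invalid run_id is obvious immediately
    log.info("create_feedback run_id=%s key=%s score=%s", run_id, key, score)
    feedback = client.create_feedback(run_id, key=key, score=score)
    log.info("LangSmith returned feedback id=%s", feedback.id)
    return feedback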

Your issue is with the collect_runs() implementation and how you’re accessing the run ID. The collect_runs context isn’t properly capturing the nested chain calls in your RetrievalQA system. I’ve hit this before with complex chains - the run collector misses the actual LLM call and grabs a parent run instead. Switch to manual run ID capture using callbacks directly on the chain:

from langchain.callbacks import tracing_v2_enabled

# Replace collect_runs with a direct callback
with tracing_v2_enabled() as tracer:
    result = qa_chain({"query": query}, callbacks=[tracer])
    run_id = tracer.latest_run.id if tracer.latest_run else None

Also check that your LangSmith setup points at the correct project. If you’re not setting it explicitly, feedback might go to a different project than the one you’re watching. Make sure your `.env` file has the correct `LANGCHAIN_PROJECT` variable set and that you’re not in a local dev mode that’s disconnected from the right LangSmith environment.
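
One quick way to confirm where your traces are actually landing is to list a few recent runs for the project straight from the SDK (this assumes `LANGCHAIN_PROJECT` is set to the project you expect):

import os
from itertools import islice

from langsmith import Client

client = Client()
project = os.getenv("LANGCHAIN_PROJECT", "default")

# list_runs yields recent runs for the project; if this comes back empty,
# your traces are going somewhere else
for run in islice(client.list_runs(project_name=project), 5):
    print(run.id, run.name, run.start_time)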

Had this exact problem a few months ago with a similar setup. Your timing’s off with the run collection.

Your collect_runs() context manager only captures the QA chain execution, but LangSmith needs the run fully closed before you can attach feedback. The run’s probably still pending when you try to create feedback.

Here’s the fix - move feedback collection outside the run context and add a small delay:

import time

with collect_runs() as run_collector:
    result = qa_chain({"query": query})

# Wait for the run to close properly
time.sleep(0.5)

if run_collector.traced_runs:
    current_run = run_collector.traced_runs[0].id
    st.session_state.current_run_id = current_run

Check your LangSmith project settings too. Feedback recording gets disabled by default sometimes.

One more thing - make sure you’re using the right feedback key format. LangSmith’s picky about the feedback_type parameter. Try just “rating” instead of the dynamic label.
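
Something like this, reusing the variables from your process_user_feedback function - note it passes the langsmith client’s key argument with a static value instead of building a label per emoji:

# current_run_id and rating_score come from your existing feedback handler
feedback_entry = langsmith_client.create_feedback(
    run_id=current_run_id,
    key="rating",        # static feedback key instead of a per-emoji label
    score=rating_score,  # the numeric score still records which face was picked
)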

If the delay doesn’t work, log the run_id right before creating feedback to make sure you’re actually capturing a valid ID.

Had this exact problem with a chatbot I built for our internal docs. LangSmith feedback tracking becomes a mess with Streamlit’s rerun behavior.

You’re fighting Streamlit’s architecture. Every feedback button click reruns the entire app and trashes your session state. LangSmith’s callbacks don’t work well with this chaos.

I ditched the complex setup and moved to Latenode. Built a workflow that handles chat interactions, stores feedback properly, and syncs everything without session state headaches.

My Latenode setup: simple webhook receives chat query, calls OpenAI, stores response with unique ID. When feedback comes in, it updates the record directly. No more run collectors or timing issues.

Flow: Streamlit sends query → Latenode processes with OpenAI → returns response with tracking ID → user gives feedback → Latenode logs immediately.
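
On the Streamlit side that boils down to two plain HTTP calls. A rough sketch (the webhook URL and field names like `tracking_id` and `feedback_score` are made up here - they depend entirely on how the Latenode scenario is built):

import requests

# Placeholder webhook URL - Latenode gives you the real one when you create the scenario
WEBHOOK_URL = "https://webhook.latenode.com/your-scenario-id"

def ask(query: str) -> dict:
    # The scenario calls OpenAI and returns the answer plus a tracking ID it generated
    resp = requests.post(WEBHOOK_URL, json={"query": query}, timeout=30)
    resp.raise_for_status()
    return resp.json()  # e.g. {"answer": "...", "tracking_id": "abc123"}

def send_feedback(tracking_id: str, score: float) -> None:
    # Same scenario (or a second webhook) updates the stored record with the rating
    resp = requests.post(
        WEBHOOK_URL,
        json={"tracking_id": tracking_id, "feedback_score": score},
        timeout=30,
    )
    resp.raise_for_status()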

Much cleaner than wrestling LangSmith callbacks with Streamlit’s lifecycle. You get proper logging and can export feedback data however you want.

Check it out: https://latenode.com

The problem’s in how you’re handling the feedback callback with Streamlit’s rerun cycle. When streamlit_feedback gets clicked, it triggers a page rerun, but your process_user_feedback() function runs during that rerun, when the run_id might not be accessible anymore. I’ve seen this mess up because Streamlit’s session state gets cleared or changed between runs. Store the run_id more persistently and validate it exists before creating feedback:

if user_rating and st.session_state.get('current_run_id'):
    # Add validation here
    if st.session_state.current_run_id and len(str(st.session_state.current_run_id)) > 10:
        st.session_state["rating_data"] = user_rating
        process_user_feedback()

Also check that your LangSmith project has feedback enabled in the dashboard settings. Sometimes the API calls work but feedback collection’s disabled at the project level. Open your browser dev tools and check the network tab to see if the feedback API calls are actually happening and what responses you’re getting.

Streamlit reruns destroy LangSmith’s run state tracking. Your feedback function executes after the run context dies. Store the actual run data in st.session_state right when you collect it - don’t just save the ID. Double-check your LangSmith env vars too; I’ve watched feedback disappear into nothing when LANGCHAIN_ENDPOINT or LANGSMITH_API_KEY point to the wrong environment. Quick fix: print that run_id right before calling create_feedback to verify it’s actually valid.
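
A rough sketch of that pattern, keeping your existing collect_runs block (the extra name field stored here is just for debugging):

with collect_runs() as run_collector:
    result = qa_chain({"query": query})

if run_collector.traced_runs:
    run = run_collector.traced_runs[0]
    # Keep more than just the UUID so a rerun can't leave you with a dangling ID
    st.session_state.current_run_id = str(run.id)
    st.session_state.current_run_name = run.name
    # Sanity check right away - this is the ID create_feedback will get later
    print("captured run_id:", st.session_state.current_run_id)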