I’m building an evaluation pipeline with LangSmith, but I run into an error whenever my evaluation logic tries to update a Streamlit session state variable.
Here’s a summary of the steps I’ve taken:
- Launched a basic Streamlit application integrated with LangSmith
- Developed a dataset for evaluation containing test questions and responses
- Programmed an evaluation function designed to update session state during the evaluation process
- Executed the evaluation using LangSmith’s evaluate() function
What I expected: st.session_state.message_log to be updated while the evaluation function runs.
What I got instead: an error indicating that the session state variable can’t be accessed from inside the evaluation function.
Here’s a streamlined version of my code:
import streamlit as st
from langsmith import Client, evaluate
from dotenv import load_dotenv
from langchain_openai import ChatOpenAI
from langchain_core.prompts.prompt import PromptTemplate
from langsmith.evaluation import LangChainStringEvaluator
import openai
load_dotenv()
# Initialize message log
if 'message_log' not in st.session_state:
    st.session_state.message_log = []
client = Client()
# Setup evaluation dataset
test_dataset = "Assessment Dataset"
dataset = client.create_dataset(test_dataset)
client.create_examples(
    inputs=[
        {"query": "Explain machine learning"},
        {"query": "What is artificial intelligence"},
        {"query": "Define neural networks"},
    ],
    outputs=[
        {"response": "ML is a subset of AI that learns from data"},
        {"response": "AI simulates human intelligence in machines"},
        {"response": "Neural networks mimic brain structure for computation"},
    ],
    dataset_id=dataset.id,
)
# Evaluation prompt
GRADING_TEMPLATE = """Grade this answer as an expert teacher.
Question: {query}
Correct answer: {response}
Student answer: {result}
Respond with PASS or FAIL:"""
grading_prompt = PromptTemplate(
    input_variables=["query", "response", "result"],
    template=GRADING_TEMPLATE,
)
grading_model = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)
grader = LangChainStringEvaluator("qa", config={"llm": grading_model, "prompt": grading_prompt})
openai_client = openai.Client()
def generate_answer(query):
    return openai_client.chat.completions.create(
        model="gpt-4o-mini",
        temperature=0,
        messages=[
            {"role": "system", "content": "Answer briefly and accurately."},
            {"role": "user", "content": query},
        ],
    ).choices[0].message.content
def evaluation_function(inputs):
    result = generate_answer(inputs["query"])
    st.session_state.message_log.append(result)  # This line causes the error
    return {"result": result}
# Run evaluation
test_results = evaluate(
    evaluation_function,
    data=test_dataset,
    evaluators=[grader],
    experiment_prefix="test-run",
)
The error occurs when attempting to access st.session_state.message_log within the evaluation function. What can I do to resolve this issue?
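In case it helps, here is a workaround I’m considering (just a sketch; the buffered_results and evaluation_function_buffered names are mine, purely for illustration): keep the target function free of any session state access and only write to st.session_state after evaluate() has returned in the main Streamlit script. I think this would sidestep the error, but I’d still prefer a proper solution if one exists.

# Workaround sketch (illustrative only): buffer results in a plain Python list
# inside the target function, then merge them into session state afterwards,
# back in the main Streamlit script where session state is available.
buffered_results = []

def evaluation_function_buffered(inputs):
    result = generate_answer(inputs["query"])
    buffered_results.append(result)  # no session state access here
    return {"result": result}

test_results = evaluate(
    evaluation_function_buffered,
    data=test_dataset,
    evaluators=[grader],
    experiment_prefix="test-run",
)

# Once evaluate() has finished, update session state in the main script
st.session_state.message_log.extend(buffered_results)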