I’m working on a project where I need to save user questions and their responses into a Chroma vector database. The problem I’m facing is that LangChain seems to only accept Document objects for storing data. I want to know if there’s a way to store plain text strings directly instead of converting them to documents first.
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

def save_qa_pair(user_question, bot_response):
    vector_embeddings = OpenAIEmbeddings()
    # This fails - from_documents expects Document objects, not plain strings
    Chroma.from_documents(documents=user_question, embedding=vector_embeddings, persist_directory="./data/")
    Chroma.from_documents(documents=bot_response, embedding=vector_embeddings, persist_directory="./data/")
What I want to achieve is storing simple text like:
- user_question: “How are you today?”
- bot_response: “I’m doing great, thanks for asking!”
Any help would be really appreciated!
I had the same problem and automated the whole thing instead of manually wrapping strings every time.
My workflow takes any string, converts it to the right format, generates embeddings, and pushes everything to Chroma automatically. No more repetitive code.
# Now it's just:
def save_qa_pair(user_question, bot_response):
    # Trigger the automated workflow ('workflow' is the platform's client object)
    workflow.execute({
        'question': user_question,
        'response': bot_response
    })
The automation does document creation, metadata assignment, batch processing, and error handling. I can process hundreds of QA pairs without touching code.
I also connected webhooks so new conversations get stored automatically. Everything runs in the background - I don’t think about string conversions anymore.
This saved me hours and made everything way more reliable. You can build similar automations here: https://latenode.com
Honestly, just use add_texts() on your existing Chroma instance. Skip the Document objects entirely - pass your strings directly and Chroma handles everything. Way simpler than recreating the whole vectorstore each time.
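As a minimal sketch (assuming vectorstore is a Chroma instance you’ve already created elsewhere):

# Sketch: append plain strings to an existing Chroma instance
ids = vectorstore.add_texts(texts=[user_question, bot_response])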
Been working with Chroma for a while now and hit the same issue building a customer support bot. The solutions above work, but here’s another approach.
If you need this to scale, create a persistent Chroma instance instead of recreating it every time. Your current code makes new instances each call - that gets expensive fast.
import time

# Create the store once at module level, reuse it across calls
vector_embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    persist_directory="./data/",
    embedding_function=vector_embeddings
)

def save_qa_pair(user_question, bot_response):
    # Append new texts to the existing store instead of rebuilding it
    vectorstore.add_texts(
        texts=[user_question, bot_response],
        metadatas=[{"type": "question", "timestamp": time.time()},
                   {"type": "response", "timestamp": time.time()}]
    )
    vectorstore.persist()
This way you’re appending to your existing database instead of rebuilding it. Way faster with thousands of QA pairs. Learned this the hard way after waiting 10 minutes for my database to rebuild every time.
Throwing in timestamps in metadata helps with debugging later when you need to trace conversations.
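For example, here’s a quick sketch of pulling back only stored questions later via the filter parameter on LangChain’s Chroma wrapper (the query string is just the example from the original post):

# Sketch: retrieve stored questions similar to a new query, filtered by metadata
results = vectorstore.similarity_search(
    "How are you today?",
    k=3,
    filter={"type": "question"}
)
for doc in results:
    print(doc.metadata["timestamp"], doc.page_content)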
You’re on the right track - there’s an easy fix for this. Chroma needs Document objects, but you can wrap your strings without any hassle using Langchain’s Document class.
Here’s how:
from langchain.schema import Document

def save_qa_pair(user_question, bot_response):
    vector_embeddings = OpenAIEmbeddings()
    # Convert strings to Document objects
    question_doc = Document(page_content=user_question, metadata={"type": "question"})
    response_doc = Document(page_content=bot_response, metadata={"type": "response"})
    # Store in Chroma
    vectorstore = Chroma.from_documents(
        documents=[question_doc, response_doc],
        embedding=vector_embeddings,
        persist_directory="./data/"
    )
The metadata’s optional but handy for filtering later. I’ve used this same pattern in multiple projects and it works great - keeps things simple while playing nice with Langchain’s framework.
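If you do keep the metadata, here’s a sketch of using it at query time through a retriever (search_kwargs filtering is supported by LangChain’s vectorstore retrievers; the query string is made up):

# Sketch: build a retriever that only surfaces stored responses
retriever = vectorstore.as_retriever(search_kwargs={"filter": {"type": "response"}})
docs = retriever.get_relevant_documents("greeting small talk")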
There’s another approach that works great when you’re dealing with lots of string data. Skip creating individual Document objects and use Chroma’s from_texts method instead - it handles the conversion for you.
def save_qa_pair(user_question, bot_response):
    vector_embeddings = OpenAIEmbeddings()
    # Store plain strings directly - from_texts wraps them in Documents internally
    vectorstore = Chroma.from_texts(
        texts=[user_question, bot_response],
        embedding=vector_embeddings,
        metadatas=[{"type": "question"}, {"type": "response"}],
        persist_directory="./data/"
    )
This is way cleaner since you don’t have to manually wrap everything in Document objects. I use this all the time for chat apps where I’m constantly feeding in plain text conversations. Same performance, but the code feels more natural when your source data’s already strings.
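To round it out, a quick sketch of reading the data back from the same vectorstore object (from_texts returns Document objects on search, so page_content holds your original string):

# Sketch: search the stored QA pairs and inspect what came back
results = vectorstore.similarity_search("How are you?", k=2)
for doc in results:
    print(doc.metadata["type"], "->", doc.page_content)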