Direct OpenAI API Integration with ChromaDB for Vector Embeddings Without LangChain

I’ve been working with OpenAIEmbeddings from LangChain to generate vector embeddings and store them in ChromaDB. My current setup looks like this:

vector_db = Chroma(persist_directory=DB_PATH, embedding_function=create_embedding_model())

Now I want to switch to using OpenAI’s API directly instead of going through LangChain. I’m making the API call like this:

openai_client = OpenAI()

api_response = openai_client.embeddings.create(
    model=embedding_model,
    input=[document_text]
)

The problem I’m running into is that OpenAI’s API takes raw text as input, but ChromaDB’s embedding_function parameter expects a callable with a specific interface, not something I can just hand text to at call time.

I tried a workaround where I loop through my documents and embed them one by one:

for document in document_chunks:
    doc_embedding = generate_openai_embedding(document.content)
    document.metadata["embeddings"] = doc_embedding

But since ChromaDB has this embedding_function parameter built in, I’m thinking there should be a cleaner way to integrate OpenAI’s API directly for embedding generation and storage.

What’s the best approach to handle this compatibility issue and use OpenAI’s API directly with ChromaDB?

You can create a wrapper that matches ChromaDB’s expected interface while calling OpenAI’s API directly. The native chromadb client expects an `EmbeddingFunction`: a callable whose `__call__` takes a list of texts (the `input` argument) and returns a list of embeddings. One caveat: the `Chroma` class in your current setup is LangChain’s wrapper, and its embedding_function parameter expects a LangChain `Embeddings` object — so to drop LangChain entirely, switch to the native chromadb client as well. Here’s what worked for me when I made the same transition:

```python
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from openai import OpenAI

class OpenAIEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model: str = "text-embedding-3-small"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def __call__(self, input: Documents) -> Embeddings:
        response = self.client.embeddings.create(model=self.model, input=input)
        return [item.embedding for item in response.data]

client = chromadb.PersistentClient(path=DB_PATH)
collection = client.get_or_create_collection(
    name="documents",  # pick any collection name
    embedding_function=OpenAIEmbeddingFunction(),
)
```

For what it’s worth, chromadb also ships a ready-made wrapper for exactly this: `chromadb.utils.embedding_functions.OpenAIEmbeddingFunction(api_key=..., model_name="text-embedding-3-small")`.