Direct OpenAI API Integration with ChromaDB for Vector Embeddings Without LangChain

I’ve been working with OpenAIEmbeddings from LangChain to generate vector embeddings and store them in ChromaDB. My current setup looks like this:

vector_db = Chroma(persist_directory=DB_PATH, embedding_function=create_embedding_model())

Now I want to switch to using OpenAI’s API directly instead of going through LangChain. I’m making the API call like this:

openai_client = OpenAI()

api_response = openai_client.embeddings.create(
    model=embedding_model,
    input=[document_text]
)

The problem I’m running into is that OpenAI’s API takes raw text as input, but ChromaDB’s embedding_function parameter expects a callable with a specific interface, not something I can just hand text to at call time.

I tried a workaround where I loop through my documents and embed them one by one:

for document in document_chunks:
    doc_embedding = generate_openai_embedding(document.content)
    document.metadata["embeddings"] = doc_embedding

But since ChromaDB has this embedding_function parameter built in, I’m thinking there should be a cleaner way to integrate OpenAI’s API directly for embedding generation and storage.

What’s the best approach to handle this compatibility issue and use OpenAI’s API directly with ChromaDB?

You can create a wrapper that matches ChromaDB’s expected interface while calling OpenAI’s API directly. The native chromadb client expects an `EmbeddingFunction`: a callable whose `__call__` takes a list of texts (the `input` argument) and returns a list of embeddings. One caveat: the `Chroma` class in your current setup is LangChain’s wrapper, and its embedding_function parameter expects a LangChain `Embeddings` object — so to drop LangChain entirely, switch to the native chromadb client as well. Here’s what worked for me when I made the same transition:

```python
import chromadb
from chromadb import Documents, EmbeddingFunction, Embeddings
from openai import OpenAI

class OpenAIEmbeddingFunction(EmbeddingFunction):
    def __init__(self, model: str = "text-embedding-3-small"):
        self.client = OpenAI()  # reads OPENAI_API_KEY from the environment
        self.model = model

    def __call__(self, input: Documents) -> Embeddings:
        response = self.client.embeddings.create(model=self.model, input=input)
        return [item.embedding for item in response.data]

client = chromadb.PersistentClient(path=DB_PATH)
collection = client.get_or_create_collection(
    name="documents",  # pick any collection name
    embedding_function=OpenAIEmbeddingFunction(),
)
```

For what it’s worth, chromadb also ships a ready-made wrapper for exactly this: `chromadb.utils.embedding_functions.OpenAIEmbeddingFunction(api_key=..., model_name="text-embedding-3-small")`.