I’ve been working with OpenAIEmbeddings from LangChain to generate vector embeddings and store them in ChromaDB. My current setup looks like this:
vector_db = Chroma(persist_directory=DB_PATH, embedding_function=create_embedding_model())
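For reference, create_embedding_model() is a small helper of mine, roughly like this (I’m using the langchain_openai package; the model name here is just a stand-in):

from langchain_openai import OpenAIEmbeddings

def create_embedding_model():
    # Returns a LangChain Embeddings object that Chroma can call
    # to embed documents and queries. Model name is a stand-in.
    return OpenAIEmbeddings(model="text-embedding-3-small")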
Now I want to switch to using OpenAI’s API directly instead of going through LangChain. I’m making the API call like this:
from openai import OpenAI

openai_client = OpenAI()
api_response = openai_client.embeddings.create(
    model=embedding_model,
    input=[document_text],
)
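That part works on its own; the vector comes back on the response object:

# One embedding per input, in input order; each is a list of floats.
doc_vector = api_response.data[0].embedding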
The problem I’m running into is the interface mismatch: OpenAI’s API is just a call that takes text and returns a response object, while ChromaDB’s embedding_function parameter expects an object implementing ChromaDB’s embedding interface, i.e. something ChromaDB itself can invoke with a list of texts to get back a list of vectors, not a bare API call.
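From what I can tell from the chromadb docs (this assumes chromadb 0.4.16+, where the protocol’s argument is named input), the shape it wants is a callable like this (class name is mine):

from chromadb import Documents, EmbeddingFunction, Embeddings

class OpenAIDirectEmbeddingFunction(EmbeddingFunction):
    # ChromaDB calls this with a list of texts and expects one
    # embedding (list of floats) per text, in the same order.
    def __call__(self, input: Documents) -> Embeddings:
        response = openai_client.embeddings.create(
            model=embedding_model,
            input=list(input),
        )
        return [item.embedding for item in response.data]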
I tried a workaround where I loop through my documents and embed them one by one:
for document in document_chunks:
    doc_embedding = generate_openai_embedding(document.content)
    document.metadata["embeddings"] = doc_embedding
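This does work, though I realized the embeddings endpoint accepts a batch, so the loop can at least collapse into a single request:

# Embed every chunk in one call; response order matches input order.
texts = [document.content for document in document_chunks]
batch_response = openai_client.embeddings.create(model=embedding_model, input=texts)
for document, item in zip(document_chunks, batch_response.data):
    document.metadata["embeddings"] = item.embedding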
But since ChromaDB has this embedding_function parameter built in, I suspect there’s a cleaner way to plug OpenAI’s API in directly, so that ChromaDB handles embedding generation and storage itself rather than me pre-computing vectors in a loop (roughly what I’m imagining is sketched below).
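If I drop LangChain’s wrapper and use chromadb’s native client, I imagine the wiring would look roughly like this (collection name and ids are placeholders; OpenAIDirectEmbeddingFunction is the class sketched above):

import chromadb

chroma_client = chromadb.PersistentClient(path=DB_PATH)
collection = chroma_client.get_or_create_collection(
    name="documents",  # placeholder collection name
    embedding_function=OpenAIDirectEmbeddingFunction(),
)
# ChromaDB invokes the embedding function itself on add() and query().
collection.add(
    ids=[str(i) for i in range(len(document_chunks))],  # placeholder ids
    documents=[document.content for document in document_chunks],
)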
What’s the best approach to handle this compatibility issue and use OpenAI’s API directly with ChromaDB?