Chroma Vector Database with Langchain Fails to Store Beyond 99 Embeddings

Been there, done that. Same exact thing happened to me last year building a document search system for our internal knowledge base.

It’s not necessarily the batch limit. In my case, Chroma silently truncated at 99 records when there was a mismatch between the embedding dimensions it received and what the collection expected. Check that your OpenAI embeddings are producing vectors of a consistent dimension for every chunk.
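
A quick way to rule the dimension theory in or out is to verify that every vector in a batch has the same length before handing it to Chroma. This is just a sketch - the helper name is mine, and I'm using dummy vectors so it runs without an API key (with real chunks you'd embed via `OpenAIEmbeddings().embed_documents(...)` first):

```python
def embedding_dims(vectors):
    # A healthy batch has exactly one distinct vector length
    return {len(v) for v in vectors}

# Dummy vectors standing in for real embeddings.
good_batch = [[0.0] * 8 for _ in range(100)]
bad_batch = good_batch + [[0.0] * 7]  # one truncated vector sneaks in

print(embedding_dims(good_batch))  # {8}
print(embedding_dims(bad_batch))   # {8, 7}
```

If that set ever has more than one element, you've found your problem before Chroma even sees the data.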

What worked for me was adding explicit error handling and using add_texts instead of from_documents. Try this:

from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

db = Chroma(persist_directory=DB_PATH, embedding_function=OpenAIEmbeddings())
texts = [chunk.page_content for chunk in text_chunks]
metadatas = [chunk.metadata for chunk in text_chunks]
# add_texts returns the list of ids it stored, so you can verify the count
ids = db.add_texts(texts=texts, metadatas=metadatas)

Run a quick test - try storing exactly 100 chunks first. If that works but 101 fails, you know it’s the batch limit. If 100 also gets truncated to 99, that points to the dimension mismatch instead.
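
If it does turn out to be a per-call batch limit, splitting the inserts into smaller batches sidesteps it entirely. A minimal sketch - the batch size of 50 is arbitrary, and `db`/`texts` refer to the snippet above:

```python
def batched(items, size):
    # Yield successive slices of at most `size` items
    for i in range(0, len(items), size):
        yield items[i:i + size]

sizes = [len(b) for b in batched(list(range(101)), 50)]
print(sizes)  # [50, 50, 1]

# With a live Chroma instance you'd then do (not run here):
# for batch in batched(texts, 50):
#     db.add_texts(texts=batch)
```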

One more thing - check your SQLite database file permissions. Sometimes partial writes happen when the process doesn’t have full write access to the directory.
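
You can check that from the same process before writing. Sketch only - in your case you'd pass `DB_PATH`; here I demonstrate with a temp dir and a path that doesn't exist:

```python
import os
import tempfile

def dir_writable(path):
    # Chroma needs write access here to create/update its SQLite file
    return os.path.isdir(path) and os.access(path, os.W_OK)

with tempfile.TemporaryDirectory() as tmp:
    print(dir_writable(tmp))               # True
print(dir_writable("/no/such/directory"))  # False
```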