Hi there! I have been working with LangChain using embeddings from HuggingFace along with Qdrant as my vector database, and the performance seems really poor. When I tested it with just 100 documents, it took about 27 minutes to get everything stored in the database. I am running Qdrant on my local machine. My project needs to handle around 80k to 100k documents total. If I do the math, that rate works out to roughly 450 hours for 100k documents, which is way too long. Has anyone found ways to make this faster? Any tips would be helpful since I am still learning this stuff.
Update: Thanks to everyone who helped out! Turns out the embedding process was the main bottleneck, like many of you suspected. After timing the different stages separately and switching to a lighter embedding model, things got much better.
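For anyone who finds this later, here is roughly how I timed the two stages to find the bottleneck. This is a sketch rather than my exact code; the collection name, model, and sample texts are placeholders, and `collection_exists` assumes a recent qdrant-client version:

```python
import time

from langchain_huggingface import HuggingFaceEmbeddings
from qdrant_client import QdrantClient, models

embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
client = QdrantClient(url="http://localhost:6333")

texts = ["example chunk one", "example chunk two"]  # your document chunks go here

# 384 matches all-MiniLM-L6-v2's output dimension
if not client.collection_exists("docs"):
    client.create_collection(
        collection_name="docs",
        vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
    )

# Stage 1: embedding
t0 = time.perf_counter()
vectors = embedder.embed_documents(texts)
t1 = time.perf_counter()
print(f"embedding: {t1 - t0:.1f}s for {len(texts)} chunks")

# Stage 2: upserting into Qdrant
client.upsert(
    collection_name="docs",
    points=[
        models.PointStruct(id=i, vector=vec, payload={"text": txt})
        for i, (vec, txt) in enumerate(zip(vectors, texts))
    ],
)
print(f"upsert: {time.perf_counter() - t1:.1f}s")
```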
Yeah, an embedding bottleneck sounds right. Switching from a heavier sentence-transformers model to all-MiniLM-L6-v2 cut my processing time way down. Also check your Qdrant collection settings - tweaking the HNSW params at creation time helped tons with bulk inserts. Try m=16 and ef_construct=200 when loading data initially; you can tune for search performance later. And definitely use Qdrant's async client if you're not already - the sync version is painfully slow for bulk work. These changes brought my processing time down to something reasonable. Rough sketch of the setup below.
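For reference, here is what that setup looks like with qdrant-client's async API. The collection name, vector size, and chunk size are placeholders, and the HNSW values are just the ones suggested above - verify them against your own workload:

```python
import asyncio

from qdrant_client import AsyncQdrantClient, models

async def bulk_load(points: list[models.PointStruct]) -> None:
    client = AsyncQdrantClient(url="http://localhost:6333")

    # HNSW values from the post above; revisit them for search quality after loading
    await client.create_collection(
        collection_name="docs",
        vectors_config=models.VectorParams(size=384, distance=models.Distance.COSINE),
        hnsw_config=models.HnswConfigDiff(m=16, ef_construct=200),
    )

    # Upsert in chunks so each request stays small
    for i in range(0, len(points), 256):
        await client.upsert(collection_name="docs", points=points[i:i + 256])

# Dummy points just to make the sketch runnable
sample = [
    models.PointStruct(id=i, vector=[0.0] * 384, payload={"text": f"doc {i}"})
    for i in range(3)
]
asyncio.run(bulk_load(sample))
```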
Performance bottlenecks like this are exactly why I ditched manual embeddings and database ops years ago. It’s a total nightmare with large document collections.
You need proper automation that handles batching, retry logic, and parallel processing without manual config hell. I’ve dealt with this same mess on even bigger collections.
Set up an automated pipeline that manages everything - embedding generation, database connections, error handling, auto-scaling based on your resources.
Skip weeks of tweaking LangChain configs and Qdrant settings. Get it running smoothly in hours instead. The automation figures out optimal batch sizes, manages memory, handles failures gracefully.
I’ve processed 200k+ documents this way with zero performance headaches. Let automation handle the complex orchestration while you focus on your actual project.
Check out Latenode for workflow automation: https://latenode.com
Glad to hear you’ve resolved the embedding bottleneck! To squeeze out more performance, batch your documents during the embedding step; I typically group them into batches of 50 to 100 depending on document size and available memory, which is significantly faster than embedding them one at a time. If you’re running Qdrant in Docker, adjusting the container’s memory settings can also help, since the defaults aren’t tuned for bulk operations. Another effective strategy is preprocessing your documents to strip unnecessary whitespace and formatting, which has noticeably reduced my embedding time. And given the size of your collection, distributed processing across multiple machines might be worth exploring if that’s feasible. A rough sketch of the batching approach is below.
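Something like this, as an untested sketch - a batch size of 64 is an arbitrary pick from the 50-100 range above, and the helper names are mine:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embedder = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

def preprocess(text: str) -> str:
    # Collapse repeated whitespace and formatting artifacts before embedding
    return " ".join(text.split())

def embed_in_batches(texts: list[str], batch_size: int = 64) -> list[list[float]]:
    cleaned = [preprocess(t) for t in texts]
    vectors: list[list[float]] = []
    for i in range(0, len(cleaned), batch_size):
        vectors.extend(embedder.embed_documents(cleaned[i:i + batch_size]))
    return vectors

vecs = embed_in_batches(["  some   raw\n\ndocument text  ", "another   doc"])
```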
Nice work fixing the embedding issue! I’ve had good luck with smaller, faster embedding models - they’re way quicker and still decent quality. Also, check if you’re using all your CPU cores. LangChain doesn’t always parallelize by default, which is super annoying.
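One way to get that parallelism is the `multi_process` flag on LangChain's `HuggingFaceEmbeddings`, which starts a sentence-transformers multi-process pool; my understanding is that on CPU-only machines it falls back to several worker processes, but I'd benchmark it on your setup. Model name and batch size here are just examples:

```python
from langchain_huggingface import HuggingFaceEmbeddings

embedder = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",
    model_kwargs={"device": "cpu"},
    encode_kwargs={"batch_size": 64},  # larger encode batches improve throughput
    multi_process=True,  # spread encoding across worker processes
)

vectors = embedder.embed_documents(["some text", "more text"])
```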