Hi everyone, I need some guidance on picking the right tech stack for our project.
Background:
I work for a construction materials business, and we have about 40 PDF documents containing product specifications, thermal values, safety ratings, and other technical information. Right now, when customers call with questions, our support staff has to search through all of these files manually to find answers. Our management team wants us to build an AI-based system that can respond to customer inquiries quickly.
Current dilemma:
I’ve been doing research for the past month and there are so many different solutions available. Every article I read suggests different tools:
- Pinecone (costs more but reliable)
- ChromaDB (free but needs more setup)
- Weaviate (seems interesting, but we have no experience with it)
- Supabase pgvector (we use PostgreSQL already)
- Elasticsearch (we have some experience with it)
Our requirements:
- Currently 40 documents; this might grow to 200+ and include multiple languages
- Documents have complex layouts with charts and data tables
- Accuracy is critical (wrong information about safety specs could be dangerous)
- Small development team (just 2 people, no AI background)
- Budget around €50K first year
- Need working prototype in 6 months
Main questions:
- For technical documents with lots of structured data, should I consider visual processing approaches or stick with text extraction?
- Is it better to start simple with ChromaDB and upgrade later, or to invest in Pinecone upfront?
- Has anyone tried Weaviate for similar use cases?
- What kind of search speed should I expect with 40-200 documents?
What I’ve tested so far:
- Basic text extraction with ChromaDB locally (works okay but struggles with tabular data)
- Pinecone demo (good results but concerned about ongoing costs)
- Looked into multimodal approaches (interesting but might be too complex)
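To make the table problem concrete, here is a minimal sketch of what I suspect is happening. It is not our actual code or data: the product names, values, and the helper functions `naive_chunks` and `header_aware_chunks` are all made up for illustration, and it uses only the Python standard library. When extracted text is split into fixed-size chunks, a table row can end up in a chunk without its column headers, so the numbers lose their meaning. Pairing each cell with its header is one possible workaround, assuming the table structure survives extraction at all.

```python
# Hypothetical spec table as it might come out of plain text extraction.
# Product names and values are invented for illustration only.
EXTRACTED_LINES = [
    "Product | Thermal conductivity (W/mK) | Fire class",
    "Panel A | 0.035 | A1",
    "Panel B | 0.040 | A2",
    "Panel C | 0.032 | B",
    "Panel D | 0.038 | A1",
]


def naive_chunks(lines, size):
    """Split lines into fixed-size groups, ignoring table structure."""
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def header_aware_chunks(lines):
    """Emit one chunk per data row, pairing each cell with its column header."""
    headers = [h.strip() for h in lines[0].split("|")]
    chunks = []
    for row in lines[1:]:
        cells = [c.strip() for c in row.split("|")]
        chunks.append("; ".join(f"{h}: {c}" for h, c in zip(headers, cells)))
    return chunks


naive = naive_chunks(EXTRACTED_LINES, size=2)
aware = header_aware_chunks(EXTRACTED_LINES)

# Find the chunk that mentions Panel C under each strategy.
naive_hit = next(c for c in naive if "Panel C" in c)
aware_hit = next(c for c in aware if "Panel C" in c)

print(naive_hit)   # contains the raw row, but no column headers
print(aware_hit)   # each value is labelled with its column header
```

In the naive version, the chunk containing Panel C has the value 0.032 but nothing saying it is a thermal conductivity, which matches the kind of failure I saw with tabular data.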
Looking for advice from people who have built similar document search systems. What approach would you recommend for a team without much AI experience? Any lessons learned or things to avoid?
Thanks for any insights!