I need help creating a comprehensive RAG solution that can handle different types of content like text documents, pictures, graphs, and spreadsheets. My files come in various formats including Word documents, Excel files, and PDFs.
What I need the system to do:
- Return image-only responses when a query is best answered by a picture, diagram, or chart
- Provide highly precise text answers for procedural information
- Produce coherent, logically structured answers for general questions where exact precision isn’t critical
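One way to handle these three requirements is to route each query to a response mode before retrieval. Below is a minimal sketch that rests on several assumptions. The mode names, keyword patterns, temperatures, and `top_k` values are all hypothetical. A real setup might replace the regex heuristic with a small classifier or an LLM call.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseMode:
    """Hypothetical settings for one kind of answer."""
    name: str
    temperature: float   # sampling temperature for the local LLM
    return_images: bool  # attach retrieved images to the response
    text_answer: bool    # whether to generate any text at all
    top_k: int           # number of chunks to retrieve


# Assumed values, one mode per requirement: image-only, precise procedural, general.
MODES = {
    "image": ResponseMode("image", 0.0, True, False, 3),
    "procedural": ResponseMode("procedural", 0.0, False, True, 8),
    "general": ResponseMode("general", 0.7, False, True, 5),
}

# Word boundaries keep e.g. "paragraph" from matching "graph".
IMAGE_PATTERN = re.compile(
    r"\b(show me|diagrams?|pictures?|images?|charts?|graphs?|photos?)\b")
PROCEDURE_PATTERN = re.compile(
    r"\b(how do i|how to|steps?|procedure|configure|install)\b")


def route_query(query: str) -> ResponseMode:
    """Pick a response mode from simple keyword cues in the query."""
    q = query.lower()
    if IMAGE_PATTERN.search(q):
        return MODES["image"]
    if PROCEDURE_PATTERN.search(q):
        return MODES["procedural"]
    return MODES["general"]
```

For example, "Show me the wiring diagram" routes to the image mode. "How do I reset the pump?" routes to the procedural mode, which uses temperature 0.0. Anything else falls back to the general mode.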
Current document challenges:
- Simple text-only files in Word and Excel formats work fine. I just need to tune my embedding approach and my chunking and generation settings, such as chunk size in tokens, overlap percentage, and other parameters.
- Complex files with mixed content (Word docs with embedded pictures, PDFs containing graphs and data tables) are giving me trouble. I haven’t found a unified approach yet.
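The chunk size and overlap parameters are usually applied as a sliding window over tokens. This sketch uses whitespace splitting as a stand-in for the embedding model's real tokenizer. The helpers `chunk_tokens` and `chunk_text` are illustrative, and the defaults (512 tokens, 15% overlap) are placeholders, not tuned recommendations.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512,
                 overlap_pct: float = 0.15) -> list[list[str]]:
    """Split tokens into windows of chunk_size that overlap by overlap_pct."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap_pct < 1:
        raise ValueError("overlap_pct must be in [0, 1)")
    overlap = int(chunk_size * overlap_pct)  # tokens shared with the previous chunk
    step = chunk_size - overlap              # always >= 1 given the checks above
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # last window reached the end
            break
    return chunks


def chunk_text(text: str, chunk_size: int = 512,
               overlap_pct: float = 0.15) -> list[str]:
    """Whitespace-token approximation; swap in the embedding model's tokenizer."""
    return [" ".join(c) for c in chunk_tokens(text.split(), chunk_size, overlap_pct)]
```

With `chunk_size=4` and `overlap_pct=0.5`, ten tokens give windows starting at positions 0, 2, 4, and 6. Each chunk therefore shares two tokens with the previous one.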
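For Word files with embedded pictures, a useful first step is to separate text from images. Each can then go to its own pipeline: text to chunking and embedding, and images to captioning or an image index. A .docx file is a ZIP archive. Its body is stored in `word/document.xml` and its embedded media under `word/media/`, so the standard library is enough for a rough sketch. `split_docx` is a hypothetical helper that handles only .docx files, not PDFs or Excel.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def split_docx(data: bytes) -> tuple[list[str], dict[str, bytes]]:
    """Return (non-empty paragraph texts, {media path: bytes}) from .docx bytes."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
        paragraphs = []
        for p in root.iter(f"{W_NS}p"):  # includes paragraphs inside table cells
            # A paragraph's text is split across runs; join all w:t pieces.
            text = "".join(t.text or "" for t in p.iter(f"{W_NS}t"))
            if text.strip():
                paragraphs.append(text)
        images = {
            name: zf.read(name)
            for name in zf.namelist()
            if name.startswith("word/media/")
        }
    return paragraphs, images
```

This sketch turns each table cell into an ordinary paragraph. It also does not record where each image appeared in the text. To keep that position, for example to attach the surrounding paragraph to an image as context, you would need to resolve the relationship IDs in `word/_rels/document.xml.rels`.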
My setup includes:
- Local language models (Llama 3.1 13B Instruct, Qwen2-7B-Instruct)
Can someone help me design a complete workflow that addresses these different content types? I’m looking for practical guidance on building this kind of multi-format RAG architecture.