I recently put together a basic n8n and Ollama setup with RAG. But I’m not happy with the results. Has anyone else had this experience?
The local models I run through Ollama seem much weaker than OpenAI's models for practical use. I've tried Qwen 2.5:14B and Llama 3.2 with an 8k context window, and neither comes close to GPT-4o-mini in performance.
Here are some issues I’m seeing:
- Frequent hallucinations
- Strange outputs
- Spelling mistakes
- Odd mixing of RAG results
I’ve tried tweaking prompts, adjusting context length, and changing temperature settings. Nothing seems to help much.
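For context, this is roughly how I've been passing those settings; just a minimal sketch against Ollama's `/api/generate` endpoint, assuming a local server on the default port and the qwen2.5:14b tag I mentioned (swap in whatever you run):

```python
import requests

# Minimal sketch of the kind of request I'm sending to a local Ollama server.
# Model tag, endpoint, and option values are just what I happen to be using.
OLLAMA_URL = "http://localhost:11434/api/generate"

def ask(question: str, context_chunks: list[str]) -> str:
    # Stuff the retrieved chunks into the prompt, the simplest RAG pattern.
    grounded_prompt = (
        "Answer using only the context below.\n\n"
        "Context:\n" + "\n---\n".join(context_chunks) +
        "\n\nQuestion: " + question
    )
    resp = requests.post(OLLAMA_URL, json={
        "model": "qwen2.5:14b",
        "prompt": grounded_prompt,
        "stream": False,
        "options": {
            "temperature": 0.2,   # low temperature to reduce drift
            "num_ctx": 8192,      # the 8k context window mentioned above
        },
    }, timeout=300)
    resp.raise_for_status()
    return resp.json()["response"]
```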
Are local LLMs just not ready for real-world use? Or am I missing something? Would love to hear if others have found ways to make them work better.
Yeah, local LLMs can be hit or miss. I've played around with them too and sometimes they're just not up to snuff. Have you tried newer models like Mixtral? They seem a bit better. Also, tweaking the RAG setup can help, like using better chunking methods or filtering the docs more carefully. For some stuff you might still need to stick with the big cloud models for now, but keep experimenting, they're improving fast!
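Something like this is what I mean by better chunking and filtering; just a rough sketch, and the sizes and keyword filter are whatever worked for me, not gospel:

```python
def chunk_text(text: str, chunk_size: int = 800, overlap: int = 200) -> list[str]:
    """Split text into overlapping chunks so sentences aren't cut off right
    at retrieval boundaries. Sizes here are just the ones I settled on."""
    chunks, start = [], 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap  # step back so neighbouring chunks share context
    return chunks

def keep_relevant(chunks: list[str], must_contain: list[str]) -> list[str]:
    """Crude doc filtering: drop chunks that don't mention any query keyword."""
    return [c for c in chunks if any(k.lower() in c.lower() for k in must_contain)]
```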
I’ve encountered similar challenges with local LLMs and Ollama. While they’re improving, they’re not quite at the level of commercial cloud offerings yet. One approach that’s yielded better results for me is using larger models like Mixtral 8x7B or MPT-30B, though they require more computational resources. Another strategy is to implement a multi-stage pipeline, where you use one model for initial processing and another for refinement or fact-checking. This can help mitigate hallucinations and improve overall output quality. It’s also worth exploring different RAG techniques, such as hybrid search or re-ranking, to enhance relevance. Local LLMs are progressing rapidly, but for now, it may be necessary to weigh the trade-offs between performance and data privacy/cost considerations for each specific use case.
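To make the multi-stage idea concrete, here's a rough sketch of the pattern I mean, calling two models through Ollama's `/api/chat` endpoint. The model tags and prompts are only placeholders for whatever you have pulled locally:

```python
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"

def chat(model: str, system: str, user: str) -> str:
    # Thin wrapper around Ollama's chat endpoint (non-streaming).
    r = requests.post(OLLAMA_CHAT, json={
        "model": model,
        "stream": False,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }, timeout=300)
    r.raise_for_status()
    return r.json()["message"]["content"]

def answer_with_check(question: str, context: str) -> str:
    # Stage 1: a larger model drafts an answer from the retrieved context.
    draft = chat(
        "mixtral:8x7b",
        "Answer strictly from the provided context.",
        f"Context:\n{context}\n\nQuestion: {question}",
    )
    # Stage 2: a second model checks the draft against the same context and
    # rewrites anything it can't find support for.
    return chat(
        "llama3.1:8b",
        "You are a fact-checker. Remove or correct any claim in the draft "
        "that is not supported by the context. Return the revised answer only.",
        f"Context:\n{context}\n\nDraft answer:\n{draft}",
    )
```

The second pass won't catch everything, but it noticeably cuts down on confident-sounding claims that aren't in the retrieved context.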
I’ve been experimenting with local LLMs and Ollama for a while now, and I can relate to your frustrations. While they’ve come a long way, they’re still not quite on par with OpenAI’s offerings for many tasks.
One thing that’s helped me is fine-tuning the models on domain-specific data. It takes some effort, but it can significantly improve performance for particular use cases. Also, I’ve found that carefully curating the RAG corpus and using techniques like semantic chunking can reduce hallucinations.
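To give a rough idea of what I mean by semantic chunking: group consecutive sentences until the topic shifts, measured by embedding similarity. A minimal sketch, assuming you've pulled an embedding model like nomic-embed-text and are using Ollama's `/api/embeddings` endpoint; the 0.75 threshold is just a starting point:

```python
import math
import requests

OLLAMA_EMBED = "http://localhost:11434/api/embeddings"

def embed(text: str) -> list[float]:
    # Embedding model is whatever you have pulled locally; nomic-embed-text here.
    r = requests.post(OLLAMA_EMBED,
                      json={"model": "nomic-embed-text", "prompt": text},
                      timeout=120)
    r.raise_for_status()
    return r.json()["embedding"]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def semantic_chunks(sentences: list[str], threshold: float = 0.75) -> list[str]:
    """Group consecutive sentences into one chunk until the topic shifts,
    i.e. until similarity to the previous sentence drops below the threshold."""
    if not sentences:
        return []
    chunks, current = [], [sentences[0]]
    prev_vec = embed(sentences[0])
    for sent in sentences[1:]:
        vec = embed(sent)
        if cosine(prev_vec, vec) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(sent)
        prev_vec = vec
    chunks.append(" ".join(current))
    return chunks
```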
That said, you’re right that local LLMs often struggle with consistency and coherence compared to GPT-4. They’re improving rapidly, though. Have you tried the latest Mixtral or Yi models? In my experience, they perform noticeably better than earlier Llama variants.
Ultimately, whether local LLMs are ready for production depends on your specific requirements and tolerance for imperfection. For some applications, they’re already good enough. For others, we might need to wait a bit longer or stick with cloud-based solutions for now.