Building a RAG application with C# Semantic Kernel using offline documents

I’m just starting out with Microsoft Semantic Kernel and need help creating a document-based RAG system that works with local files instead of cloud services.

I want to build an application that can read and process documents stored on my machine. I’ve seen examples that use OpenAI integration with KernelMemoryBuilder and WithOpenAIDefaults, but I need something that works entirely offline.
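For context, the cloud-dependent pattern I keep running into looks something like this (my approximation from the Kernel Memory samples, so the exact calls may be slightly off for your version):

```csharp
using Microsoft.KernelMemory;

// The OpenAI-backed setup shown in most examples -- this is what I want to avoid,
// since every import and query goes through the OpenAI API.
var memory = new KernelMemoryBuilder()
    .WithOpenAIDefaults(Environment.GetEnvironmentVariable("OPENAI_API_KEY"))
    .Build<MemoryServerless>();

await memory.ImportDocumentAsync("manual.pdf", documentId: "doc-001");
var answer = await memory.AskAsync("How do I configure logging?");
Console.WriteLine(answer.Result);
```

Basically I'd like the same ingest/ask flow, but with every piece of it running on my machine.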

Can anyone share code examples or point me toward the right approach for implementing RAG with local file processing? I’m particularly interested in how to set up the memory builder without relying on external API calls.

I’ve looked at some Microsoft documentation and GitHub examples, but most of them seem to require OpenAI connections. What I need is a purely local solution that can index and search through my own document collection.

Any guidance on the correct classes and methods to use would be really helpful. Thanks in advance!

Just finished this exact setup for our internal docs at work. Skipped Semantic Kernel completely since we needed everything offline - went with a much simpler approach.

Used LangChain.NET with Ollama running locally. Ollama serves embeddings through its local HTTP API without sending anything external. You can run models like nomic-embed-text locally for document embedding.

Vector store is Chroma in a Docker container on the same machine. Dead simple to set up and query.

Workflow’s straightforward - parse docs with iText7 for PDFs, chunk into smaller pieces, generate embeddings with your local Ollama instance, store everything in Chroma.

For queries, embed the question the same way and search for similar chunks. Works great and stays completely offline.
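The embedding step above can be sketched with plain `HttpClient` against Ollama's local endpoint - a sketch assuming the default port 11434 and the `/api/embeddings` route, so check the Ollama API docs for your version:

```csharp
using System.Linq;
using System.Net.Http;
using System.Net.Http.Json;
using System.Text.Json;

// Fetch an embedding for one chunk from a local Ollama instance.
// Assumes `ollama pull nomic-embed-text` has already been run and the
// server is listening on the default port. Nothing leaves the machine.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:11434") };

var response = await http.PostAsJsonAsync("/api/embeddings", new
{
    model = "nomic-embed-text",
    prompt = "chunk text to embed goes here"
});
response.EnsureSuccessStatusCode();

using var doc = JsonDocument.Parse(await response.Content.ReadAsStringAsync());
float[] embedding = doc.RootElement.GetProperty("embedding")
    .EnumerateArray()
    .Select(e => e.GetSingle())
    .ToArray();

Console.WriteLine($"Embedding dimensions: {embedding.Length}"); // 768 for nomic-embed-text
```

Query embedding is the identical call with the question text as the prompt - same model, so the vectors land in the same space and Chroma's similarity search just works.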

Whole setup takes maybe 2 hours and performs way better than hacking Semantic Kernel to work offline. Plus you get way more control over chunking strategy.
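On the chunking control point - a fixed-size sliding window with overlap is one simple starting point (the function name and parameters here are mine, not from any library). The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk:

```csharp
using System;
using System.Collections.Generic;

// Sliding-window chunker: consecutive chunks share `overlap` characters.
static List<string> ChunkText(string text, int chunkSize, int overlap)
{
    var chunks = new List<string>();
    int step = chunkSize - overlap; // how far the window advances each iteration
    for (int start = 0; start < text.Length; start += step)
    {
        int length = Math.Min(chunkSize, text.Length - start);
        chunks.Add(text.Substring(start, length));
        if (start + length >= text.Length) break; // reached the end of the text
    }
    return chunks;
}

var chunks = ChunkText(new string('x', 2500), chunkSize: 1000, overlap: 200);
Console.WriteLine(chunks.Count); // 3 chunks: [0..1000), [800..1800), [1600..2500)
```

Tune `chunkSize` to your embedding model's context window; splitting on sentence or paragraph boundaries instead of raw character counts is the obvious next refinement.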

Hit this same issue last year building a document search system for our team. Offline requirements killed most standard options.

Tried several solutions and ditched traditional coding completely. Managing local embeddings, vector databases, and document parsing ate up way too much dev time.

Ended up automating the whole RAG pipeline with Latenode instead. You can build workflows that process local docs, create embeddings with local models, and handle search without writing mountains of C# boilerplate.

Latenode handles everything - file monitoring, document chunking, vector storage. Drop new docs in your folders and it updates the knowledge base automatically.

Saved around 3 weeks of dev time and got a more solid system that catches edge cases I would’ve missed.

Check it out: https://latenode.com

I’ve been using Semantic Kernel for 8 months and hit the same offline problem. The trick is swapping out OpenAI for local embedding models. Configure your KernelMemoryBuilder with a local embedding service - I went with ONNX Runtime plus sentence-transformers models. The all-MiniLM-L6-v2 model works great for document similarity.

For vector storage, SQLite with a vector extension handles local indexing without any external dependencies.

Here’s the flow: chunk your files into smaller pieces, generate embeddings locally, and store them in your vector database. When searching, generate an embedding for the query with the same local model and run a similarity search.

Yeah, it’s slower than cloud solutions, but it’s the only option for sensitive docs or air-gapped setups. Just a heads up - memory usage gets heavy with larger document collections.
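The retrieval step boils down to cosine similarity over the stored vectors. A minimal sketch with toy 3-dimensional stand-ins (all-MiniLM-L6-v2 actually produces 384-dimensional vectors; the chunk texts and numbers here are invented for illustration):

```csharp
using System;
using System.Linq;

// Cosine similarity: dot product of the vectors divided by the product of their norms.
static float CosineSimilarity(float[] a, float[] b)
{
    float dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += a[i] * b[i];
        normA += a[i] * a[i];
        normB += b[i] * b[i];
    }
    return dot / (MathF.Sqrt(normA) * MathF.Sqrt(normB));
}

// Stand-in for rows loaded from the SQLite vector store.
var store = new (string Chunk, float[] Embedding)[]
{
    ("chunk about logging", new[] { 0.9f, 0.1f, 0.0f }),
    ("chunk about auth",    new[] { 0.1f, 0.9f, 0.2f }),
};

float[] query = { 0.8f, 0.2f, 0.1f }; // embedding of the user's question

// Rank stored chunks by similarity to the query and take the best match.
var best = store.OrderByDescending(s => CosineSimilarity(query, s.Embedding)).First();
Console.WriteLine(best.Chunk); // "chunk about logging"
```

In practice a vector-enabled SQLite extension does this ranking inside the database so you don't load every embedding into memory - which also helps with the memory pressure I mentioned on larger collections.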