Best Practices for Saving and Loading Vector Embeddings in S3 for RAG-based Chat Applications

I’m building a chatbot using RAG architecture with vector embeddings. My app has two parts: one where admins upload documents and create vectors, and another where users chat with the bot.

Right now I’m having trouble with saving the vector data to AWS S3 and loading it back when needed. The main issue is with converting the vectors to a format that can be stored and then retrieved properly.

I tried using SQLite to store the serialized vectors, but I'm running into problems. Should I stick with this approach or look at other options like FAISS? What's the recommended way to handle vector storage in cloud environments?

Here’s my current approach for database creation:

import pickle
import sqlite3

def setup_vector_database(connection, vector_data):
    cursor = connection.cursor()
    # One row per item: auto-incrementing id, a document reference, and the pickled payload
    cursor.execute('''CREATE TABLE IF NOT EXISTS vectors (
                        idx INTEGER PRIMARY KEY AUTOINCREMENT,
                        doc_reference TEXT,
                        vector_data BLOB
                    )''')
    
    for index, vector in enumerate(vector_data):
        cursor.execute("INSERT INTO vectors (doc_reference, vector_data) VALUES (?, ?)",
                       (f"doc_{index}", pickle.dumps(vector)))
    
    connection.commit()

And here’s how I process the documents:

import os
import tempfile

import boto3
from langchain_community.document_loaders import AmazonTextractPDFLoader
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

def process_document_embeddings(s3_bucket, document_path, aws_region='us-east-1'):
    try:
        pdf_loader = AmazonTextractPDFLoader(f's3://{s3_bucket}/{document_path}', region_name=aws_region)
        raw_documents = pdf_loader.load()
        
        splitter = RecursiveCharacterTextSplitter(chunk_size=50000, chunk_overlap=5000)
        text_chunks = splitter.split_documents(raw_documents)
        
        embedding_model = OpenAIEmbeddings()
        vector_store = Chroma.from_documents(text_chunks, embedding_model, persist_directory="./vectors")
        
        temp_file = tempfile.NamedTemporaryFile(delete=False, suffix='.db')
        connection = sqlite3.connect(temp_file.name)
        
        setup_vector_database(connection, text_chunks)
        connection.close()  # make sure everything is flushed to disk before uploading
        
        output_key = f"{os.path.splitext(document_path)[0]}.vectors.db"
        s3 = boto3.client('s3')
        s3.upload_file(temp_file.name, s3_bucket, output_key)
        os.unlink(temp_file.name)  # clean up the local temp copy
        
        return vector_store
    except Exception as error:
        print(f"Error processing document: {error}")
        return None

Any suggestions for better approaches would be helpful. I’m relatively new to vector databases and cloud storage patterns.

You're overcomplicating this. Ditch the manual SQLite serialization and S3 uploads; automate everything instead.

I had this exact problem months ago. It’s not about SQLite vs FAISS. You need a workflow that handles document processing, vector generation, and storage without you touching it.

Build an automated system that:

  • Watches your S3 bucket for new docs
  • Runs them through your embedding pipeline
  • Stores vectors however you want
  • Feeds your chat interface
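The bucket-watching piece can be sketched as a plain handler for S3 event notifications (this is a sketch, not your code: `process_document` stands in for your existing Textract-to-embeddings pipeline, and in practice you'd wire this up as an S3-triggered Lambda or similar):

```python
# Sketch of an S3-event-driven entry point. process_document is a placeholder
# for the existing load -> chunk -> embed -> store pipeline.
def handle_s3_event(event, process_document):
    """Route each new-object record in an S3 event to the embedding pipeline."""
    results = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Only process document uploads, not the vector artifacts written back
        # to the same bucket (otherwise the trigger loops on its own output).
        if key.endswith((".pdf", ".docx", ".txt")):
            results.append(process_document(bucket, key))
    return results
```

Deployed as a Lambda handler, `process_document` would just be your `process_document_embeddings` function, and the admin upload step becomes the only manual action left.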

Right now you’ve got too many manual steps and things breaking. Every document upload means running separate processes and crossing your fingers.

I fixed this with automation workflows for the whole RAG pipeline. Document goes in, everything happens automatically - text extraction, chunking, embeddings, storage. The chat interface just queries results without dealing with the mess underneath.

You can keep Chroma and OpenAI embeddings. Just wrap them in automated workflows instead of doing everything by hand.

No more serialization headaches, way more reliable. Your admins don’t have to babysit the processing either.

Been dealing with vector storage at scale for years. Your SQLite approach works but you’re overcomplicating it.

Main issue: you're mixing storage strategies by using both Chroma (which has its own persistence) and SQLite. Just pick one.

Want to keep your current setup? Ditch the manual pickle serialization. Store text chunks and metadata in SQLite, keep the vector index separate:

# Store metadata only (doc_id comes from your upload metadata)
cursor.execute('''CREATE TABLE IF NOT EXISTS documents (
                    doc_id TEXT, chunk_text TEXT, chunk_index INTEGER)''')
for chunk_index, chunk in enumerate(text_chunks):
    cursor.execute("INSERT INTO documents (doc_id, chunk_text, chunk_index) VALUES (?, ?, ?)", 
                   (doc_id, chunk.page_content, chunk_index))

# Let Chroma handle the vectors and its own persistence
vector_store = Chroma.from_documents(text_chunks, embedding_model, persist_directory=temp_dir)
Then zip everything before uploading to S3. Way cleaner than blob serialization.
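The zip step is one stdlib call; `shutil.make_archive` walks the directory for you (the bucket and key names below are placeholders):

```python
import shutil

def pack_vector_store(persist_dir, archive_base):
    """Zip a Chroma persist directory into a single artifact for S3.

    Returns the path of the created archive (archive_base + ".zip").
    """
    return shutil.make_archive(archive_base, "zip", root_dir=persist_dir)

# Then upload the single file (sketch; needs boto3 and a real bucket):
# boto3.client("s3").upload_file(zip_path, bucket, "vectors/docset.zip")
```

On the read path you download one object, unzip it (`shutil.unpack_archive`), and point Chroma at the extracted directory.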

Honestly though, I’d just use FAISS with S3. We switched last year and it’s been rock solid. FAISS gives better performance and the files compress well for S3.

vector_store = FAISS.from_documents(chunks, embeddings)
vector_store.save_local(temp_dir)
# Upload the whole directory

For loading, download and use FAISS.load_local(). Much simpler than database connections and blob deserialization.
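The loading side can be sketched as a small download helper plus `FAISS.load_local` (bucket and prefix names here are placeholders; note that recent LangChain versions require the `allow_dangerous_deserialization` flag because the index metadata is pickled):

```python
import os

def download_index(s3, bucket, prefix, local_dir,
                   filenames=("index.faiss", "index.pkl")):
    """Fetch the files FAISS.save_local wrote so FAISS.load_local can read them."""
    os.makedirs(local_dir, exist_ok=True)
    paths = []
    for name in filenames:
        dest = os.path.join(local_dir, name)
        s3.download_file(bucket, f"{prefix}/{name}", dest)
        paths.append(dest)
    return paths

# Then (sketch, assuming langchain and an embeddings object):
# store = FAISS.load_local(local_dir, embeddings,
#                          allow_dangerous_deserialization=True)
```

Since the flag opts into unpickling, only load indexes your own pipeline wrote to the bucket.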

One thing to double-check, though: a chunk_size of 50,000 characters is very large. OpenAI's embedding models cap input at roughly 8k tokens (around 32k characters), so chunks that big can fail or get truncated, and oversized chunks also hurt retrieval precision. Something in the 500 to 2,000 character range is more typical for RAG.