How to structure a NoSQL database schema for nested content blocks like Notion's

I’m working on a project similar to Notion, where users can create various types of content blocks that can also contain other blocks. The challenge is that these blocks can be nested to arbitrary depth, and every change needs to save automatically and sync in real time to all users.

I’m exploring the best way to set this up in a NoSQL database like Firestore. Should I create individual documents for each block and connect them through references? I’m concerned about potential performance issues since this would require several database queries to retrieve all nested content.

Alternatively, I could store everything in a single document with nested objects, but that might become complicated with deep nesting. Has anyone faced this type of recursive data structure? What solution has worked best for you?

The single-document approach isn't terrible if you don't nest too deep. I go with a flat structure, though: each block gets an id and a parent_id, then I rebuild the tree client-side. It's perfect for real-time updates since you're only watching one collection instead of juggling a bunch of subcollections.
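A minimal TypeScript sketch of the flat parent_id approach: blocks arrive as a flat list (for example, from a listener on one collection) and are linked into a tree on the client. The `order`, `type`, and `text` fields and the `buildTree` name are assumptions for illustration; the reply above only mentions id and parent_id.

```typescript
// Hypothetical flat record: one per block, all stored in a single collection.
interface FlatBlock {
  id: string;
  parent_id: string | null; // null for top-level blocks on the page
  order: number; // assumed sibling-position field
  type: string;
  text: string;
}

interface TreeBlock extends FlatBlock {
  children: TreeBlock[];
}

// Link a flat list into a tree: index by id, attach each block to its parent, sort siblings.
function buildTree(blocks: FlatBlock[]): TreeBlock[] {
  const byId = new Map<string, TreeBlock>();
  blocks.forEach(b => byId.set(b.id, { ...b, children: [] }));

  const roots: TreeBlock[] = [];
  byId.forEach(node => {
    const parent = node.parent_id === null ? undefined : byId.get(node.parent_id);
    if (parent) {
      parent.children.push(node);
    } else {
      // Top-level block, or an orphan whose parent hasn't loaded yet.
      roots.push(node);
    }
  });

  const sortByOrder = (nodes: TreeBlock[]): void => {
    nodes.sort((a, b) => a.order - b.order);
    nodes.forEach(n => sortByOrder(n.children));
  };
  sortByOrder(roots);
  return roots;
}

// Usage: blocks can arrive in any order from a collection listener.
const flat: FlatBlock[] = [
  { id: "c", parent_id: "a", order: 2, type: "text", text: "second child" },
  { id: "a", parent_id: null, order: 1, type: "toggle", text: "root" },
  { id: "b", parent_id: "a", order: 1, type: "text", text: "first child" },
];
const tree = buildTree(flat);
console.log(JSON.stringify(tree.map(r => [r.id, r.children.map(c => c.id)])));
// prints [["a",["b","c"]]]
```

Blocks whose parent hasn't loaded yet are treated as roots here; a real client might hold them back until the parent arrives instead.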

I’ve dealt with this exact problem in production. Storing the block hierarchy as an adjacency list works best in most cases: each block document has a parent_id and an order field for its position among its siblings. The key insight? Add a denormalized path field that stores the full ancestry chain. This lets you fetch an entire subtree with a single prefix (range) query while still supporting deep nesting. For real-time sync, maintain a separate collection of page-level operations that records which blocks changed and in what order. This helps prevent race conditions when multiple users edit simultaneously. The main gotcha is moving a block to a new parent: you need to update the paths of all its descendants atomically. Performance-wise, this scales much better than nested documents because you stay clear of Firestore’s 1 MiB per-document size limit and can index and query individual blocks.
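A rough in-memory sketch of the path-field idea, assuming each path is the "/"-joined list of ancestor ids ending with the block's own id, and that ids are plain ASCII. Descendants are selected with the half-open range commonly used for prefix matching in Firestore (`path >= prefix` and `path < prefix + "\uf8ff"`), and a move returns the list of path rewrites that would have to be committed together in one atomic write. `PathBlock`, `fetchDescendants`, and `planMove` are hypothetical names, not part of any Firestore API.

```typescript
// Hypothetical record: `path` = ancestor ids plus the block's own id, joined by "/".
interface PathBlock {
  id: string;
  parent_id: string | null;
  order: number;
  path: string; // e.g. "page1/a/b" for block b under a under page1
}

// Descendants are exactly the records whose path starts with `${path}/`.
// Range form of that prefix test: path >= prefix AND path < prefix + "\uf8ff".
function fetchDescendants(all: PathBlock[], blockPath: string): PathBlock[] {
  const lo = blockPath + "/";
  const hi = lo + "\uf8ff";
  return all.filter(b => b.path >= lo && b.path < hi);
}

type Update = { id: string; changes: Partial<PathBlock> };

// Moving a block rewrites its own path and every descendant's path.
// All returned updates belong in one atomic write so no reader sees a half-moved subtree.
function planMove(all: PathBlock[], blockId: string, newParent: PathBlock, newOrder: number): Update[] {
  const block = all.find(b => b.id === blockId);
  if (!block) throw new Error(`unknown block ${blockId}`);
  if (newParent.path === block.path || newParent.path.startsWith(block.path + "/")) {
    throw new Error("cannot move a block into its own subtree");
  }
  const oldPath = block.path;
  const newPath = `${newParent.path}/${block.id}`;

  const updates: Update[] = [
    { id: block.id, changes: { parent_id: newParent.id, order: newOrder, path: newPath } },
  ];
  for (const d of fetchDescendants(all, oldPath)) {
    // Keep the part of the path below the moved block, swap the prefix.
    updates.push({ id: d.id, changes: { path: newPath + d.path.slice(oldPath.length) } });
  }
  return updates;
}

// Usage
const blocks: PathBlock[] = [
  { id: "page1", parent_id: null, order: 0, path: "page1" },
  { id: "a", parent_id: "page1", order: 0, path: "page1/a" },
  { id: "b", parent_id: "a", order: 0, path: "page1/a/b" },
  { id: "c", parent_id: "b", order: 0, path: "page1/a/b/c" },
  { id: "d", parent_id: "page1", order: 1, path: "page1/d" },
];
console.log(JSON.stringify(fetchDescendants(blocks, "page1/a").map(b => b.id))); // ["b","c"]
console.log(JSON.stringify(planMove(blocks, "b", blocks[4], 0).map(u => u.changes.path)));
// ["page1/d/b","page1/d/b/c"]
```

Because a move touches every descendant, its write cost grows with subtree size; that is the trade-off for single-query subtree reads.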

I built something similar last year using a hybrid approach that worked really well. Each block is its own document, with parent/child relationships tracked through a path field; this keeps the hierarchy intact while letting you efficiently query all blocks under a page. For real-time sync, I used operational transformation (OT) at the block level rather than the document level. Every block change creates an operation that gets broadcast to connected clients, and I batch operations during fast typing to avoid hammering the database. I also cache the block tree in memory on the client and only fetch content when needed, which cut database calls way down and kept the UI snappy. Just query the root blocks first, then lazy-load the nested stuff.
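A sketch of the batching step only, in TypeScript. It swaps in a simpler model than OT: each op here replaces a block's whole text, so a later op on the same block can safely overwrite an earlier one in the buffer (real OT insert/delete ops at an offset could not be collapsed this way without transforming them). `BlockOp`, `OpBatcher`, the 300 ms quiet period, and the `flushFn` callback (where the database write would go) are all assumptions.

```typescript
// Hypothetical op: replaces the full text of one block (not an OT insert/delete op).
interface BlockOp {
  blockId: string;
  text: string;
  clientTs: number;
}

type FlushFn = (ops: BlockOp[]) => void;

// Buffers ops while the user types and sends them in one batch after `delayMs` of quiet.
class OpBatcher {
  private pending = new Map<string, BlockOp>();
  private timer: ReturnType<typeof setTimeout> | null = null;

  constructor(private readonly flushFn: FlushFn, private readonly delayMs = 300) {}

  push(op: BlockOp): void {
    // A later full-text op on the same block supersedes the earlier one.
    this.pending.set(op.blockId, op);
    // Restart the quiet-period timer on every keystroke (debounce).
    if (this.timer !== null) clearTimeout(this.timer);
    this.timer = setTimeout(() => this.flush(), this.delayMs);
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.pending.size === 0) return;
    const ops = Array.from(this.pending.values());
    this.pending.clear();
    this.flushFn(ops); // e.g. one batched database write
  }
}

// Usage
const sent: BlockOp[][] = [];
const batcher = new OpBatcher(ops => sent.push(ops), 300);
batcher.push({ blockId: "a", text: "H", clientTs: 1 });
batcher.push({ blockId: "a", text: "Hi", clientTs: 2 });
batcher.push({ blockId: "b", text: "x", clientTs: 3 });
batcher.flush(); // one batch containing the latest op for "a" ("Hi") and the op for "b"
console.log(sent.length, JSON.stringify(sent[0].map(op => op.text)));
```

Calling `flush()` on blur or before unload avoids losing the last few keystrokes that are still waiting on the timer.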