Best approach for keeping recursive totals updated in MySQL tree structure?

I’m working with a hierarchical data structure in MySQL where I need to maintain calculated totals. Here’s my table setup:

CREATE TABLE nodes (
    id INT,
    sum_value INT,
    PRIMARY KEY (id)
);

CREATE TABLE hierarchy (
    parent_id INT,
    child_id INT,
    FOREIGN KEY (parent_id, child_id) REFERENCES nodes (id, id)
);

The setup works like this: leaf nodes have their sum_value set manually, but parent nodes should always contain the sum of all their children’s values. I’ve been trying this query to recalculate everything:

UPDATE nodes SET sum_value = (
    SELECT SUM(child.sum_value) FROM
        hierarchy JOIN nodes AS child
        ON hierarchy.child_id = child.id
        WHERE hierarchy.parent_id = nodes.id
)
WHERE EXISTS (
    SELECT * FROM hierarchy WHERE parent_id = nodes.id
);

The main challenge is keeping these totals accurate when data changes. I need to handle updates to leaf values and moving nodes around while keeping the tree structure intact.

What’s the most efficient way to maintain these recursive totals without recalculating everything each time?

I’ve considered a few options but each has problems:

  • Recalculating everything after each change (too slow)
  • Using triggers to update parent nodes automatically (MySQL limitations make this tricky)
  • Creating a queue system to process updates in batches (complex to implement)

I’m looking for a solution that could work for similar aggregation scenarios too. Any ideas on the best approach here?

I encountered a similar challenge in a project management application where we needed to track budgets through various tasks. The most effective solution involved implementing a dirty flag system paired with level-based updates. By introducing a dirty_flag column in the nodes table, I marked a node as dirty when its leaf value changed, and recursively flagged all ancestors. Subsequently, we scheduled a job to process these dirty nodes from the deepest level back to the root. This method proved far more efficient than recalculating totals entirely and circumvents the complications of triggers. The key is to only recalculate the path from the altered leaf up to the root instead of the entire tree. For frequent updates, batching the flagging and processing updates every few minutes rather than instantly can also enhance performance.

have you considered using materialized paths or nested sets? adjacency lists can be a real pain for aggregation. nested sets allow for much faster sum calculations with left/right boundaries. i switched up my schema from something similar and noticed big improvements in performance.