Hey everyone! I’m wondering if there’s a way to create some kind of monitoring interface that shows when workflows succeed or fail, kind of like what you see in automation platforms.
Basically I want to track:

- Which runs worked and which didn't
- What data went in and came out
- Maybe a retry button for failed attempts
I’m thinking something where I can quickly scan through recent activity and catch problems before they become bigger issues. Has anyone built something similar, or does anyone know of existing solutions? I feel like this would save a lot of debugging time, but maybe I’m missing an obvious approach.
Have you tried something like Grafana? Or you could build a basic web dashboard with status cards. I built one with Flask that logs workflow runs to a SQLite db. Super handy for catching failures quickly without digging through logs.
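Here's roughly what that Flask + SQLite setup can look like. The table schema, file path, and route are just placeholders, so adapt to taste; each workflow would INSERT a row into `runs` when it finishes:

```python
import sqlite3
from flask import Flask, g, render_template_string

app = Flask(__name__)
DB_PATH = "workflow_runs.db"  # hypothetical path

# Create the runs table on startup (schema is illustrative)
with sqlite3.connect(DB_PATH) as conn:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS runs ("
        "id INTEGER PRIMARY KEY, workflow TEXT, "
        "status TEXT, finished_at TEXT)"
    )

def get_db():
    # One connection per request, stored on Flask's request context
    if "db" not in g:
        g.db = sqlite3.connect(DB_PATH)
        g.db.row_factory = sqlite3.Row
    return g.db

@app.teardown_appcontext
def close_db(exc):
    db = g.pop("db", None)
    if db is not None:
        db.close()

PAGE = """
<h1>Workflow runs</h1>
{% for r in runs %}
  <p style="color: {{ 'green' if r['status'] == 'success' else 'red' }}">
    {{ r['workflow'] }}: {{ r['status'] }} at {{ r['finished_at'] }}
  </p>
{% endfor %}
"""

@app.route("/")
def dashboard():
    # Most recent 50 runs, newest first, as simple status cards
    runs = get_db().execute(
        "SELECT workflow, status, finished_at FROM runs "
        "ORDER BY finished_at DESC LIMIT 50"
    ).fetchall()
    return render_template_string(PAGE, runs=runs)
```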
I built something similar with database logging and a React frontend. The trick is setting up webhook endpoints that catch workflow status changes as they happen. For retries, I saved the original input with each execution record - makes resubmitting failed jobs super easy. Pro tip: track execution time and resource usage from day one. You’ll need that data later for performance tuning. The monitoring really pays off when you’re running multiple workflows at once - failures can snowball fast if you don’t catch them early.
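A stripped-down sketch of the webhook + retry pattern, using Flask for brevity. The payload fields, schema, and the engine's trigger URL are all placeholders for whatever your platform actually sends:

```python
import json
import sqlite3
import time

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)
DB = "executions.db"  # hypothetical path

with sqlite3.connect(DB) as conn:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS executions ("
        "id INTEGER PRIMARY KEY, workflow TEXT, status TEXT, "
        "input_json TEXT, output_json TEXT, duration_ms REAL, recorded_at REAL)"
    )

@app.route("/hooks/workflow-status", methods=["POST"])
def record_status():
    # Workflow engine calls this on every status change
    event = request.get_json()
    with sqlite3.connect(DB) as db:
        db.execute(
            "INSERT INTO executions (workflow, status, input_json, "
            "output_json, duration_ms, recorded_at) VALUES (?, ?, ?, ?, ?, ?)",
            (
                event["workflow"],               # assumed payload fields
                event["status"],
                json.dumps(event.get("input")),  # keep original input for retries
                json.dumps(event.get("output")),
                event.get("duration_ms"),        # execution time, tracked from day one
                time.time(),
            ),
        )
    return jsonify(ok=True)

@app.route("/executions/<int:run_id>/retry", methods=["POST"])
def retry(run_id):
    with sqlite3.connect(DB) as db:
        row = db.execute(
            "SELECT workflow, input_json FROM executions WHERE id = ?",
            (run_id,),
        ).fetchone()
    if row is None:
        return jsonify(error="unknown run"), 404
    workflow, input_json = row
    # Resubmit the saved input to the engine's trigger URL
    # (hypothetical endpoint; adjust for whatever engine you run)
    resp = requests.post(
        f"http://workflow-engine.local/run/{workflow}",
        json=json.loads(input_json),
        timeout=30,
    )
    return jsonify(resubmitted=True, engine_status=resp.status_code)
```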
We went with structured logging plus a basic Django dashboard. Created a standard log format across all workflows - start/end times, input params, outputs, and error messages. A background job parses these logs every few minutes and updates our status table. The dashboard shows a timeline of recent runs with color-coded success/failure states. For retries, we just serialize the original workflow params as JSON in the DB record. Hit retry and it resubmits the exact same job config. Pro tip: track partial failures too - workflows that finish but give weird results. They’re sneaky but cause the biggest headaches later.
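Here's the background parser boiled down, using sqlite directly instead of Django models. Log fields and paths are illustrative, and for brevity this re-reads the whole file; a real job would remember its file offset:

```python
import json
import sqlite3
from pathlib import Path

LOG_FILE = Path("workflow_runs.log")  # one JSON object per line, per run
DB_PATH = "status.db"

# Example of the standard log line every workflow emits (values made up):
# {"workflow": "nightly-sync", "started_at": "2024-01-01T02:00:00",
#  "ended_at": "2024-01-01T02:03:12", "status": "success",
#  "params": {"day": "2024-01-01"}, "output": {"rows": 1532}, "error": null}

def sync_status_table():
    """Fold log lines into the status table the dashboard reads."""
    with sqlite3.connect(DB_PATH) as db:
        db.execute(
            "CREATE TABLE IF NOT EXISTS runs ("
            "workflow TEXT, started_at TEXT, ended_at TEXT, "
            "status TEXT, params_json TEXT, error TEXT)"
        )
        for line in LOG_FILE.read_text().splitlines():
            rec = json.loads(line)
            status = rec["status"]
            # Flag partial failures: run "succeeded" but produced no output
            if status == "success" and not rec.get("output"):
                status = "suspect"
            db.execute(
                "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?)",
                (
                    rec["workflow"],
                    rec["started_at"],
                    rec["ended_at"],
                    status,
                    # Serialized params make the retry button trivial:
                    # just POST this JSON back to the engine
                    json.dumps(rec.get("params")),
                    rec.get("error"),
                ),
            )
```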