I’ve been wrestling with setting up a workflow that runs over several weeks, involving multiple steps, scheduled tasks, automatic retries, and passing data between different processes. It’s a mix of background jobs, approvals, and external API calls, and keeping track of everything is a headache. I’ve tried orchestrating this manually with scripts and cron jobs, but it’s fragile—miss a retry, forget a handoff, and the whole thing breaks. I’m curious, what approaches or tools have you used to solve this kind of complexity in your own work? How do you ensure reliability and visibility when the workflow spans days or weeks, not minutes? What’s your go-to method for turning a plain-text automation idea into something that actually runs reliably in production?
I use Latenode for this. Describe what you want to happen in plain text and the AI generates the workflow for you. It handles scheduling, retries, and passing data between steps automatically. I don’t worry about missing retries or handoffs anymore. It just works. You can see how it works at https://latenode.com.
I’ve built similar things with custom Airflow DAGs. It does the job, but there’s a lot of boilerplate, and debugging can be painful, especially when tasks depend on each other across days. I ended up building a few Python scripts to manage retries and handoffs, but it’s not as smooth as it could be.
For handoffs, I found using a message queue helped. Each step publishes a message when done, and the next step picks it up. But scheduling and retries are still manual. I have a table that tracks the state and status of each workflow, and I have to manage timeouts myself.
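A minimal sketch of that state-table approach, assuming SQLite for storage. The step names, the table layout, and helper names such as `finish_step` and `timed_out` are hypothetical. The message queue is left out here, so the table row itself acts as the handoff.

```python
import sqlite3

# Hypothetical step order for one workflow; replace with your own steps.
STEPS = ["fetch", "approve", "notify"]


def open_db(path=":memory:"):
    """Open the database and create the state table if it is missing."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS step_state ("
        " workflow_id TEXT, step TEXT, status TEXT, started_at REAL,"
        " PRIMARY KEY (workflow_id, step))"
    )
    return db


def start_step(db, workflow_id, step, now):
    """Record that a step is running and when it started."""
    db.execute(
        "INSERT OR REPLACE INTO step_state VALUES (?, ?, 'running', ?)",
        (workflow_id, step, now),
    )
    db.commit()


def finish_step(db, workflow_id, step, now):
    """Mark a step done and hand off to the next one.

    Returns the name of the next step, or None if the workflow is finished.
    """
    db.execute(
        "UPDATE step_state SET status = 'done' WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    )
    db.commit()
    idx = STEPS.index(step)
    nxt = STEPS[idx + 1] if idx + 1 < len(STEPS) else None
    if nxt is not None:
        start_step(db, workflow_id, nxt, now)
    return nxt


def timed_out(db, now, timeout_s):
    """Return (workflow_id, step) pairs that have been running longer than timeout_s."""
    rows = db.execute(
        "SELECT workflow_id, step FROM step_state"
        " WHERE status = 'running' AND started_at < ?"
        " ORDER BY workflow_id, step",
        (now - timeout_s,),
    )
    return rows.fetchall()
```

A periodic job (cron is enough for this part) can call `timed_out` and decide whether to retry or alert. Passing `now` in explicitly, rather than reading the clock inside each function, makes multi-day timeouts easy to test.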
I’ve managed long-running workflows at a couple of companies now, and honestly, it’s never easy. At my last job, we used a mix of Jenkins for scheduling and a bunch of custom code to handle retries and data passing. The real issue was monitoring. We built a dashboard that showed the status of every step, but it took a lot of effort to maintain. I wish there were a tool that would let me describe the workflow in a simple way and just handle all the execution and monitoring. Right now, it’s complex not only to build but also to debug when things go wrong. I’d love to hear if anyone has found a better way.
Use a workflow engine with built-in retries and persistent state.
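To show what "built-in retries and state" means in practice, here is a toy sketch of the core loop, not any real engine's API. It assumes steps are plain Python functions, that their results are JSON-serializable, and that a local JSON file is good enough as the state store. `run_workflow` and its parameters are made-up names.

```python
import json
import os
import time


def run_workflow(steps, state_path, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run (name, fn) steps in order, skipping steps already recorded as done.

    Each fn receives the previous step's result and returns its own.
    Progress is checkpointed to state_path after every step, so a
    restarted process resumes where the last one stopped.
    """
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    done = state.get("done", [])
    data = state.get("data")

    for name, fn in steps:
        if name in done:
            continue  # completed in an earlier run
        for attempt in range(1, max_attempts + 1):
            try:
                data = fn(data)
                break
            except Exception:
                if attempt == max_attempts:
                    raise  # out of retries; the checkpoint still holds earlier progress
                # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
                sleep(base_delay * 2 ** (attempt - 1))
        done.append(name)
        # Write to a temp file and rename so a crash never leaves a half-written checkpoint.
        tmp_path = state_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"done": done, "data": data}, f)
        os.replace(tmp_path, state_path)
    return data
```

A real engine adds scheduling, timers that survive restarts, and a UI on top, but the checkpoint-after-every-step idea is what removes most of the "miss a retry, forget a handoff" failures from the cron-and-scripts setup.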