I set up autonomous agents, each with a single responsibility: one handles login and stores session artifacts, another navigates pages using that session, and a third extracts and normalizes data. The trick was making the session object explicit and small: cookies, a short-lived token, and a few verification selectors.
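Concretely, the session artifact can be as small as a few fields. A minimal sketch (field names are illustrative, not from any particular framework):

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """Minimal session artifact handed from the login agent to the others."""
    cookies: dict[str, str] = field(default_factory=dict)
    token: str = ""              # short-lived auth token
    token_expiry: float = 0.0    # epoch seconds; 0.0 means unknown
    # markers that prove "signed in", e.g. a logout-link selector
    verify_selectors: list[str] = field(default_factory=list)
```

Keeping it this small makes it cheap to serialize, log, and re-validate between agents.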
When agents ran into failures, I preferred retrying the login agent or re-validating the stored token rather than letting downstream agents guess. I also added quick health checks between agents so the navigator would confirm the user was still signed in before extraction started.
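The health check between agents can be a cheap pure function. A sketch under assumed names (string containment stands in for a real selector query against the page):

```python
import time

def is_signed_in(token_expiry: float, verify_markers: list[str], page_html: str) -> bool:
    """Pre-extraction health check: token not expired AND at least one
    sign-in marker appears in the current page.

    token_expiry of 0.0 means "unknown", so only the marker check runs.
    """
    if token_expiry and token_expiry <= time.time():
        return False
    return any(marker in page_html for marker in verify_markers)
```

If this returns False, hand control back to the login agent before any extraction starts.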
Has anyone built a retry policy that gracefully re-authenticates without losing partially collected data?
i used a coordinator agent that watches for auth failures. it calls the login agent and then replays the last navigation step rather than starting over. keeping the last-known-good page in memory made resuming much faster and less error-prone.
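something like this sketch (names and the AuthError signal are made up, not a real library):

```python
class AuthError(Exception):
    """Raised by navigate() on a 401 or a redirect to the login page."""

def run_with_reauth(navigate, login, steps, max_retries=2):
    """Coordinator loop: on an auth failure, re-login and replay only the
    failed step instead of restarting the whole sequence."""
    last_good = None
    for step in steps:
        attempts = 0
        while True:
            try:
                last_good = navigate(step)  # last-known-good page
                break
            except AuthError:
                attempts += 1
                if attempts > max_retries:
                    raise
                login()  # refresh the session, then replay this step
    return last_good
```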
for extraction I added a checkpoint after each page. the extractor writes results to a temporary store and commits only after a final verification. that way an auth retry doesn’t corrupt the final dataset.
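rough shape of the checkpoint/commit split (the verify callable is whatever final validation you run; an in-memory list stands in for the real stores):

```python
class CheckpointedExtractor:
    """Rows land in a temp buffer per page; they reach the final store
    only after a verification pass, so an auth retry mid-run can't
    corrupt already-committed data."""

    def __init__(self, verify):
        self.temp = []        # uncommitted rows from the current run
        self.final = []       # committed dataset
        self.verify = verify  # callable(rows) -> bool

    def checkpoint(self, rows):
        self.temp.extend(rows)

    def commit(self) -> bool:
        if not self.verify(self.temp):
            return False      # leave temp intact for inspection/retry
        self.final.extend(self.temp)
        self.temp = []
        return True
```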
In one project I needed to coordinate agents across many vendor sites with different auth lifetimes. I implemented a small orchestration layer that tracked each site's token expiry and a queue of pending extraction tasks. When a token had expired or a 401 came back, the orchestrator invoked the login agent and placed the failing task back at the front of the queue with an incremented retry count. Tasks carried a lightweight progress object so extraction could resume from the last completed item. To avoid duplication we made the extractor idempotent: each extracted row had a stable key, and the commit step used upsert semantics. This approach kept partially collected data intact and limited rework while agents re-authenticated.
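A condensed sketch of that loop, under assumed names (AuthError, the task/row shapes, and the dict standing in for a real upsert are all illustrative):

```python
from collections import deque
import time

class AuthError(Exception):
    """Raised by extract() on a 401."""

def run_tasks(tasks, extract, login, token_expiry, max_retries=3):
    """Orchestrator: re-login when a site's token is expired or extract()
    signals a 401, requeue the failing task at the FRONT with its retry
    count bumped, and commit rows idempotently by stable key."""
    queue = deque(tasks)
    results = {}                             # stable key -> row (upsert)
    while queue:
        task = queue.popleft()
        site = task["site"]
        if token_expiry.get(site, 0) <= time.time():
            token_expiry[site] = login(site)  # returns new expiry
        try:
            rows = extract(task)
        except AuthError:
            task["retries"] = task.get("retries", 0) + 1
            if task["retries"] > max_retries:
                raise
            token_expiry[site] = 0           # force re-login next pass
            queue.appendleft(task)           # back in front of the queue
            continue
        for row in rows:
            results[row["key"]] = row        # replay overwrites, no dupes
    return results
```

The upsert-by-stable-key step is what makes replaying a page after re-auth safe: the same rows overwrite themselves instead of duplicating.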