I’m having trouble with my Azure DevOps setup. I made a monitor pipeline to automatically retry stages that fail in my main YAML pipeline. It uses a Service Hook and the Azure DevOps REST API.
The monitor pipeline finds the failed jobs and their stages as expected and sends a PATCH request to retry them. However, even though the API returns an HTTP 204 response, the stages do not rerun.
I’ve confirmed that the Build Service account has the necessary permissions to queue builds. This approach used to work for the PublishArtifact stage before, but now it fails for all stages, including custom ones like AKS_Mariner_N and AKS_Ubuntu_N.
Does anyone know why Azure DevOps would accept a retry request yet not rerun the stage? Are there conditions that silently block a retry, or is there a better method to auto-retry failed stages?
I’ve encountered similar issues with Azure DevOps pipelines. One potential cause could be resource constraints or concurrency limits. Check your project settings to ensure you haven’t hit any build or job limits. Another possibility is a mismatch between the API version you’re using and the one supported by your Azure DevOps instance. Try explicitly specifying the API version in your requests. Lastly, review any custom conditions or approvals set on your stages, as these might prevent automatic retries. If all else fails, consider implementing a custom extension or using Azure Functions for more granular control over pipeline retries.
Having worked extensively with Azure DevOps, I can suggest a few things to troubleshoot this issue. First, double-check the exact permissions of your Build Service account. Sometimes, it needs more than just the ability to queue builds. Ensure it has ‘Edit build pipeline’ permissions too.
Another aspect to investigate is the pipeline variables. If your stages use variables that are dynamically set during the pipeline run, these might not be properly carried over during a retry operation. This can cause silent failures.
Also, consider the timing of your retry attempts. If you’re trying to retry immediately after a failure, there might be a cooldown period that’s not immediately apparent. Try adding a short delay before initiating the retry.
Lastly, I’d recommend reviewing any pipeline decorators or policies in your project. These can sometimes interfere with retry operations without leaving obvious traces in the logs.
hey alice, have you checked the build logs for any hidden errors? sometimes the api can return success but the stage might not rerun due to other issues. also, try manually retriggering a stage to see if it works. if it does, the problem might be in your monitor pipeline’s implementation. good luck troubleshooting!