.gitlab-ci.yml: retry running jobs on some failure types
Description
All CI jobs are now retried at least once on certain types of job failure, using a default block which applies a retry option to all jobs.
The possible failure modes for retry:when are given here. I've chosen a selection of them which seem to match transient failure modes (e.g. a runner times out or gets stuck) rather than permanent failure modes (e.g. a script fails, a job is misconfigured, etc.).
This won't catch all possible failures; as @duncanmmacleod notes in #307 (closed), artifact upload failures are counted as script errors, for example, and therefore won't be retried. This also won't retry failures that are due to the packaging (e.g. Conda can't access a server) rather than due to LALSuite code (e.g. a test fails).
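As a rough illustration of the kind of configuration described above, a default block with a retry option might look like the sketch below. The retry count and the particular failure types listed are assumptions chosen for illustration, not necessarily the ones in this patch.

```yaml
# Illustrative sketch only; the values are assumptions, not the patch itself.
default:
  retry:
    max: 1  # assumed retry count; GitLab accepts 0, 1, or 2
    when:
      # assumed selection of transient-looking failure types
      - runner_system_failure
      - stuck_or_timeout_failure
      - api_failure
      - unknown_failure
```

Because script_failure is not listed, a job whose script exits non-zero (including a failed test or a packaging error inside the script) would not be retried under a configuration like this.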
Closes #307 (closed) and #407 (closed)
API Changes and Justification
Backwards Compatible Changes
- This change introduces no API changes
- This change adds new API calls
Backwards Incompatible Changes
- This change modifies an existing API
- This change removes an existing API
Review Status
@duncanmmacleod @adam-mercer I'm not able to test that this patch actually reruns jobs on the selected failure modes, as I can't manually generate those failures. However, I have checked on a separate test branch that the retry option is being applied: I added script_failure as a failure mode, hacked a job to fail, and confirmed that the job was automatically re-run.
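The check described above could be reproduced on a throwaway branch with something like the following sketch. The job name retry-check and the retry count are hypothetical; the only point is that script_failure is temporarily added to the retry conditions and one job is forced to fail.

```yaml
# Test-branch sketch only; not intended for merging.
default:
  retry:
    max: 1  # assumed retry count
    when:
      - script_failure  # added for testing only

# Hypothetical job that always fails, so the retry can be observed.
retry-check:
  script:
    - exit 1
```

With a configuration like this, the pipeline view should show the job running a second time after the first failure before it is finally marked as failed.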