CI: increase default:retry:max to 2 (!2512) · Merge requests · lscsoft / lalsuite

Detailed Description

LALSuite CI jobs may run on Kubernetes runners at INFN-CNAF. These jobs run as "Pods", which appear to start but then have to wait for resources, and often seem to time out while waiting; see here for a recent example. The CI jobs then fail and have to be manually restarted by the user.

Luckily, these failures register as a runner system failure and so can be retried automatically, without also needlessly retrying CI jobs that fail due to a script error (e.g. a bug). The LALSuite CI already retries jobs that fail for various system-related reasons (see here for the docs):

unknown_failure
stuck_or_timeout_failure
runner_system_failure
stale_schedule
archived_failure
scheduler_failure

Currently, jobs that fail for these reasons are retried once (default:retry:max = 1). Because the Kubernetes runners seem to fail more often, this MR increases default:retry:max to 2, which is the maximum GitLab allows. This may reduce how often users have to manually restart jobs that have used up their automatic retries.
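For illustration, the resulting retry configuration might look roughly like the following. This is a minimal sketch assembled from the failure reasons listed above, assuming they sit under `default:retry:when` in `.gitlab-ci.yml`; it is not copied from the LALSuite repository, and the actual file may be laid out differently.

```yaml
# Sketch only: assumed layout of the retry settings in .gitlab-ci.yml
default:
  retry:
    # raised from 1 to 2; GitLab accepts 0, 1, or 2
    max: 2
    # retry only on system-related failures; script_failure is
    # deliberately absent, so genuine script errors are not retried
    when:
      - unknown_failure
      - stuck_or_timeout_failure
      - runner_system_failure
      - stale_schedule
      - archived_failure
      - scheduler_failure
```

With settings like these, a job that hits a runner system failure would run at most three times in total (the original attempt plus two retries) before a manual restart is needed.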

API Changes

Please tick one of the following options:

These changes do not modify the API.
These changes do modify the API, and are backwards compatible.
These changes do modify the API, and are backwards incompatible.

For examples of changes that do not modify the API and/or are considered backwards (in)compatible, please see the contributing guide.

Justification for Backwards Incompatible Changes

n/a

Review Status

n/a
