.gitlab-ci.yml: retry running jobs on some failure types

Karl Wette requested to merge ANU-CGA/lalsuite:gitlab-ci-retry into master

Description

All CI jobs are now retried at least once on certain types of job failures, using a default block which applies a retry option to all jobs.

The possible failure modes for retry:when are given here. I've chosen a selection of them that seem to match transient failure modes (e.g. a runner times out or gets stuck) rather than permanent failure modes (e.g. a script fails, a job is misconfigured, etc.).
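
For illustration, a default block of this kind might look roughly like the sketch below. The retry count and the particular failure types listed are assumptions chosen to show the shape of the configuration; they are not necessarily the values used in this patch.

```yaml
# Sketch only: max and the when list are illustrative assumptions,
# not the exact values from this merge request.
default:
  retry:
    max: 1                          # retry a failed job once
    when:
      - runner_system_failure       # the runner itself failed
      - stuck_or_timeout_failure    # job got stuck or timed out
      - unknown_failure             # failure with no known reason
```

Because the block sits under default, every job inherits it unless the job sets its own retry option.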

This won't catch all possible failures. As @duncanmmacleod notes in #307 (closed), failed artifact uploads are counted as script errors, for example, and therefore won't be retried. This also won't retry failures that are due to the packaging (e.g. Conda can't access a server) rather than due to LALSuite code (e.g. a test fails).

Closes #307 (closed) and #407 (closed)

API Changes and Justification

Backwards Compatible Changes

  • This change introduces no API changes
  • This change adds new API calls

Backwards Incompatible Changes

  • This change modifies an existing API
  • This change removes an existing API

Review Status

@duncanmmacleod @adam-mercer I can't test that this patch actually reruns jobs on the selected failure modes, because I can't generate those failures manually. I have checked on a separate test branch that the retry option is being applied: I added script_failure as a failure mode, hacked a job to fail, and confirmed that the job was automatically re-run.
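
A check along these lines could be sketched as follows. The job name retry-check is hypothetical and the retry count is an assumption; this is meant for a throwaway test branch only and is not the exact change that was used.

```yaml
# Test-branch sketch only; retry-check is a hypothetical job name.
default:
  retry:
    max: 1
    when:
      - script_failure    # added temporarily so a failing script triggers a retry

retry-check:
  script:
    - exit 1              # force a script failure; the job should then be re-run once
```

If the retry option is being applied, the pipeline should show the failed job being run a second time before it is finally marked as failed.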