CI: retry running jobs on all failure types

Karl Wette requested to merge ANU-CGA/lalsuite:retry-all-jobs into master

Description

It's relatively common for CI jobs to fall over at various stages; for that reason .gitlab-ci.yml includes a retry block listing a number of failure modes for which jobs will be retried. That list, however, doesn't cover all failure modes; script_failure, for example, is not included.

It's quite possible, however, for a script to fail due to a failure in the package building infrastructure (RPM/Deb/Conda) and not due to a failure in compiling LALSuite code. Indeed, it's impossible for GitLab to distinguish these two failure modes. My experience is that the former is more common than the latter. Given that the full CI pipeline takes hours to run, a developer would most likely run make and make check on their local machine first, which should catch the majority of compilation failures (modulo rarer corner cases involving different platforms, compilers, etc.). So when CI jobs fail, it's more likely to be a package building infrastructure failure than a LALSuite compilation failure.

This MR simplifies the retry block in .gitlab-ci.yml by retrying all failed CI jobs at least once. This would benefit developers, who would receive fewer Failed pipeline emails from GitLab and would less often have to log in to manually retry jobs. This is unlikely to add to the burden on the GitLab runners; most jobs that need re-running fail for transient reasons, and so would need to be re-run anyway. (!1854 (merged) would also help reduce GitLab runner burden, by not running jobs when prerequisites have failed.)
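
For readers unfamiliar with the retry keyword, the kind of simplification described above could look roughly like the sketch below. This is an illustration only, not the actual contents of LALSuite's .gitlab-ci.yml: the placement under default, the max value, and the failure types shown in the commented-out "before" form are all assumptions.

```yaml
# Hypothetical sketch; not copied from LALSuite's .gitlab-ci.yml.

# Before (assumed shape): retry only on an explicit list of failure types.
# script_failure is absent, so a failing job script is never retried.
#
# default:
#   retry:
#     max: 2
#     when:
#       - runner_system_failure
#       - stuck_or_timeout_failure
#       - api_failure

# After (assumed shape): retry on every failure type.
default:
  retry:
    max: 1        # number of retries; GitLab accepts 0, 1, or 2
    when: always  # matches any failure, including script_failure
```

Because when: always is GitLab's default for retry, the "after" form could equally be written as the shorthand retry: 1.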

API Changes and Justification

Backwards Compatible Changes

  • This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions
  • This change adds new classes/functions/structs/types to a public C header file or Python module

Backwards Incompatible Changes

  • This change modifies an existing class/function/struct/type definition in a public C header file or Python module
  • This change removes an existing class/function/struct/type from a public C header file or Python module

Review Status

cc @adam-mercer

Edited by Karl Wette