Use more nuanced logic to determine what constitutes a Lasso failure

Alexander Urban requested to merge (removed):nagios-2 into master

This merge request enhances the logic used to determine what constitutes a Lasso failure:

  • If Lasso successfully processes every lock stretch on a given day, or if there are no lock stretches to process, the result is PASS (green)
  • If at least one lock stretch fails but at least one also succeeds, the result is WARNING (yellow)
  • If all lock stretches fail, the result is CRITICAL (red)
  • If Lasso doesn't run within the expected time, the result is UNKNOWN (violet)
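
The four rules above can be sketched as a small decision function. This is only an illustration, not the code in this merge request: the function name, its arguments, and the idea of counting succeeded and failed lock stretches are assumptions. The numeric values are the standard Nagios plugin return codes, where the PASS state corresponds to OK (0).

```python
from enum import IntEnum


class NagiosState(IntEnum):
    """Standard Nagios plugin return codes."""
    OK = 0        # reported as PASS (green) in this merge request
    WARNING = 1   # yellow
    CRITICAL = 2  # red
    UNKNOWN = 3   # violet


def classify_lasso_run(n_succeeded, n_failed, ran_on_time=True):
    """Map one day's lock-stretch outcomes onto a Nagios state.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    if not ran_on_time:
        # Lasso did not run within the expected time
        return NagiosState.UNKNOWN
    if n_failed == 0:
        # covers both "every stretch processed" and "nothing to process"
        return NagiosState.OK
    if n_succeeded > 0:
        # at least one failure and at least one success
        return NagiosState.WARNING
    # every lock stretch failed
    return NagiosState.CRITICAL
```

Checking the timing condition first means a run that never completed is reported as UNKNOWN regardless of any partial counts.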

Note that prior to !35 (merged), gwdetchar-lasso-correlation would commonly hang, staying in a running state for days on end without actually doing anything. Under the logic above, that would have led to an UNKNOWN Nagios state. However, we now have a timeout condition where any job that takes longer than half an hour to run is killed, so a hung job now leads to WARNING or CRITICAL instead.
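
For illustration only, here is one way a half-hour timeout of this kind could be expressed in Python. How !35 actually enforces the timeout is not described here, so this is a sketch under assumptions: the helper name and the use of `subprocess.run` are hypothetical, and the real jobs may be killed by the workflow scheduler instead.

```python
import subprocess

TIMEOUT_SECONDS = 30 * 60  # "half an hour", per the description above


def run_with_timeout(cmd, timeout=TIMEOUT_SECONDS):
    """Run cmd; return True on success, False on failure or timeout.

    Hypothetical helper, not the mechanism used in !35.
    """
    try:
        result = subprocess.run(cmd, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising
        return False
    return result.returncode == 0
```

With a wrapper like this, a job that hangs is counted as a failed lock stretch rather than leaving the day's run incomplete.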

cc @duncanmmacleod, @alexandra.macedo

Edited by Alexander Urban
