add nagios flag for task pileup in celery queue (!1016) · emfollow / gwcelery

Deep Chatterjee requested to merge deep.chatterjee/gwcelery:nagios-warning-for-pileup into main Nov 29, 2022

We have had multiple instances where the main worker receives IGWN alerts but the resulting tasks are not executed. Instead, they pile up in Redis, and when gwcelery is next restarted, it tries to catch up on the backlog. The most recent incident is discussed in this thread: https://chat.ligo.org/ligo/pl/aa4s54b5cjbd3n9r4utb6zhhdw

This MR adds a Nagios check that reads the length of the Celery queue in Redis and reports a problem if it exceeds a set threshold, because a long queue indicates that tasks are likely piling up. Under normal gwcelery operation, the Celery queue length is below ~10 (usually 0).
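For illustration, here is a minimal sketch of what such a check could look like. It is not the code in this MR. It assumes the default Celery queue name `celery`, which is the key under which kombu's Redis transport keeps default-priority messages as a Redis list, so `LLEN` gives the number of pending tasks. The threshold of 10 comes from the "less than ~10" figure above. The helper name `check_queue_length`, the connection settings, and the choice to report CRITICAL rather than WARNING are all hypothetical. The status values follow the standard Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```python
import sys

# Standard Nagios plugin exit codes.
NAGIOS_OK, NAGIOS_WARNING, NAGIOS_CRITICAL, NAGIOS_UNKNOWN = 0, 1, 2, 3

# Hypothetical threshold: under normal operation the queue holds fewer than ~10 tasks.
DEFAULT_MAX_LENGTH = 10

# Celery's default queue name; the Redis transport stores its
# default-priority messages in a Redis list under this key.
CELERY_QUEUE = "celery"


def check_queue_length(client, queue=CELERY_QUEUE, max_length=DEFAULT_MAX_LENGTH):
    """Return a (nagios_status, message) tuple for the length of `queue`.

    `client` is any object with a redis-py style `llen(key)` method.
    """
    try:
        length = client.llen(queue)
    except Exception as exc:  # e.g. Redis unreachable
        return NAGIOS_UNKNOWN, f"could not read queue {queue!r}: {exc}"
    if length > max_length:
        # Severity is an assumption; a real check might use WARNING instead.
        return (
            NAGIOS_CRITICAL,
            f"{length} tasks in queue {queue!r} (> {max_length}); tasks may be piling up",
        )
    return NAGIOS_OK, f"{length} tasks in queue {queue!r}"


def main():
    # Imported here so the check logic above can be used without redis-py installed.
    import redis

    # Hypothetical connection settings for the broker.
    client = redis.Redis(host="localhost", port=6379, db=0)
    status, message = check_queue_length(client)
    print(message)
    sys.exit(status)


if __name__ == "__main__":
    main()
```

Separating the threshold logic from the Redis connection keeps the check easy to test with a stand-in client. It also means an unreachable broker is reported as UNKNOWN rather than being mistaken for an empty queue.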

Edited Nov 29, 2022 by Deep Chatterjee