add nagios flag for task pileup in celery queue

We have had multiple incidents in which the main worker receives IGWN alerts but the tasks are not executed. The tasks pile up in Redis, and when gwcelery is restarted, it tries to catch up. The most recent incident is described in this thread: https://chat.ligo.org/ligo/pl/aa4s54b5cjbd3n9r4utb6zhhdw This MR adds a Nagios check that reads the length of the celery queue in Redis and reports when it exceeds a set length, which indicates that tasks are likely piling up. Under normal gwcelery operation, the length of the celery queue is below ~10 (mostly 0).
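For readers unfamiliar with this kind of check, here is a minimal sketch of the idea, not the code in this MR. It assumes the broker stores pending tasks in a Redis list named `celery` (the default queue name for Celery's Redis transport) and uses the standard Nagios plugin exit codes. The function names, the `warn`/`crit` thresholds, and the Redis URL are hypothetical and chosen only for illustration.

```python
"""Sketch of a Nagios-style check on the Celery queue length in Redis."""

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def queue_status(length, warn=10, crit=50):
    """Map a queue length to a (code, message) pair.

    The thresholds are illustrative assumptions, not values from the MR.
    """
    if length >= crit:
        return CRITICAL, f"CRITICAL: celery queue length is {length}"
    if length >= warn:
        return WARNING, f"WARNING: celery queue length is {length}"
    return OK, f"OK: celery queue length is {length}"


def check_queue(client, queue_name="celery", warn=10, crit=50):
    """Read the queue length with LLEN and classify it.

    `client` is any object with an `llen(name)` method, such as a
    `redis.Redis` instance.
    """
    try:
        length = client.llen(queue_name)
    except Exception as exc:  # connection errors, timeouts, etc.
        return UNKNOWN, f"UNKNOWN: could not read queue length ({exc})"
    return queue_status(length, warn, crit)


def main(url="redis://localhost:6379/0"):
    """Run the check against a live broker (URL is an assumption)."""
    import redis  # imported here so the helpers above need no dependency

    code, message = check_queue(redis.Redis.from_url(url))
    print(message)
    return code
```

A Nagios wrapper script could call `sys.exit(main())` so that the exit code carries the status. Passing the client in as an argument keeps the threshold logic testable without a running Redis server.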

Edited by Deep Chatterjee