Skymap jobs silently stop responding to tasks
I observed an instance on production where Flower reported the skymap jobs as healthy while Nagios flagged the openmp queue as down. The skymaps were not being uploaded to GraceDB, so the Nagios check is working, but it's concerning that Flower thought the jobs were okay. Simply holding and releasing the jobs fixed the problem.
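For anyone trying to reproduce the mismatch, here is a minimal sketch of a queue-level liveness check, as opposed to a process-level one. It assumes the workers are Celery workers and that the reply has the shape returned by Celery's `app.control.inspect().active_queues()`. The helper name `missing_queues` and the sample reply are hypothetical, and this is not necessarily how the existing Nagios check is implemented.

```python
def missing_queues(active_queues_reply, expected):
    """Return the expected queue names that no responding worker consumes.

    active_queues_reply is assumed to look like the result of Celery's
    app.control.inspect().active_queues(): a dict mapping worker name to a
    list of queue dicts (each with a 'name' key), or None if nothing replied.
    """
    active = {
        queue["name"]
        for queues in (active_queues_reply or {}).values()
        for queue in (queues or [])
    }
    return sorted(set(expected) - active)


# Hypothetical reply in which only the default queue has a consumer.
reply = {"worker@host": [{"name": "celery"}]}
print(missing_queues(reply, ["celery", "openmp"]))
```

Against a live deployment, the reply would come from something like `app.control.inspect(timeout=5).active_queues()`, where `app` is the Celery application. Note that this only detects workers that stop answering remote-control requests. A worker that still answers but never picks up tasks would need an end-to-end probe instead.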
As evidence, this is a superevent created about an hour before I wrote this. Compare the submitted timestamp here to the timestamps in the logs below.
Skymaps were not uploaded until I logged into the production machine and held/released the skymap jobs.
And here is a screenshot of the logs from the preferred event, confirming that the skymap was not uploaded until the jobs were restarted.