Brief dashboard outage
The entire https://grafana-nautilus.ligo.org server was up but inaccessible this morning, returning 503 errors. The main Nautilus k8s Grafana dashboard (i.e. the one managed by Nautilus themselves) was also down. Other web services (ActiveMQ, Elasticsearch, Kibana, etc.) were fine.
It is back now, but it would be nice to:
- Get to the bottom of what happened - it looks like there's also a brief outage visible in the actual job history
- Have some monitoring / alerts to check that the system is available.
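
As a starting point for the monitoring item, here is a minimal sketch of an availability probe using only the Python standard library. It is not an existing tool in this setup. The function names, the 10 second timeout, and the "any 2xx/3xx counts as available" policy are all assumptions for illustration; only the URL comes from this report.

```python
import urllib.error
import urllib.request

# URL taken from this report; everything else below is a hypothetical sketch.
DASHBOARD_URL = "https://grafana-nautilus.ligo.org"


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, TLS error, etc.
        return None


def is_available(status):
    """Assumed policy: any 2xx or 3xx response counts as available."""
    return status is not None and 200 <= status < 400


def check(url, fetch=fetch_status):
    """Probe url and return (available, status). fetch is injectable for tests."""
    status = fetch(url)
    return is_available(status), status


def main():
    ok, status = check(DASHBOARD_URL)
    print(f"{DASHBOARD_URL}: {'OK' if ok else 'DOWN'} (status={status})")
    # Non-zero return value lets a scheduler or CI job flag the failure.
    return 0 if ok else 1
```

Calling `main()` on a schedule (cron, a scheduled CI pipeline, or similar) and alerting on a non-zero result would have caught this morning's 503s. A proper alerting stack would be the better long-term answer; this is only meant to show how small the basic check is.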