Strange igwn alert bootstep behavior
Production had issues today for an unknown reason. After re-deploying, the following error appeared in the main worker logs:
[2023-12-24 13:04:02,661: WARNING/MainProcess/IGWNReceiverThread] Exception in thread
[2023-12-24 13:04:02,663: WARNING/MainProcess/IGWNReceiverThread] IGWNReceiverThread
[2023-12-24 13:04:02,663: WARNING/MainProcess/IGWNReceiverThread] :
[2023-12-24 13:04:02,665: WARNING/MainProcess/IGWNReceiverThread] Traceback (most recent call last):
[2023-12-24 13:04:02,666: WARNING/MainProcess/IGWNReceiverThread] File "/cvmfs/software.igwn.org/conda/envs/igwn-py39-20221118/lib/python3.9/threading.py", line 980, in _bootstrap_inner
[2023-12-24 13:04:02,668: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,669: WARNING/MainProcess/IGWNReceiverThread] self.run()
[2023-12-24 13:04:02,670: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/sentry_sdk/integrations/threading.py", line 72, in run
[2023-12-24 13:04:02,672: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,673: WARNING/MainProcess/IGWNReceiverThread] reraise(*_capture_exception())
[2023-12-24 13:04:02,674: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 60, in reraise
[2023-12-24 13:04:02,676: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,677: WARNING/MainProcess/IGWNReceiverThread] raise value
[2023-12-24 13:04:02,678: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/sentry_sdk/integrations/threading.py", line 70, in run
[2023-12-24 13:04:02,679: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,679: WARNING/MainProcess/IGWNReceiverThread] return old_run_func(self, *a, **kw)
[2023-12-24 13:04:02,680: WARNING/MainProcess/IGWNReceiverThread] File "/cvmfs/software.igwn.org/conda/envs/igwn-py39-20221118/lib/python3.9/threading.py", line 917, in run
[2023-12-24 13:04:02,682: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,683: WARNING/MainProcess/IGWNReceiverThread] self._target(*self._args, **self._kwargs)
[2023-12-24 13:04:02,683: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/gwcelery/igwn_alert/bootsteps.py", line 39, in listen
[2023-12-24 13:04:02,684: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,685: WARNING/MainProcess/IGWNReceiverThread] self.stream_obj = self.open(self._construct_topic_url(topics), "r") # noqa: E501
[2023-12-24 13:04:02,685: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/hop/io.py", line 120, in open
[2023-12-24 13:04:02,686: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,687: WARNING/MainProcess/IGWNReceiverThread] return Consumer(
[2023-12-24 13:04:02,688: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/hop/io.py", line 320, in __init__
[2023-12-24 13:04:02,689: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,690: WARNING/MainProcess/IGWNReceiverThread] self._consumer.subscribe(topics)
[2023-12-24 13:04:02,690: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/adc/consumer.py", line 48, in subscribe
[2023-12-24 13:04:02,691: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,692: WARNING/MainProcess/IGWNReceiverThread] topic_meta = self.describe_topic(topic, timeout)
[2023-12-24 13:04:02,693: WARNING/MainProcess/IGWNReceiverThread] File "/home/emfollow/.local/lib/python3.9/site-packages/adc/consumer.py", line 70, in describe_topic
[2023-12-24 13:04:02,695: WARNING/MainProcess/IGWNReceiverThread]
[2023-12-24 13:04:02,695: WARNING/MainProcess/IGWNReceiverThread] cluster_meta = self._consumer.list_topics(timeout=timeout.total_seconds())
[2023-12-24 13:04:02,696: WARNING/MainProcess/IGWNReceiverThread] cimpl
[2023-12-24 13:04:02,697: WARNING/MainProcess/IGWNReceiverThread] .
[2023-12-24 13:04:02,698: WARNING/MainProcess/IGWNReceiverThread] KafkaException
[2023-12-24 13:04:02,699: WARNING/MainProcess/IGWNReceiverThread] :
[2023-12-24 13:04:02,700: WARNING/MainProcess/IGWNReceiverThread] KafkaError{code=_TRANSPORT,val=-195,str="Failed to get metadata: Local: Broker transport failure"}
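Reading the traceback, the receiver thread appears to die on the first failed call to `self.open(...)` in `listen`, with nothing restarting it afterwards. One possible mitigation, offered only as a sketch, is to retry the open with exponential backoff. The helper name `open_with_retry` and its parameters are hypothetical and not part of gwcelery, hop, or adc; `open_stream` stands in for the real `self.open(...)` call.

```python
import time


def open_with_retry(open_stream, max_attempts=5, base_delay=1.0,
                    retry_on=(Exception,), sleep=time.sleep):
    """Call open_stream() until it succeeds or the attempts run out.

    open_stream is a hypothetical zero-argument callable standing in for
    the self.open(...) call in listen(). In real code, retry_on would be
    narrowed to the Kafka exception type seen in the log.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return open_stream()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Whether retrying is appropriate here depends on why the broker was unreachable, which this report does not establish.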
Holding and releasing the job seemed to bring it back up without issue, but this error message appeared after holding it:
Traceback (most recent call last):
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/worker/worker.py", line 202, in start
    self.blueprint.start(self)
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/bootsteps.py", line 365, in start
    return self.obj.start()
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 336, in start
    blueprint.start(self)
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/bootsteps.py", line 116, in start
    step.start(parent)
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 726, in start
    c.loop(*c.loop_args())
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/worker/loops.py", line 86, in asynloop
    state.maybe_shutdown()
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/worker/state.py", line 93, in maybe_shutdown
    raise WorkerShutdown(should_stop)
celery.exceptions.WorkerShutdown: 0

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/emfollow/.local/lib/python3.9/site-packages/celery/bootsteps.py", line 148, in send_all
    fun(parent, *args)
  File "/home/emfollow/.local/lib/python3.9/site-packages/gwcelery/igwn_alert/bootsteps.py", line 118, in stop
    self._client.stream_obj._consumer.stop()
AttributeError: 'IGWNAlertClient' object has no attribute 'stream_obj'
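
The two tracebacks look related: `stream_obj` is assigned in `listen` (bootsteps.py line 39), and that assignment never completed because `self.open(...)` raised. When `stop` (bootsteps.py line 118) later reads `self._client.stream_obj`, the attribute does not exist. A minimal sketch of a defensive shutdown follows. The function `stop_client` is a hypothetical helper, not the actual gwcelery code, and the stand-in objects only mimic the attribute names visible in the traceback.

```python
from types import SimpleNamespace


def stop_client(client):
    """Stop the underlying consumer only if a stream was ever opened.

    Returns True if a consumer was stopped, and False if the client never
    got as far as setting stream_obj (for example because open() raised).
    """
    stream = getattr(client, "stream_obj", None)
    if stream is None:
        return False
    stream._consumer.stop()
    return True


# Stand-in for a client whose listen() failed before setting stream_obj.
never_connected = SimpleNamespace()
print(stop_client(never_connected))  # False, and no AttributeError
```

This would only silence the secondary error on shutdown; it does not address the broker transport failure itself.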