Main worker gets stuck in infinite loop when PE fails to find data

GWCelery is getting stuck in an infinite loop when it tries to perform PE for events whose online data can't be found. The loop consists of repeatedly trying and failing to query data for PE jobs, and regenerating the SNRSummary.png and nEvtSummary.png files each time. For example, G783810 has 21572 copies of SNRSummary.png and 21569 copies of nEvtSummary.png.

This means that we can't recover when GWCelery goes down long enough for the low-latency data to become unavailable, which makes fixing this extremely high priority.

The behavior that I saw in our monitoring tools is that gwcelery.tasks.inference.query_data keeps failing with errors like this:

Traceback (most recent call last):
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/celery/app/autoretry.py", line 34, in run
    return task._orig_run(*args, **kwargs)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/gwcelery/tasks/inference.py", line 80, in query_data
    raise NotEnoughData
gwcelery.tasks.inference.NotEnoughData

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/celery/app/trace.py", line 451, in trace_task
    R = retval = fun(*args, **kwargs)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 207, in _inner
    reraise(*exc_info)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/sentry_sdk/_compat.py", line 57, in reraise
    raise value
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/sentry_sdk/integrations/celery.py", line 202, in _inner
    return f(*args, **kwargs)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/celery/app/trace.py", line 734, in __protected_call__
    return self.run(*args, **kwargs)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/celery/app/autoretry.py", line 54, in run
    ret = task.retry(exc=exc, **retry_kwargs)
  File "/home/emfollow-playground/.local/lib/python3.9/site-packages/celery/app/task.py", line 738, in retry
    raise ret
celery.exceptions.Retry: Retry in 447s: NotEnoughData()

And we keep uploading the files I pointed out above.
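
To illustrate the kind of guard that would stop this, here is a minimal sketch in plain Python, not GWCelery's actual code. It assumes the loop comes from retrying the query without an upper bound and from producing the summary files on every pass. The names `run_with_bounded_retries`, `query`, `upload_summary`, and `max_retries` are hypothetical, and the local `NotEnoughData` class only stands in for `gwcelery.tasks.inference.NotEnoughData`.

```python
class NotEnoughData(Exception):
    """Stand-in for gwcelery.tasks.inference.NotEnoughData (hypothetical copy)."""


def run_with_bounded_retries(query, upload_summary, max_retries=3):
    """Call query() until it succeeds or max_retries retries are used up.

    upload_summary() is called exactly once, after the outcome is known,
    instead of once per attempt.
    Returns (data_or_None, number_of_attempts).
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            data = query()
        except NotEnoughData:
            if attempts > max_retries:
                # Give up: the data will not appear, so stop retrying.
                upload_summary()
                return None, attempts
            continue
        upload_summary()
        return data, attempts


def always_missing():
    """Simulates an event whose online data can no longer be found."""
    raise NotEnoughData


uploads = []
result, attempts = run_with_bounded_retries(
    always_missing,
    lambda: uploads.append("SNRSummary.png"),
    max_retries=3,
)
print(result, attempts, len(uploads))
```

With a cap like this, an event whose data is gone fails after a fixed number of attempts and leaves one summary upload rather than thousands. Whether the real fix belongs in the retry settings of `query_data` or in the task that uploads the plots is not something the traceback alone settles.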

Edited by Cody Messick