Intermittent error with `asimov monitor`: Failed to connect to schedd
I'm seeing an intermittent error when running `asimov monitor`. Here's the full traceback:
```
(asimov-tutorial) [michael.williams@ldas-pcdev11 nessai_test_v2]$ asimov monitor
GW150914_095045
- Prod1[bilby]
running
● Prod1 is stuck; attempting a rescue
- Prod2[bilby]
Timeout when waiting for remote host
running
● Prod2 is stuck; attempting a rescue
- Prod2[bilby]
● ready
- Prod2[bilby]
● ready
- Prod3[bilby]
running
Traceback (most recent call last):
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/cli/monitor.py", line 52, in monitor
    job = condor.CondorJob(production.meta['job id'])
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/condor.py", line 24, in __init__
    self.get_data()
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/condor.py", line 83, in get_data
    raise ValueError
ValueError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/michael.williams/.conda/envs/asimov-tutorial/bin/asimov", line 8, in <module>
    sys.exit(olivaw())
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/cli/monitor.py", line 100, in monitor
    pipe.after_completion()
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/pipelines/bilby.py", line 283, in after_completion
    cluster = self.run_pesummary()
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/asimov/pipeline.py", line 203, in run_pesummary
    with schedd.transaction() as txn:
  File "/home/michael.williams/.conda/envs/asimov-tutorial/lib/python3.9/site-packages/htcondor/_lock.py", line 70, in wrapper
    rv = func(*args, **kwargs)
htcondor.HTCondorIOError: Failed to connect to schedd.
```
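
Since the failure is intermittent, one possible mitigation might be to retry the scheduler call a few times before giving up. The following is only a minimal sketch of that idea, not how asimov currently behaves: `retry` is a hypothetical helper that is not part of asimov or htcondor, and the attempt count and delays are arbitrary assumptions.

```python
import time


def retry(func, exceptions, attempts=3, delay=1.0, backoff=2.0, sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration only; not part of asimov or htcondor.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    wait = delay
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except exceptions:
            # Out of attempts: re-raise the last error unchanged.
            if attempt == attempts:
                raise
            sleep(wait)
            wait *= backoff


# Hypothetical usage, assuming the htcondor Python bindings are installed:
#   import htcondor
#   schedd = htcondor.Schedd()
#   ads = retry(lambda: schedd.query(), htcondor.HTCondorIOError)
```

Whether retrying is appropriate here depends on why the schedd is unreachable; if the scheduler is overloaded, a longer delay or fewer queries may be the better fix.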