"list assignment index out of range" in Python
Done statement: List assignment is performed safely and any encountered errors are reported for future debugging.
We had a large number of nodes crash on MDC around the same time, with some variant of the following error:
Py3:
10.9.1.149 - - [27/Jan/2023 06:43:28] "GET /latency_history.txt HTTP/1.1" 200 49002
IndexError: list assignment index out of range
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/spiir/lib/python3.8/site-packages/gstlal/pipeparts/sink.py", line 425, in new_sample_handler
return self.pull_buffers(elem)
File "/usr/spiir/lib/python3.8/site-packages/gstlal/pipeparts/sink.py", line 464, in pull_buffers
self.appsink_new_buffer(elem_with_oldest)
File "/usr/spiir/lib/python3.8/site-packages/gstlal_spiir/pipemodules/postcoh_finalsink.py", line 642, in appsink_new_buffer
newevents = postcohtable.from_buffer(mapinfo.data.tobytes())
SystemError: <built-in function from_buffer> returned a result with an error set
Py2:
Traceback (most recent call last):
File "/usr/spiir/lib/python2.7/site-packages/gstlal/pipeparts/__init__.py", line 813, in appsink_handler
self.appsink_new_buffer(elem_with_oldest)
File "/usr/spiir/lib/python2.7/site-packages/gstlal/pipemodules/postcoh_finalsink.py", line 606, in appsink_new_buffer
newevents[0].postcoh_inspiral.ifos)
File "/usr/spiir/lib/python2.7/site-packages/gstlal/pipemodules/postcohtable/postcohtable.py", line 71, in __getattribute__
for ifo_id, ifo in enumerate(pipe_macro.IFO_MAP):
IndexError: list assignment index out of range
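Although the py2 trace points to a specific line, that line does no list assignment (though a following line does), and it is executed extremely often. So our best guess is that something asynchronous is going on: postcohtable is producing the error, as in py3, and py2 is reporting it against the wrong stack trace. This fits the py3 SystemError, which CPython raises when a C function returns a result while an exception is still set. py2 does not appear to make this check, so a pending exception could surface later at an unrelated line.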
The only places we do list assignment in _postcohtable.c are the list of postcohtriggers and each trigger's list of snr_series. snr_series is a likely culprit, since it sets its length based on the ifos string, but sets each element based on which snr series are not null.
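The suspected mismatch can be reproduced in pure Python. This is only a sketch of the hypothesis, not the code in _postcohtable.c: the detector order in IFO_ORDER, the assumption that the ifos string holds two characters per detector, and the function build_snr_series are all made up for illustration.

```python
# Hypothetical detector order; the real one lives in pipe_macro.IFO_MAP.
IFO_ORDER = ["H1", "L1", "V1"]


def build_snr_series(ifos, snr_by_ifo):
    """Mimic the suspected logic: size from `ifos`, fill by non-null series."""
    # Length comes from the ifos string (assumed: two characters per detector).
    series = [None] * (len(ifos) // 2)
    # Elements are set by detector index wherever an SNR series is present.
    for ifo_id, ifo in enumerate(IFO_ORDER):
        if snr_by_ifo.get(ifo) is not None:
            series[ifo_id] = snr_by_ifo[ifo]
    return series


# Consistent input: two detectors named, two series present.
print(build_snr_series("H1L1", {"H1": [1.0], "L1": [2.0]}))

# Inconsistent input: V1 has a series but is not named in `ifos`.
try:
    build_snr_series("H1L1", {"H1": [1.0], "V1": [3.0]})
except IndexError as err:
    print("IndexError:", err)
```

The second call fails with the same "list assignment index out of range" message seen in the logs, because index 2 is written into a list of length 2.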
Even if we can't track down the exact error, we should make sure all list assignment is safe, and if something unexpected happens, that we record a meaningful error.
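One possible shape for "safe assignment with a meaningful error" is sketched below in Python. The helper name safe_set, the logger name, and the context string are hypothetical. In the C extension the analogous step would presumably be checking the return value of PyList_SetItem, which returns -1 and sets an IndexError when the index is out of bounds.

```python
import logging

logger = logging.getLogger("postcohtable_sketch")  # hypothetical logger name


def safe_set(items, index, value, context=""):
    """Assign items[index] = value only if index is in range.

    Returns True on success. On failure, logs the index, the list length
    and a caller-supplied context string, then returns False.
    """
    if 0 <= index < len(items):
        items[index] = value
        return True
    logger.error(
        "list assignment skipped: index %d out of range for length %d (%s)",
        index, len(items), context,
    )
    return False


series = [None, None]
safe_set(series, 0, [1.0], context="snr_series, ifos='H1L1'")
safe_set(series, 2, [3.0], context="snr_series, ifos='H1L1'")  # logged, not raised
print(series)
```

Logging the index, the length, and the ifos string together should make it possible to tell from the log alone which trigger was inconsistent, even if the root cause is elsewhere.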