Performance issues with lalinference Python package (nrutils.py)
Hello,
I have been investigating why the PESummary package is slow to load the HDF5 results file, and it seems that part of the slowdown comes from the lalinference package, more specifically from the imrtgr/nrutils.py file. I have traced it to two issues:
- There are many calls within the file like
m1 = np.vectorize(float)(np.array(m1))
(e.g. line 111). This looks like fairly old code, and it could be replaced by the more idiomatic (and more efficient)
m1 = np.array(m1, dtype=np.float64)
- There are two instances of an unusual construct in which a function vectorizes itself (at line 273 and line 1062):
# Vectorize the function if arrays are provided as input
if np.size(m1) * np.size(m2) * np.size(chi1) * np.size(chi2) > 1:
return np.vectorize(bbh_final_spin_non_precessing_Healyetal)(m1, m2, chi1, chi2, version)
I am really unsure what this is supposed to do (probably apply the function individually to each combination of elements of the arrays?), but the function seems to work as expected when the construct is removed, since the rest of the function already works for arrays. Removing it gives a very significant gain, as most of the time spent in the function is actually spent in the numpy vectorizing machinery; this would reduce the execution time by about 2 or 3 orders of magnitude.
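To make both points easy to check, here is a minimal sketch. It does not use the actual nrutils.py code: `toy_fit` and `toy_fit_self_vectorized` are hypothetical stand-ins I wrote for illustration, assuming (as described above) that the body of the real function already handles arrays. Note that `np.vectorize` follows the usual broadcasting rules, so it pairs elements position by position rather than looping over every combination.

```python
import timeit

import numpy as np

# Issue 1: element-wise float conversion versus a single dtype cast.
values = list(range(1, 2001))
converted_old = np.vectorize(float)(np.array(values))
converted_new = np.array(values, dtype=np.float64)
same_conversion = np.array_equal(converted_old, converted_new)


# Issue 2: hypothetical stand-in for a fit whose body already works on arrays.
def toy_fit(m1, m2, chi1, chi2):
    m1, m2, chi1, chi2 = (
        np.asarray(x, dtype=np.float64) for x in (m1, m2, chi1, chi2)
    )
    eta = m1 * m2 / (m1 + m2) ** 2  # symmetric mass ratio
    return eta * (chi1 + chi2)  # made-up formula, for illustration only


# Same computation wrapped in the self-vectorizing pattern quoted above.
def toy_fit_self_vectorized(m1, m2, chi1, chi2):
    if np.size(m1) * np.size(m2) * np.size(chi1) * np.size(chi2) > 1:
        return np.vectorize(toy_fit_self_vectorized)(m1, m2, chi1, chi2)
    return toy_fit(m1, m2, chi1, chi2)


rng = np.random.default_rng(0)
n = 2000
m1 = rng.uniform(1.0, 50.0, n)
m2 = rng.uniform(1.0, 50.0, n)
chi1 = rng.uniform(-1.0, 1.0, n)
chi2 = rng.uniform(-1.0, 1.0, n)

direct = toy_fit(m1, m2, chi1, chi2)
wrapped = toy_fit_self_vectorized(m1, m2, chi1, chi2)

# Timings depend on the machine, so they are printed rather than asserted.
t_direct = timeit.timeit(lambda: toy_fit(m1, m2, chi1, chi2), number=3)
t_wrapped = timeit.timeit(
    lambda: toy_fit_self_vectorized(m1, m2, chi1, chi2), number=3
)
print(f"conversions agree: {same_conversion}")
print(f"results agree: {np.allclose(direct, wrapped)}")
print(f"direct: {t_direct:.4f} s, self-vectorized: {t_wrapped:.4f} s")
```

If the real functions behave like this stand-in, a similar comparison against the existing implementation (same inputs, with `np.allclose` on the outputs) would be a reasonable regression check for the merge request.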
I should point out that I am not very familiar with the package, so it is very possible that I am missing something and that there is a good reason for this (at least in some cases). Also tagging @nathan-johnson-mcdaniel who appears to have introduced the code for the second part according to the git blame.
I can open a merge request if you agree with this.