Substantial speedup to loading VT from hdf5 file

I've noticed that the method load_injection_data can be very slow: it takes about 15-20 minutes to load the endo3 injections (/home/reed.essick/rates+pop/o3-sensitivity-estimates/LIGO-T2100113-v11/endo3_bbhpop-LIGO-T2100113-v11.hdf5 on CIT). The cause appears to be that each dataset is read from the hdf5 file, immediately sliced with the found array, and then explicitly converted to a numpy or cupy array. Slicing an hdf5 dataset with an explicit list of indices can be really slow. Reading the full dataset into memory with [()] first, and then slicing the resulting array with the found array, is much more efficient.
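To illustrate the two access patterns, here is a minimal sketch on a small stand-in file. It is not the code from this merge request: the helper names load_slow and load_fast, the dataset name "mass1_source", and the randomly generated found indices are all placeholders chosen for the example.

```python
import os
import tempfile

import h5py
import numpy as np


def load_slow(dataset, found):
    # Index the h5py dataset directly with a list of indices. h5py has to
    # turn the list into an HDF5 selection, which scales poorly when the
    # list is long.
    return np.asarray(dataset[found])


def load_fast(dataset, found):
    # Read the whole dataset into memory with [()], then index in NumPy.
    return dataset[()][found]


# Toy data standing in for one injection parameter (hypothetical name).
rng = np.random.default_rng(0)
values = rng.uniform(5.0, 100.0, size=2_000)

# h5py requires index lists to be in increasing order; flatnonzero
# returns sorted, unique indices.
found = np.flatnonzero(rng.random(values.size) < 0.3).tolist()

path = os.path.join(tempfile.mkdtemp(), "toy_injections.hdf5")
with h5py.File(path, "w") as f:
    f.create_dataset("mass1_source", data=values)

with h5py.File(path, "r") as f:
    slow = load_slow(f["mass1_source"], found)
    fast = load_fast(f["mass1_source"], found)

# Both paths should return the same selected values.
same = np.array_equal(slow, fast)
```

The trade-off is memory: the fast path holds the full dataset in memory before selecting, which is fine when the datasets fit comfortably in RAM.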

Testing these changes on the injection set above, loading the data now takes about 340 ms. The current function takes about 18 minutes to load the same data, so this change offers a >3,000x speedup. I have also confirmed that the dictionaries returned by the current function and by the new one are identical.
