Substantial speedup to loading VT from hdf5 file
I've noticed that the method load_injection_data can be very slow, taking about 15-20 minutes to load the endo3 injections (/home/reed.essick/rates+pop/o3-sensitivity-estimates/LIGO-T2100113-v11/endo3_bbhpop-LIGO-T2100113-v11.hdf5 on CIT). This appears to be because each dataset is read from the hdf5 file by slicing it directly with the found array, and only then explicitly converted to a numpy or cupy array. It turns out that slicing an hdf5 dataset with an explicit list of indices can be really slow. I think it is more efficient to first slice the dataset with [()], which reads it into memory as an array, and then slice that array with the found array.
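As a rough illustration of the two access patterns, here is a minimal sketch. It is not the actual load_injection_data code: the helper names, the dataset name mass1_source, and the random found mask are all made up for the example, and it assumes found is a boolean mask over a 1-D dataset.

```python
import h5py
import numpy as np


def load_masked_direct(dataset, found):
    # Slow pattern: index the HDF5 dataset itself with the mask, so h5py
    # has to build an element-by-element selection inside the file.
    return np.asarray(dataset[found])


def load_masked_in_memory(dataset, found):
    # Fast pattern: dataset[()] reads the whole dataset into a NumPy
    # array, and the mask is then applied to that in-memory array.
    return dataset[()][found]


# Hypothetical stand-in for an injection file, kept entirely in memory.
rng = np.random.default_rng(0)
n = 10_000
with h5py.File("sketch.hdf5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("mass1_source", data=rng.uniform(5.0, 100.0, n))
    found = rng.random(n) < 0.1  # hypothetical "found" mask
    direct = load_masked_direct(dset, found)
    in_memory = load_masked_in_memory(dset, found)
```

Both helpers return the same values. The trade-off is that the second one reads the entire dataset into memory before masking, which is fine as long as each dataset fits in RAM.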
Testing these changes on the injection set above, loading the data now takes about 340 ms. The current function takes about 18 minutes to load the same data, so this change offers a >3,000x speedup. I have also confirmed that the dictionaries returned by the current function and by the new one are identical.
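For anyone who wants to repeat the equality check, something along these lines should work. This is only a sketch: dicts_match is a hypothetical helper, the two small dictionaries are placeholder data rather than real injections, and it assumes the values are NumPy arrays (cupy arrays would first need to be copied to the host, e.g. with cupy.asnumpy).

```python
import numpy as np


def dicts_match(old, new):
    # The two dictionaries must have the same keys, and every pair of
    # arrays must have the same shape and the same values.
    if old.keys() != new.keys():
        return False
    return all(np.array_equal(old[key], new[key]) for key in old)


# Placeholder data standing in for the outputs of the two implementations.
old = {"mass1": np.array([30.0, 12.5]), "redshift": np.array([0.2, 0.7])}
new = {key: value.copy() for key, value in old.items()}
same = dicts_match(old, new)
```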