Fix the setting of the Bayesian Blocks minimum chunks length (!114) · Merge requests · cwinpy / CW Inference in Python

Matthew Pitkin requested to merge fix_bbminlength into filter_change Jul 06, 2022

Currently, during the heterodying the data for each analysis time chunk gets stored as a HeterodynedData object. In this stage the bbminlength keyword variable to HeterodynedData is set to the length of the data chunk (e.g., on order the number of samples in a day for the default pipeline parameters), to prevent the Bayesian Blocks algorithm being run and thus slightly speeding up the output. This means that each heterodyned data file contains this bbminlength value. When multiple data files get merged, the final merged file will contain the bbminlength value from the first file, which will be the number of samples in that first file.

This becomes problematic when later reading in that heterodyned data from a hdf5 file containing the HeterodynedData object, for example, for parameter estimation. When such a file is read in, rather than using the user specified or default bbminlength value (5 samples), it uses the internal stored one. This is not what's wanted and means that the Bayesian Block algorithm can generate chunks that are far longer than is actually valid, i.e., chunks where the data statistics can change considerably. One case where this becomes obvious, is the analysis of the O1 hardware injection, PULSAR13, in the L1 detector. In this case the posterior when using the CWInPy heterodyned data is considerably broader than that from the lalapps pipeline. If you look at the number of chunks that Bayesian Blocks analysis has been split into is it just 35 (with the shortest containing 1419 samples), compared to 498 chunks for the lalapps output. Because the data in this case is highly non-stationary, the smaller number of chunks actually means that there is considerable non-stationarity within chunks, which leads to a broader posterior (it infers that the noise is larger) - see here.

This MR corrects this in two ways:

it makes sure the bbminlength attribute of the HeterodynedData objects used for outputting the heterodyned data is reset to None prior to writing it to file;
it changes the bayesian_blocks method of the HeterodynedData object to default to using the argument values for bbminlength (and bbmaxlength and threshold) rather than the internally stored attributes.

With this fix, the number of chunks that the Bayesian Blocks algorithm splits CWInPy heterodyned data into is 511. When comparing the PE outputs, rather having a broader posterior (as shown here) the posteriors between CWInPy and the lalapps code now are in agreement:

Admin message

Fix the setting of the Bayesian Blocks minimum chunks length

Merge request reports