The relative binning (heterodyned) likelihood (Zackay et al.) accelerates likelihood evaluation for arbitrary frequency-domain waveform models. While it is more widely applicable than ROQ bases, it requires more care in tuning.
For this review, we have focused on demonstrating two things:
- when a good fiducial point is provided, the results obtained with this approximation are high fidelity.
- when a bad fiducial point is used, or the approximation otherwise fails, we can identify this programmatically.
To establish this, we importance sample the relative binning results using the regular likelihood. If the mismatches, defined as the absolute difference between the (natural) log-likelihoods obtained with the two methods, are small, the approximation is good. By rejection sampling using the weights (the ratios of the true to the approximate likelihood), we can measure the fraction of samples that survive. If this rejection sampling efficiency is small, the approximation has failed and the analysis should be repeated with a more robust method.
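The mismatch and efficiency diagnostics described above can be sketched as follows. This is an illustrative helper, not code from Bilby: the function and argument names are assumptions, and the efficiency is estimated here via the Kish effective-sample-size fraction of the importance weights.

```python
import numpy as np

def reweighting_diagnostics(ln_l_binned, ln_l_exact):
    """Illustrative sketch (names are assumptions): compute the per-sample
    mismatch and an estimate of the rejection sampling efficiency."""
    ln_l_binned = np.asarray(ln_l_binned, dtype=float)
    ln_l_exact = np.asarray(ln_l_exact, dtype=float)
    # Mismatch: absolute difference between the two log-likelihoods.
    mismatch = np.abs(ln_l_exact - ln_l_binned)
    # Importance weights: true vs approximate likelihood ratios,
    # shifted by the maximum log-weight for numerical stability.
    ln_w = ln_l_exact - ln_l_binned
    weights = np.exp(ln_w - ln_w.max())
    # Kish effective-sample-size fraction as the efficiency estimate.
    efficiency = weights.sum() ** 2 / (len(weights) * (weights ** 2).sum())
    return mismatch, efficiency
```

When the two likelihoods agree exactly the weights are all equal and the efficiency is 1; large mismatch tails drive the efficiency toward 0.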
We do not attempt to validate the performance of the automatic fiducial point finding using likelihood optimization. We also do not test the method on any waveform models with higher-order modes where the approximation is expected to be less robust.
The importance sampling can be automatically performed by bilby_pipe using the reweighting-configuration argument.
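As a sketch, the option can be set in the bilby_pipe ini file; the option name is taken from the text above, while the file name and value are placeholders, not real artifacts from this review:

```ini
# illustrative fragment of a bilby_pipe ini file (file name is a placeholder)
reweighting-configuration = reweighting.json
```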
Unit testing
As part of the Bilby CI unit testing, we verify that the binned likelihood agrees with the regular likelihood at the reference point for a range of cases.
We also verify that the binned likelihood is close to the regular likelihood for 100 points drawn from the prior distribution.
The final test verifies that the optimization used to find the reference parameters yields a good likelihood match.
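The prior-draw check above can be expressed schematically. This is a generic sketch, not the actual CI test: the function names and the way the likelihoods are passed in are assumptions.

```python
def max_mismatch_over_prior(binned_ln_l, exact_ln_l, prior_draws):
    """Sketch of a CI-style check (names are assumptions): evaluate both
    log-likelihood functions at each prior draw and return the worst
    absolute mismatch, which the test would require to be small."""
    return max(
        abs(binned_ln_l(params) - exact_ln_l(params))
        for params in prior_draws
    )
```

In the real test the draws would be full compact-binary parameter sets and the tolerance would be set by the accuracy target of the binned likelihood.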
Large-scale performance
The relative binning likelihood is used as part of the sampler review since we do not have a viable ROQ basis for IMRPhenomNSBH.
This allows testing the approximation over a broad range of potential NSBH systems.
For these analyses, we choose the fiducial parameters to match the injected values.
We extract the following general trends:
- reweighting to the regular likelihood correctly identifies failures of the method.
- the approximation is very good for systems with SNR ≳ 10 when initialized near the peak of the likelihood, although the fidelity degrades rapidly at higher SNRs.
Efficiency against SNR
The two plots below show the SNR dependence of the reweighting efficiency (equivalent to the approximation fidelity).
Likelihood mismatches
Below are the absolute likelihood mismatches for all of the pp-test analyses. The blue histograms correspond to the points above with efficiencies < 0.9 and the yellow to those with efficiencies > 0.9. While the differences for many of the blue traces are comparatively large, the important quantity is the width of each distribution: the cases that fail often have large tails in the mismatches.
For some successful cases, the mean ln-likelihood mismatch exceeds 0.1 in the tails; however, this can easily be corrected using likelihood reweighting.
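The reweighting correction mentioned above can be sketched as a rejection-sampling step. This is an illustrative implementation, not the one used in the review; the names and the fixed seed are assumptions.

```python
import numpy as np

def reweight_to_exact(samples, ln_l_binned, ln_l_exact, seed=0):
    """Sketch of likelihood reweighting via rejection sampling (names and
    seed are assumptions): keep each posterior sample with probability
    proportional to its true-over-approximate likelihood ratio."""
    ln_w = np.asarray(ln_l_exact) - np.asarray(ln_l_binned)
    # Normalize by the maximum weight so acceptance probabilities are <= 1.
    accept_prob = np.exp(ln_w - ln_w.max())
    rng = np.random.default_rng(seed)
    keep = rng.uniform(size=len(accept_prob)) < accept_prob
    return np.asarray(samples)[keep]
```

The fraction of samples surviving this step is the rejection sampling efficiency quoted throughout this page.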
BNS analyses
The fiducial BNS injection has been analyzed with the relative binning likelihood.
In all cases where a suitable starting point was provided, we see good agreement with the ROQ-likelihood runs and good resampling efficiency.
Here is the distribution of likelihood mismatches for two identical analyses of the fiducial BNS signal with a precessing spin prior with magnitudes up to 0.4 and tidal deformabilities up to 5000. The legend entries show the fraction of samples surviving rejection sampling; it is very close to 1.
By accident, we performed some runs with fiducial parameters that are a very poor fit to the actual signal (the mass and spin for one of the NSBH simulations were used instead of the neutron star values).
In this case, we found that the rejection sampling efficiency was very small and the mismatches were large.
The corresponding analysis can be found at /home/sylvia.biscoveanu/bilby_pipe/runs/review_test/O4/fiducial_bns_PhenomTidal_take2/outdir_dynesty_relbin_medSpin_precessing_cal.
While we did not perform large-scale testing in this regime, these findings are consistent with the tests using the NSBH waveform model: we can obtain good results in a representative use case and identify bad results.