Leo Tsukada requested to merge master-chisq-signalmodel-test into master Nov 03, 2022

Overview

This MR introduces a new SNR-chisq signal model derived from the templates' auto-correlation functions.
While Andre is still working on his data-driven approach, we have implemented a preliminary version.
The development consists of the following features:

svd_bank.py : compute lambda-eta related quantities for each template and store them in Gamma2-4 columns
gstlal_inspiral_create_prior_diststat : take these four quantities from sngl_inspiral_table in a given svd bank file and pass them to the new add_signal_model_analytic to construct an ifo-dependent signal model (note that the auto-correlation depends on the ifo through its PSD, so the resulting signal model technically varies across ifos)
inspiral_lr.py : accommodate the ifo-dependent signal model. Also, since the new signal model is supposed to be more accurate without KDE, the KDE smoothing is disabled in finish() only for the signaldensity class
inspiral_extrinsics.py : add add_signal_model_analytic() to the NumeratorSNRCHIPDF class. It constructs a new signal model for the given lambda-eta related quantities and marginalizes over a given mismatch range.
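
To illustrate what "marginalize over a given mismatch range" means in the last item, here is a minimal sketch. It is not the gstlal implementation: the function names, the log-uniform weighting over the mismatch range, and the toy conditional density are all assumptions made for illustration only.

```python
import math

def marginalize_over_mismatch(cond_pdf, m_lo, m_hi, n=64):
    """Average cond_pdf(m) over mismatch m in [m_lo, m_hi].

    ASSUMPTION: a log-uniform weight in m, evaluated with a midpoint
    rule in log(m). The real code may weight the range differently.
    """
    log_lo, log_hi = math.log(m_lo), math.log(m_hi)
    step = (log_hi - log_lo) / n
    total = 0.0
    for i in range(n):
        m = math.exp(log_lo + (i + 0.5) * step)
        total += cond_pdf(m)
    return total / n

def toy_chisq_pdf(chisq, snr, mismatch, width=0.5):
    """Hypothetical stand-in for the per-mismatch signal density.

    A Gaussian in chisq whose mean grows with mismatch * snr^2.
    This is NOT the lambda-eta model from the MR.
    """
    mean = 1.0 + mismatch * snr ** 2
    z = (chisq - mean) / width
    return math.exp(-0.5 * z * z) / (width * math.sqrt(2.0 * math.pi))

# Density at (snr=10, chisq=2), marginalized over a 0.1% - 30% mismatch range
p = marginalize_over_mismatch(lambda m: toy_chisq_pdf(2.0, 10.0, m), 1e-3, 0.3)
```

Widening or narrowing the range passed as m_lo and m_hi changes how broad the marginalized density is in chisq.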

things to discuss

Since this new signal model is not supposed to be KDE-ed, I disabled the KDE when calling finish() for the numerator pdf (see here). As expected, this created regions in SNR-chisq space where lnP cannot be evaluated, e.g. the white region in the histogram plots pasted below. This caused the program to crash when sampling such a region, because the evaluation returned NaN. I avoided this by adding a fast-cut here so that rankingstat.numerator(**kwargs) returns NegInf for samples falling in that region. Since those samples don't really contribute to our background model anyway, this fast-cut shouldn't have much impact on FAR estimates, etc. But I would like to make sure that this is the right thing to do, or whether there is a smarter way to do this, e.g. adding a negligible yet non-zero prior over the entire parameter space.
Or should I instead have expanded this limited region, so that the boundaries in chisq and SNR=1e10 are relaxed?
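
For discussion, the two options above (fast-cut versus a small non-zero floor) can be sketched as follows. This is only a toy: safe_ln_numerator, floored_ln_numerator, toy_ln_pdf and the floor value of -700 are hypothetical names and numbers, not what the MR or gstlal actually uses.

```python
import math

NEG_INF = float("-inf")

def toy_ln_pdf(snr, chisq):
    """Hypothetical ln P(snr, chisq): NaN outside the populated region,
    mimicking an empty bin of the un-smoothed histogram."""
    if 0.5 <= chisq <= 2.0:
        return -0.5 * (snr - 8.0) ** 2
    return float("nan")

def safe_ln_numerator(ln_pdf, snr, chisq):
    """Fast-cut: map an unevaluable (NaN) region to -inf."""
    value = ln_pdf(snr, chisq)
    return NEG_INF if math.isnan(value) else value

def floored_ln_numerator(ln_pdf, snr, chisq, ln_floor=-700.0):
    """Alternative: a negligible but non-zero floor everywhere.

    ASSUMPTION: ln_floor is an arbitrary illustrative value.
    """
    value = ln_pdf(snr, chisq)
    if math.isnan(value) or value < ln_floor:
        return ln_floor
    return value

inside = safe_ln_numerator(toy_ln_pdf, 8.0, 1.0)      # evaluable region
outside = safe_ln_numerator(toy_ln_pdf, 8.0, 5.0)     # fast-cut applies
floored = floored_ln_numerator(toy_ln_pdf, 8.0, 5.0)  # floor applies
```

The fast-cut discards such samples outright, while the floor keeps them with a finite but tiny weight. Which of the two is preferable for the FAR estimate is exactly the open question here.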

Test results

I tested this new signal model with a rerank dag using Cort's offline dag with the manifold bank + mu sorting.
Cort's offline open box results
New offline open box results

VT-FAR plot

Cort's original run
New run
ratio

VT-total SNR plot

Cort's original run

new run

ratio

The VT-FAR plot suggests a consistent improvement of 10-20% for BNS and lighter BBHs, and the VT-SNR plot seems to imply that this improvement mostly comes from around SNR ~ 8.
In these results, the ifo-dependent horizon factor was also implemented.

injection recovery table

The injection recovery table suggests that the run with the new signal model found more injections than the old run in most categories.
old

new

SNR-`\xi^2` histogram plots (H1 + no KDE)

BNS bin

old

new

NSBH/BBH bin

old

new

BBH bin

old

new

IMBH bin

old

new

In general, the new signal model shows a larger width at SNR~10 but a narrower width at SNR~100 throughout these bins.
Also note that the width depends on the mismatch range given to the gstlal_inspiral_create_prior_diststat program. For the above plots, a mismatch range of 0.1 - 30% was given.

analytic VT comparison

As suggested by Kipp, here is the comparison between the analytic and measured VTs. The run with the new signal model seems to improve the accuracy of the analytic VT relative to Cort's run for the BNS and lightest BBH categories, which is consistent with the fact that the new signal model improved the VT for these two categories. Note that both the analytic and measured VTs vary between the two runs because the injection database and diststat pdf are different.