</details>
# Introduction
This repository contains the result review of the machine-learning-based pipeline [**lensid**](https://git.ligo.org/srashti.goyal/lensid) for the LVK O4 lensing analysis. lensid identifies potential strongly lensed candidate BBH event pairs using their skymaps and Q-transforms.
The automation wrapper: [alensidforlensingflow](https://git.ligo.org/srashti.goyal/alensidforlensingflow)
Script for downloading event information from CBCFlow (path loaded on CIT), skymaps (`.fits`) from GraceDB and strain data from LIGO servers using GWpy. Note: a valid LIGO key path is needed to access non-public data in GraceDB, e.g. `/tmp/x509up*`. The raw data on CIT are in `/home/srashti.goyal/lensid_O4/data_download_preparation/O4a_events_data`.
</td>
<td>OK</td>
<td>
</td>
<td>Include a link to the instructions for generating the LIGO key path in the comments.</td>
Prepares Cartesian skymaps and Q-transforms for the real O4a events. It also filters the events compatible with lensid (BBHs) and prepares a dataframe. Note: the prepared inputs are saved on CIT in `/home/srashti.goyal/lensid_O4/data_download_preparation/O4a_lensid_inputs`.
| [pop_datasets.ipynb](https://git.ligo.org/srashti.goyal/lensid-ml-o4/-/blob/master/retraining_for_O4/pop_datasets.ipynb) | Notebook with plots of the injection parameters used for training and testing. | | | \-------------- |
| ML QTs [L1](https://ldas-jobs.ligo.caltech.edu/~srashti.goyal/O4a_training/L1/uniform_config_lr_0.01_ep_15_bs_500/), [H1](https://ldas-jobs.ligo.caltech.edu/~srashti.goyal/O4a_training/uniform_config_lr_0.01_ep_15_bs_500/) | | | | \-------------- |
| [Config](https://git.ligo.org/srashti.goyal/lensid-ml-o4/-/blob/master/config_o4a.yaml?ref_type=heads) | ML models and config for production runs | Needs to be finalised | | \-------------- |
| [make_predictions.ipynb](https://git.ligo.org/srashti.goyal/lensid-ml-o4/-/blob/master/make_predictions.ipynb?ref_type=heads) | Benchmark performance of the ML models and background estimation, along with MDC results for the final model. | In development | | \-------------- |
| [investigations_visualisations.ipynb](https://git.ligo.org/srashti.goyal/lensid-ml-o4/-/blob/master/investigations_visualisations.ipynb?ref_type=heads) | Investigating/eyeballing pairs and comparing performance with the other pipelines. | | | \-------------- |
# Review Calls
The review calls happen on Wednesdays at 1 PM CEST / 4:30 PM IST in the virtual IFPA room: https://shorturl.at/uxLS5
## 13 July 2023
* We discussed the plan and action items for finalising the machine for the production run:
  - PSDs for training and testing.
  - Population: astro (Haris et al., 1400+1400), astro (O4) v/s uniform (8000+8000) v/s ensembling.
  - Training set size, overfitting and optimisation, variance.
  - Subthreshold: ROCs, single det > 4.
  - Detectors: HL and HLV?
  - Generate background (later).
  - Noise: Gaussian v/s real (O4b).
* We discussed the data preparation steps for the real O4a events:
  - Event list (CBCFlow).
  - Downloading frame files and skymaps.
  - Config for running ML.
  - Fixing the passing criterion (top 1 %).
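This is a minimal sketch of the "top 1 %" passing criterion. It assumes that a candidate pair passes when its ML score is at least as high as the loudest 1 % of background (unlensed) pairs. The helper names and the rank convention are hypothetical and are not taken from lensid.

```python
import math
from typing import Sequence


def top_fraction_threshold(background_scores: Sequence[float],
                           fraction: float = 0.01) -> float:
    """Smallest score among the loudest `fraction` of background pairs.

    Hypothetical helper: a candidate pair scoring at least this value
    is as loud as the top `fraction` of the background.
    """
    ranked = sorted(background_scores)
    n_top = math.floor(fraction * len(ranked))  # background pairs allowed through
    if n_top < 1:
        raise ValueError("background too small to resolve this fraction")
    return ranked[len(ranked) - n_top]


def passes_top_fraction(score: float, threshold: float) -> bool:
    """A pair passes if its score is at least the threshold."""
    return score >= threshold
```

For example, with 1000 background pairs the threshold is the 10th-loudest background score.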
### Action items
* [x] Train and test with the O4 and uniform populations, varying the training set size.
## 1 August 2023
* We discussed the three populations used for training and testing the ML QT models: uniform, the old astrophysical population (Haris et al.) and the O4 astrophysical population.
* The preferred options are the uniform-pop-trained machine or a combination of the O4 astro- and uniform-pop-trained machines, as they seem more robust to population-model assumptions.
* We also discussed the possibility of a methods paper and of a real-noise MDC for the ongoing efforts.
* We also discussed the data preparation for O4 using CBCFlow, and which skymaps to use.
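This hedged sketch shows one way to build "a combination of the O4 astro- and uniform-pop-trained machines": a weighted average of the per-pair scores of the two machines. Equal default weights, the function name and the input layout are assumptions for illustration. The notes do not specify how the machines would be combined.

```python
from typing import List, Optional, Sequence


def ensemble_scores(model_scores: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """Weighted average of per-pair scores from several trained machines.

    model_scores[m][i] is the score that machine m assigns to pair i.
    Equal weights are used when `weights` is None (an assumption).
    """
    n_models = len(model_scores)
    if n_models == 0:
        raise ValueError("need at least one machine")
    n_pairs = len(model_scores[0])
    if any(len(scores) != n_pairs for scores in model_scores):
        raise ValueError("all machines must score the same pairs")
    if weights is None:
        weights = [1.0] * n_models
    if len(weights) != n_models:
        raise ValueError("need one weight per machine")
    total = sum(weights)
    return [
        sum(w * scores[i] for w, scores in zip(weights, model_scores)) / total
        for i in range(n_pairs)
    ]
```

Averaging is the simplest ensembling rule. Other rules, such as a trained stacking model, are also possible.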
### Action items
* [x] Write data download and preparation scripts.
## 17 August 2023
* We discussed the data download and preparation scripts line by line.
* We also discussed the status of Virgo.
* We are inclining towards using the uniform-pop-trained machine for the production run.
* We also discussed lensingflow integration and the possibility of using Asimov.
### Action items
* [x] Gather and organise the results.
* [x] Generate 2-detector skymaps and QTs.
* [x] Plot and compare the different (projected, design, O4a) PSDs.
## 22 September 2023
* We discussed the ROCs for 2- v/s 3-detector events using skymaps.
* We looked at the PSDs: the real O4a noise seems compatible with the projected noise except at low frequencies (< 70 Hz). The two LIGO detectors are also operating close to design sensitivity.
* We went through the first set of preliminary lensid results for O4a events.
* We also discussed lensingflow integration and the possibility of using Asimov.
### Action items
* [x] Make lensid compatible with a single pair, with the config file.
* [x] Implement the FAP computation.
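This is a minimal sketch of an empirical FAP. It assumes the FAP of a pair is the fraction of background (unlensed) pairs whose ML score is at least the pair's score. The function name is hypothetical. The notes use FAP and FPP somewhat interchangeably, and this sketch does not distinguish them.

```python
from bisect import bisect_left
from typing import Sequence


def empirical_fap(score: float, background_scores: Sequence[float]) -> float:
    """Fraction of background pairs whose score is at least `score`."""
    ranked = sorted(background_scores)
    if not ranked:
        raise ValueError("background is empty")
    # bisect_left counts the background scores strictly below `score`.
    n_at_least = len(ranked) - bisect_left(ranked, score)
    return n_at_least / len(ranked)
```

With a finite background, any pair louder than every background pair gets a FAP of exactly 0. The smallest non-zero value this estimator can report is 1/N for N background pairs.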
## 10 October 2023
* We went through the new command-line script that makes predictions using a config file.
* We discussed the FAP computation and the threshold to use.
* A couple of pairs seem significant in the preliminary LensID results. Some of them have bad skymaps that need fixing.
* We also discussed real-noise injections and think it would be good to start working towards them.
### Action items
* [x] Start working on automation and CBCFlow integration.
## 8 November 2023
* We discussed the integration of lensid with lensingflow and went through the new package: https://git.ligo.org/srashti.goyal/alensidforlensingflow
* JR suggested using real O4a noise PSDs for training and testing the final ML model for the production runs ([detchar](https://ldas-jobs.ligo-wa.caltech.edu/~detchar/summary)).
* We are still unsure about including the population and time-delay lensing priors in the follow-up strategy.
### Action items
* [x] Update the result review page with the new scripts.
* [x] Note the changes in the reviewed code.
## 26 January 2024
* We discussed JR's questions about the PSDs and about using threads.
* We think documenting the population choices and the 2 v/s 3 detector comparison would be useful for the future.
## 12 February 2024
* We discussed the first preliminary results for O4a: 94 of 3486 pairs pass the FPP < 0.01 threshold.
* The visual investigations and the way events are passed on need to be fixed, especially for events with m2 < 5 Msun, which LensID doesn't consider.
* We discussed the configuration file and the items that still need final review.
* We also discussed the subthreshold-event workflow and how to approach it.
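The m2 < 5 Msun limit can be applied before forming pairs. This sketch rests on stated assumptions:

* Each event is reduced to a single secondary-mass point estimate in Msun (e.g. a posterior median).
* The helper names are hypothetical.
* The event names and masses in any example are made up.

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Hypothetical constant, taken from the note that LensID does not
# consider events with m2 < 5 Msun.
M2_MIN_MSUN = 5.0


def lensid_compatible(events: Dict[str, float],
                      m2_min: float = M2_MIN_MSUN) -> List[str]:
    """Sorted names of events whose secondary mass (Msun) is at least m2_min."""
    return sorted(name for name, m2 in events.items() if m2 >= m2_min)


def make_pairs(event_names: List[str]) -> List[Tuple[str, str]]:
    """All unordered event pairs: n * (n - 1) / 2 of them."""
    return list(combinations(event_names, 2))
```

As an arithmetic check on the counts above, 3486 = 84 × 83 / 2 is exactly the number of unordered pairs of 84 events. The event count itself is not given here, so this is only a consistency check.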
### Action items
* [x] Fix versions for lensingflow integration.
* [x] Write scripts for investigating the pairs, along with calculating rapid stats.
* [x] Check the 3 events with exceptionally low FPPs; this looks like a bug.
* [x] Check the event with the missing skymap.
* [x] Compare results with other pipelines.
We also want to prepare a document quantifying all of these aspects. One main issue is the bias-variance trade-off: https://www.bmc.com/blogs/bias-variance-machine-learning/
## 19 February 2024
* We investigated the preliminary results.
* S230630bq and S231226av have problems with their skymaps.
* We compared the results with BLU and Bhattacharya.
### Action items
* [x] Update S230630bq, and check S231226av against the training dataset etc.; it seems to have a good SNR and a well-localised skymap.
## 26 February 2024
* We discussed the preliminary results once again.
* We think a Bhattacharya distance < 3 should be used as an additional quick cut for passing events on in the flow.
* The `S230606d S231226av` pair needs extrapolation, and the skymap is inconsistent between PE and GraceDB.
* We talked about O4b and the integration choice with the new ML pipeline, SLICK.
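This sketch of the Bhattacharya-distance cut assumes the standard definition D_B = -ln Σ_i sqrt(p_i q_i). It is evaluated on histograms of one-dimensional posterior samples (e.g. the chirp-mass samples of the two events) over a shared binning. The notes do not specify the parameter, the binning or the estimator. The bin count here is an arbitrary choice, and histogram estimates depend on it.

```python
import math
from typing import List, Sequence


def bhattacharya_distance(samples_a: Sequence[float], samples_b: Sequence[float],
                          n_bins: int = 50) -> float:
    """D_B = -ln(sum_i sqrt(p_i * q_i)) on a shared histogram grid.

    Returns math.inf when the two histograms share no occupied bin.
    """
    if not samples_a or not samples_b or n_bins < 1:
        raise ValueError("need non-empty samples and at least one bin")
    lo = min(min(samples_a), min(samples_b))
    hi = max(max(samples_a), max(samples_b))
    width = (hi - lo) / n_bins or 1.0  # guard against all samples being equal

    def histogram(samples: Sequence[float]) -> List[float]:
        counts = [0] * n_bins
        for x in samples:
            # Clamp so that x == hi lands in the last bin.
            counts[min(int((x - lo) / width), n_bins - 1)] += 1
        return [c / len(samples) for c in counts]

    p, q = histogram(samples_a), histogram(samples_b)
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return math.inf if coefficient == 0.0 else -math.log(coefficient)


def passes_bd_cut(distance: float, max_distance: float = 3.0) -> bool:
    """Quick cut from the notes: keep pairs with D_B < 3."""
    return distance < max_distance
```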
### Action items
* [x] Implement extrapolation when calculating the FPP for events at the edge.
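This hedged sketch extrapolates the FPP for pairs at the edge of the background. These are pairs louder than every background pair, where the empirical estimate is 0. The technique is swapped in for illustration:

* It assumes the loudest 10 % of the background has an exponentially falling survival function.
* It fits log S(x) = a + b x by least squares.
* It evaluates the fit at the pair's score.

lensid's actual extrapolation may differ, and the function name is hypothetical.

```python
import math
from bisect import bisect_left
from typing import Sequence


def extrapolated_fap(score: float, background_scores: Sequence[float],
                     tail_fraction: float = 0.1) -> float:
    """Empirical FAP, extended by an exponential tail fit beyond the background.

    Assumption: for the loudest `tail_fraction` of the background, the
    survival function S(x) = P(score >= x) satisfies log S(x) ~ a + b * x.
    """
    ranked = sorted(background_scores)
    n = len(ranked)
    if n < 2:
        raise ValueError("need at least two background scores")
    n_louder = n - bisect_left(ranked, score)
    if n_louder > 0:
        return n_louder / n  # some background is at least as loud: empirical value

    # Louder than every background pair: fit the loudest k scores.
    k = min(n, max(2, int(tail_fraction * n)))
    xs = ranked[-k:][::-1]                          # loudest first
    ys = [math.log((j + 1) / n) for j in range(k)]  # log empirical survival
    x_mean, y_mean = sum(xs) / k, sum(ys) / k
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0.0:
        return 1.0 / n  # degenerate tail: fall back to the resolution limit
    slope = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sxx
    intercept = y_mean - slope * x_mean
    # Never exceed the smallest non-zero empirical FAP, 1/n.
    return min(1.0 / n, math.exp(intercept + slope * score))
```

Capping the result at 1/n keeps the extrapolated value below every FAP the background can resolve directly. The fitted slope controls how quickly values fall towards 0 for very loud pairs.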
## 13 May 2024
* We discussed the outstanding action items.
* We also discussed whether an extra reviewer or analyst is needed. As things are close to completion, we don't think one is required.
* We went through the script for calculating FAPs; it needs some modifications for low FAPs.
* We discussed why SLICK might be doing better for QTs. One possibility is the training set size; another is the training approach, i.e. they freeze the first 10 layers. We need to discuss the integration for O4a/O4b.
* We also discussed the population model for training. It is still a tough choice, but we may want to use the astro distribution, as it also seems to do well on the Haris et al. population.
### Action items
* [x] Fix the extrapolation for low FAP values that go to 0.
* [x] Implement the BD < 3 cut in the final `is_lensing_favoured` output.
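This hypothetical sketch shows how the final `is_lensing_favoured` flag could combine the ML significance with the BD < 3 cut. Only the flag name, the FPP < 0.01 threshold and the BD < 3 cut come from these notes. The `PairResult` container and its field names are assumptions.

```python
from dataclasses import dataclass


@dataclass
class PairResult:
    """Hypothetical per-pair summary; field names are illustrative."""
    event_1: str
    event_2: str
    fpp: float                    # ML false-positive probability of the pair
    bhattacharya_distance: float  # e.g. between the chirp-mass posteriors


def is_lensing_favoured(pair: PairResult, fpp_threshold: float = 0.01,
                        max_bd: float = 3.0) -> bool:
    """Favour lensing only if the pair passes both the FPP and the BD cuts."""
    return pair.fpp < fpp_threshold and pair.bhattacharya_distance < max_bd
```

Keeping the thresholds as keyword arguments makes the BD cut easy to revisit. This matters because the 27 May 2024 call found the cut unreliable when the PE and matched-filter chirp masses differ.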
## 27 May 2024
* We discussed the preliminary results.
* We noticed that the BD < 3 cut isn't that good, given that the PE and matched-filter chirp masses can be very different.
* Saurabh is now on board with the results.
### Action items
* [x] Investigate the events with zero mass posterior overlaps.
## 14 June 2024
* We eyeballed events with zero mass-posterior overlap that were nevertheless selected by LensID.
* Saurabh reviewed some of the scripts and discussed conceptual points regarding the Bhattacharya distance.
* We also discussed the findings of Adrien and Saurabh that GWpy QTs are better than PyCBC ones, which seems to be one of the reasons for SLICK's improvement.
### Action items
* [ ] Prepare the scripts for result review.
* [ ] Train a final ML model and compute its background while optimising.