[[_TOC_]]

# Intro

This page covers the changes made to the pipeline after the O3 run. For the code review of the pipeline before that, please refer to this [page](https://git.ligo.org/srashti.goyal/lensid/-/wikis/Code-Review).

Reviewer: Jean-Rene Cudell

[**Installation instructions**](https://git.ligo.org/srashti.goyal/strong-lensing-ml/-/wikis/Installation-instructions)

# Relevant slides

[Till O3 method, review, results](https://docs.google.com/presentation/d/10bIhtFae5RIJ3WBJg1Lcy7PueSKwxh1m2APRDN0w0PA/edit?usp=sharing)

[O4](https://docs.google.com/presentation/d/1Lwmb-D-rCLF3Dr4gHbU5T9Rv3g6Mf0v2u9FN4UUI1lk/edit?usp=sharing)

# Developments

- [x] Training with O4 Gaussian noise PSD
- [x] Whitening
- [x] Change input method for Q-transforms: Superposition -> Superposition + individual
- [x] Data Generator for large
- [x] Single-detector training with a uniform-in-masses dataset, including subthreshold triggers; single-detector SNR: 4 to 40 (power law)
- [x] O4 MDC here: [git](https://git.ligo.org/srashti.goyal/lensing_mdc_o4)
- [x] O4 simulated dataset generation
- [ ] Optimisation and ensembling (to reduce overfitting and variance)
- [ ] Benchmarking results (+ find optimal way to combine HLV ML QTs results)
- [ ] Background computation
- [ ] CBCFlow integration
- [ ] O4 data preparation
- [ ] Machines trained and tested on O4 simulated noise
- [ ] O4 real noise training
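
The single-detector SNR range listed in the developments (4 to 40, power law) can be drawn by inverse-transform sampling. The following is only a minimal sketch under stated assumptions: the function name `sample_powerlaw_snr` is hypothetical, and the exponent `alpha = 4` is an assumed value, since this page does not state the exponent used for the training set.

```python
import random


def sample_powerlaw_snr(n, snr_min=4.0, snr_max=40.0, alpha=4.0, seed=None):
    """Draw n SNRs from p(snr) proportional to snr**(-alpha) on [snr_min, snr_max].

    alpha is an ASSUMED exponent (it must differ from 1); the wiki does not give it.
    """
    if alpha == 1.0:
        raise ValueError("alpha == 1 needs a logarithmic inverse CDF, not handled here")
    rng = random.Random(seed)
    k = 1.0 - alpha
    lo, hi = snr_min ** k, snr_max ** k
    # CDF: F(x) = (x**k - lo) / (hi - lo), so x = (lo + u * (hi - lo)) ** (1 / k)
    return [(lo + rng.random() * (hi - lo)) ** (1.0 / k) for _ in range(n)]


snrs = sample_powerlaw_snr(1000, seed=0)
```

With a steep exponent most samples sit close to the lower bound, which is why such a distribution emphasises subthreshold triggers.
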

## Package Scripts

### Data preparation

| Script | Short description | Status | old git hash | new git hash | Comment | Final Sign-off |
|--------|-------------------|--------|----------|---------|----------------|-----|
| [qt_utils.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/utils/qt_utils.py) | Helper script for injecting Gaussian noise given a PSD and a waveform. Also plots and saves Q-transforms. Added functionality: `.npz`, `flow` (lower frequency), wide `qrange` (3, 30). | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | 1e813099b6c3d2824016d059f4230e398e099d0e | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L825-916) | :heavy_check_mark: |
| [lensid_create_qts_lensed_injs.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_qts_lensed_injs.py) | Generates waveforms and Q-transforms for simulated lensed events given a set of injection parameters, using analytical/O3a PSDs. Eg: `lensid_create_qts_lensed_injs -odir check -start 10 -n 3 -infile ~/lensid/data/injection_pars/haris-et-al/lensed_inj_data.npz -psd_mode 1 -qrange 2 -mode 2`. Added a single-detector option, eg: `--single_det H1`; changed injection parameter names, waveform approximant, and default qrange. | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a722d7fce5daba757375442251598ba220e2e1ec | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L45-199) Line 641: shouldn't the default be 'whitened'? Also, line 927: why is tensorflow commented out? | :heavy_check_mark: |
| [lensid_create_qts_unlensed_injs.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_qts_unlensed_injs.py) | Generates waveforms and Q-transforms for simulated unlensed events given a set of injection parameters, using analytical/O3a PSDs. Eg: `lensid_create_qts_unlensed_injs -odir check -start 10 -n 3 -infile ~/lensid/data/injection_pars/haris-et-al/unlensed_inj_data.npz -psd_mode 1 -qrange 2 -mode 2` | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a722d7fce5daba757375442251598ba220e2e1ec | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L201-343) Same comment as for the previous file. | :heavy_check_mark: |
| [lensid_png_to_npz.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/utils/lensid_png_to_npz.py) | Script for converting PNG Q-transform images to .npz files for faster IO. Eg: `lensid_png_to_npz --indir check --outdir check_npz -n 3` | OK | NA | 1e813099b6c3d2824016d059f4230e398e099d0e | For the data loader. | :heavy_check_mark: |
| [lensid_create_lensed_df.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_lensed_df.py) | Generates a dataframe containing tags for lensed simulated event pairs, with columns img_0, img_1 and Lensing(=1). Eg: `lensid_create_lensed_df -odir check -outfile lensed.csv -start 10 -n 3 -infile ~/lensid/data/injection_pars/haris-et-al/lensed_inj_data.npz` | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a46b1d4a9755bae8438baaf053d2fb552a0808b9 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L1-12) | :heavy_check_mark: |
| [lensid_create_unlensed_df.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_unlensed_df.py) | Generates a dataframe containing tags for pairs of unlensed simulated events, with columns img_0, img_1 and Lensing(=0). Eg: `lensid_create_unlensed_df -odir check -outfile unlensed.csv -start 10 -n 3 -infile ~/lensid/data/injection_pars/haris-et-al/unlensed_inj_data.npz` | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a46b1d4a9755bae8438baaf053d2fb552a0808b9 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L345-356) | :heavy_check_mark: |
| [lensid_create_lensed_inj_xmls.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_lensed_inj_xmls.py) | Helper script that outputs a LAL inj.xml file for Bayestar for lensed simulated events, given the injection parameters. | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a46b1d4a9755bae8438baaf053d2fb552a0808b9 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L14-44) | :heavy_check_mark: |
| [lensid_create_unlensed_inj_xmls.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_create_unlensed_inj_xmls.py) | Helper script that outputs a LAL inj.xml file for Bayestar for unlensed simulated events, given the injection parameters. Minor changes in the parameter names. | OK | 32d0854b1a68cf21827e65ca1c36feb7ca53d0f5 | a46b1d4a9755bae8438baaf053d2fb552a0808b9 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff#L358-388) | :heavy_check_mark: |
| [lensid_create_bayestar_sky_lensed_injs.sh](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/scripts/lensid_create_bayestar_sky_lensed_injs.sh) | Generates Bayestar skymaps (.fits) for lensed simulated events, using analytical/O3a PSDs. Also converts them to cartesian format and saves them as .npz files. Eg: `lensid_create_bayestar_sky_lensed_injs.sh -o check -s 10 -n 3 -i ~/lensid/data/injection_pars/haris-et-al/lensed_inj_data.npz -p ~/lensid/data/PSDs/analytical_psd.xml` Note: if this does not work, try running `export PATH=$HOME/.local/bin:$PATH` first. | OK | 493ea099f42fc50d2cc081754d5395f57fafae76 | ------- | -------------- | :heavy_check_mark: |
| [lensid_create_bayestar_sky_unlensed_injs.sh](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/scripts/lensid_create_bayestar_sky_unlensed_injs.sh) | Generates Bayestar skymaps (.fits) for unlensed simulated events, using analytical/O3a PSDs. Also converts them to cartesian format and saves them as .npz files. Eg: `lensid_create_bayestar_sky_unlensed_injs.sh -o check -s 10 -n 3 -i ~/lensid/data/injection_pars/haris-et-al/unlensed_inj_data.npz -p ~/lensid/data/PSDs/analytical_psd.xml` | OK | 493ea099f42fc50d2cc081754d5395f57fafae76 | ------- | -------------- | :heavy_check_mark: |
| [lensid_fits_to_cart.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/utils/lensid_fits_to_cart.py) | Helper script for converting the HealPix skymap format (.fits) to cartesian. | OK | ac95f97e0c7e8d584b68ed364f353a5ed4bbb12d | unchanged | Need sanity check for hp.cartview during results review. | :heavy_check_mark: |
| [lensid_sky_injs_cart.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/injections/lensid_sky_injs_cart.py) | Helper script for managing IO of the fits_to_cart.py script for the injection study. | OK | 493ea099f42fc50d2cc081754d5395f57fafae76 | unchanged | OK | :heavy_check_mark: |
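
To make the dataframe layout described in the table concrete (columns `img_0`, `img_1` and `Lensing`), here is a small illustrative sketch. It is not the code of `lensid_create_lensed_df.py`: the helper `write_pairs_csv` and the image identifiers are hypothetical, and the standard `csv` module is used as a dependency-free stand-in for a dataframe library.

```python
import csv
import io


def write_pairs_csv(handle, pairs, lensing_label):
    """Write (img_0, img_1) pairs with a Lensing tag (1 = lensed, 0 = unlensed)."""
    writer = csv.DictWriter(handle, fieldnames=["img_0", "img_1", "Lensing"])
    writer.writeheader()
    for img_0, img_1 in pairs:
        writer.writerow({"img_0": img_0, "img_1": img_1, "Lensing": lensing_label})


# Hypothetical identifiers; the real scripts derive them from the injection file.
lensed_pairs = [("10_0", "10_1"), ("11_0", "11_1"), ("12_0", "12_1")]

buffer = io.StringIO()
write_pairs_csv(buffer, lensed_pairs, lensing_label=1)

# Read the table back to check the layout.
rows = list(csv.DictReader(io.StringIO(buffer.getvalue())))
```

An unlensed table would follow the same layout with `lensing_label=0`.
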

### Features extraction, Train/test/predict utilities

| Script | Short description | Status | old git hash | new git hash | Comment | Final Sign-off |
|--------|-------------------|--------|----------|---------|----------------|--|
| [lensid_get_features_qts_ml.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/feature_extraction/lensid_get_features_qts_ml.py) | Script for calculating DenseNet predictions for single-detector Q-transforms, given the trained DenseNet. Eg: `lensid_get_features_qts_ml -infile /home/srashti.goyal/lensing_MDC_O4/data_prep/data/dataframes/pairs.csv -outfile check_lensid_qts.csv -data_dir /home/srashti.goyal/lensing_MDC_O4/data_prep/data/qts/ -det H1 -whitened 1 -dense_model /home/srashti.goyal/lensid/development/retraining_for_O4/out/uniform_lr_005/dense_H1.h5` Modified to work on a single detector, compared to three detectors earlier. | OK | f9b7075d0e6ca8db211a0c3e43299af1eb428410 | 50a0178206e238fd705585c2feba8300a07d7732 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/review/diff_lensid_get_features_qts_ml_py.diff) | :heavy_check_mark: |
| [lensid_get_features_sky_ml.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/feature_extraction/lensid_get_features_sky_ml.py) | Script for calculating features from the Bayestar skymaps, which go as input to the "XGBoost with Skymaps" model. Eg: `lensid_get_features_sky_ml -infile check/lensed.csv -outfile check/lensed_sky.csv -data_dir check` | OK-DC; OK-jrc | f9b7075d0e6ca8db211a0c3e43299af1eb428410 | NA | NA | :heavy_check_mark: |
| [ml_utils.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/utils/ml_utils.py) | Utility script containing all machine learning model functions for training, FAP computation, predictions etc. Added a data loader and Q-transform input options for the file type. | ? | 493ea099f42fc50d2cc081754d5395f57fafae76 | 50a0178206e238fd705585c2feba8300a07d7732 | [diff](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/review/diff_ml_utils_py.diff) Questions on line 297 and following; question on line 91 and following about the activation used. | |
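
The `ml_utils.py` description mentions FAP computation. One common way to estimate a false alarm probability for a classifier score is the fraction of background (unlensed) scores at or above it. The sketch below illustrates that idea only; it is not taken from `ml_utils.py`, and the function name `false_alarm_probability` and the background values are hypothetical.

```python
from bisect import bisect_left


def false_alarm_probability(score, background_scores):
    """Return the fraction of background (unlensed) scores >= score."""
    ordered = sorted(background_scores)
    if not ordered:
        raise ValueError("background_scores must not be empty")
    # bisect_left counts the background scores strictly below `score`.
    n_at_or_above = len(ordered) - bisect_left(ordered, score)
    return n_at_or_above / len(ordered)


# Made-up background scores, for illustration only.
background = [0.01, 0.02, 0.05, 0.10, 0.20, 0.40, 0.60, 0.80, 0.90, 0.95]
fap = false_alarm_probability(0.85, background)
```

Sorting once and using a binary search keeps repeated queries cheap when the background set is large.
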

<details><summary> yet to change </summary>

### ML models: Training, Cross-validation, Optimisation, Testing, Comparison with BLU, Predictions

| Scripts | Short description | Status | git hash | Comment | final sign-off |
|---------|-------------------|--------|----------|---------|----------------|
| [train_densenets_qts.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/train_test/train_densenets_qts.py) | Trains a DenseNet with Q-transforms for a given detector. Eg: `python train_densenets_qts.py -lensed_df ~/strong-lensing-ml/data/dataframes/train/lensed.csv -unlensed_df ~/strong-lensing-ml/data/dataframes/train/unlensed_half.csv -odir dense_out/cit/ -epochs 10 -data_dir ~/alice_data_lensid/qts/train/`. Note: requires `tensorflow-gpu` to load CUDA libraries. | OK-DC; OK-jrc | a60740bb5a0cccb2be8e8184f16c0c7c93f8150b | | ---------------- |
| [train_crossvalidate_test_XGB_qts.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/train_test/train_crossvalidate_test_XGB_qts.py) | Trains, cross-validates, and compares to BLU the "XGBoost with QTs" model. Requires a dataframe that already has the input features calculated from the Q-transform images and trained DenseNets. `python train_crossvalidate_test_XGB_qts.py -help` | OK-DC; OK-jrc | | jrc: the values of the parameters of XGBoost could be documented. | |
| [train_crossvalidate_test_XGB_sky.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/train_test/train_crossvalidate_test_XGB_sky.py) | Trains, cross-validates, and compares to BLU the "XGBoost with Skymaps" model. Requires a dataframe that already has the input features calculated from the Bayestar/PE skymaps. `python train_crossvalidate_test_XGB_sky.py -help` | OK-DC; OK-jrc | a60740bb5a0cccb2be8e8184f16c0c7c93f8150b | | |
| [test_combined_ML_results.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/package/lensid/train_test/test_combined_ML_results.py) | Tests the overall ML model and compares it to BLU. Requires dataframes that already have the ML predictions calculated from the QTs and skymaps. `python test_combined_ML_results.py -help` | OK-DC; OK-jrc | a60740bb5a0cccb2be8e8184f16c0c7c93f8150b | | |

## Top-level scripts: Training and testing workflow.

| Scripts | Short description | Status | git hash | Comment | final sign-off |
|---------|-------------------|--------|----------|---------|----------------|
| [condor_data_gen_train_test_config.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/train_test_workflow/condor_data_gen_train_test_config.py) | Generate Qtransforms, Dataframes, Bayestar skymaps for training and testing given the injection parameters using condor dag jobs submission. Note: change `exec_file_loc` in the script according to your installation and `base_out_dir` as desired. | OK-jrc; OK-DC | 493ea099f42fc50d2cc081754d5395f57fafae76 | --------- | ---------------- |
| [config_train_test_workflow.yaml](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/train_test_workflow/config_train_test_workflow.yaml) | config file for training and testing ML models. Note: change `base_out_dir` as desired. | OK-DC; question-jrc | | | |
| [train_three_densenets.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/train_test_workflow/train_three_densenets.py) | Trains the three densenets, needs **config** file as input. Runs very fast on GPU systems. Optionally one can use condor to submit it as job. `python train_three_densenets.py` | OK-DC; OK-jrc | a60740bb5a0cccb2be8e8184f16c0c7c93f8150b | | ---------------- |
| [condor_train_test_features_extraction.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/train_test_workflow/condor_train_test_features_extraction.py) | Extract sky and qts features of the training and testing dataset by condor dag jobs submission. Needs **config** file as input. | OK-DC; OK-jrc | 493ea099f42fc50d2cc081754d5395f57fafae76 | --------- | ---------------- |
| [train_test_XGBs.py](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/train_test_workflow/train_test_XGBs.py) | Train and test the QTs and Skymaps machine learning models, optionally compare to BLU. Needs **config** as input. Runs in <5 mins, optionally can be submitted using condor. | OK-jrc; OK-DC | a60740bb5a0cccb2be8e8184f16c0c7c93f8150b | | ---------------- |

</details>

# Review Calls

The review calls happen on Wednesdays at 1 PM CEST / 4:30 PM IST in the virtual IFPA room: https://shorturl.at/uxLS5

## 4th April 2023

- Discussed MDC results.
- Discussed the developments.
- Discussed training with real noise using one week of O4 data.

### Action items

- [x] Ask chairs about real noise training and review.
- [ ] Benchmark performance.
- [ ] Compare machines trained on uniform masses vs. astrophysical population model masses.
- [ ] O4 background computation.

## 3rd May 2023

- Discussed the new functionalities for the ML QTs package scripts.
- Discussed the new uniform-in-masses training set.
- A [diff file](https://git.ligo.org/srashti.goyal/lensid/-/blob/master/diff_package_03052023_reviewed_2022.diff) was created to keep track of package changes since the last reviewed version.
- The hard deadline for code review is the start of O4, so the focus is more on the `code` than on performance at the moment.

### Action items

- [x] Prepare feature extraction and utils code for review.
- [ ] Sign off data preparation codes.

## 10th May 2023

- We discussed JR's comments on the data preparation scripts.
- We discussed the changes made to the codes for dense predictions, ml_utils and the data generator for ML QTs.
- The sign-off column was added to the tables.
- We discussed how we should proceed, as Virgo will be joining O4 3-6 months later.
- During the result review we should check the training size, batch_size etc.
- During the result review we will give results as a function of SNR.
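
As an illustration of what "results as a function of SNR" could look like, the sketch below bins lensed pairs by SNR and reports, per bin, the fraction whose classifier score passes a threshold. Everything in it is an assumption made for illustration: the function name `efficiency_by_snr`, the bin edges, the threshold and the numbers are hypothetical and are not results from the pipeline.

```python
def efficiency_by_snr(snrs, scores, threshold, bin_edges):
    """Per SNR bin [lo, hi): (bin, number of pairs, fraction with score >= threshold)."""
    results = []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = [s for snr, s in zip(snrs, scores) if lo <= snr < hi]
        # Empty bins get None instead of a misleading zero efficiency.
        eff = sum(s >= threshold for s in in_bin) / len(in_bin) if in_bin else None
        results.append(((lo, hi), len(in_bin), eff))
    return results


# Made-up SNRs and classifier scores for lensed pairs.
snrs = [5, 7, 9, 12, 15, 22, 30, 38]
scores = [0.1, 0.3, 0.7, 0.6, 0.9, 0.8, 0.95, 0.99]
table = efficiency_by_snr(snrs, scores, threshold=0.5, bin_edges=[4, 10, 20, 40])
```

Reporting the number of pairs next to each efficiency makes it easier to judge how reliable each bin is.
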

### Action items

- [ ] Prepare training and testing scripts.
- [ ] Sign off feature extraction and ML utils codes.