
WIP: EM-Bright

Shaon Ghosh requested to merge shaon.ghosh/lalsuite:embright_branch into master

This merge request integrates the EM-Bright infrastructure into LALsuite. For O3, we are completely overhauling the EM-Bright pipeline, moving away from the ellipsoid-based computation used in O2 (https://dcc.ligo.org/LIGO-T1600570) to a machine-learning-based infrastructure (https://dcc.ligo.org/LIGO-G1801592). The pipeline now has two aspects:

  1. Training: a training pipeline that relies on injection campaigns from the detection pipelines. We use a Random Forest classifier to train the EM-Bright pipeline to discriminate between EM-Bright and EM-Dark sources (a rough sketch of this stage follows the list).
  2. Inference: events obtained in low latency trigger the inference code (source_classification) in lalinference.embright, which returns the probability that the event is EM-Bright.
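
As a rough illustration of the training stage (this is not the pipeline's actual code: the feature set, labels, and output file name below are hypothetical placeholders), the core step amounts to fitting a scikit-learn Random Forest on recovered injection parameters and pickling it:

```python
# Hypothetical sketch of the training stage; the real pipeline builds its
# feature set from the detection pipelines' injection campaigns.
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Assumed features per injection: (mass1, mass2, spin1z, spin2z, SNR).
# In reality these would come from the injection sqlite files.
rng = np.random.default_rng(0)
features = rng.uniform(size=(1000, 5))
# Label: 1 if the injection produced tidally disrupted matter (EM-Bright),
# 0 otherwise (EM-Dark); random placeholders here.
labels = rng.integers(0, 2, size=1000)

clf = RandomForestClassifier(n_estimators=100)
clf.fit(features, labels)

# Persist the trained classifier; the dag jobs write analogous pickle files.
with open("em_bright_classifier.pkl", "wb") as f:
    pickle.dump(clf, f)
```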

The code is packaged as follows. The executables needed to run the dag that conducts the training of the EM-Bright pipeline are placed in lalinference/python. The training requires us to ascertain the amount of disk mass created during the coalescence, using a variation of the fitting formula of Foucart, Hinderer and Nissanke (arXiv:1807.00011). The code performing this calculation, computeDiskMass.py, lives in a newly created embright directory under lalinference/python/lalinference. It currently uses an extremely stiff neutron-star equation of state (2H) to compute the amount of tidally disrupted matter; a data file containing the required information for this equation of state (namely the baryonic mass and the compactness) also resides in that directory. Everything is packaged so that it is accessible as: from lalinference.embright import computeDiskMass.

A script that writes a dag specifically for the training jobs sits in the lalinference/python directory; it uses the various other EM-Bright scripts (listed below) to create a dag that, when run, generates pickle files containing the machine-learned information for triggers.
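
For orientation, here is a minimal sketch of the remnant baryonic-mass fit of Foucart, Hinderer and Nissanke, with the fit coefficients as quoted in arXiv:1807.00011. This is not the computeDiskMass.py implementation, and the hard-coded compactness below is an illustrative placeholder for the value the module reads from the 2H equation-of-state data file:

```python
# Sketch of the Foucart, Hinderer & Nissanke (arXiv:1807.00011) fit for the
# baryonic mass remaining outside the remnant BH after a BH-NS merger.
import numpy as np

# Fit coefficients quoted in the paper.
ALPHA, BETA, GAMMA, DELTA = 0.406, 0.139, 0.255, 1.761

def kerr_isco_radius(chi):
    """Normalized ISCO radius R_ISCO / M_BH for a Kerr BH with spin chi."""
    z1 = 1 + (1 - chi**2)**(1/3) * ((1 + chi)**(1/3) + (1 - chi)**(1/3))
    z2 = np.sqrt(3 * chi**2 + z1**2)
    return 3 + z2 - np.sign(chi) * np.sqrt((3 - z1) * (3 + z1 + 2 * z2))

def remnant_mass_fraction(m_bh, m_ns, chi_bh, compactness):
    """Remnant baryonic mass in units of the NS baryonic mass."""
    q = m_bh / m_ns
    eta = q / (1 + q)**2  # symmetric mass ratio
    frac = (ALPHA * (1 - 2 * compactness) / eta**(1/3)
            - BETA * kerr_isco_radius(chi_bh) * compactness / eta
            + GAMMA)
    return max(frac, 0.0)**DELTA

# Illustrative numbers only; the pipeline reads the compactness from the
# 2H equation-of-state data file rather than hard-coding it.
print(remnant_mass_fraction(m_bh=7.0, m_ns=1.4, chi_bh=0.5, compactness=0.15))
```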

When a new trigger arrives, this pickle file can be used to compute the probability that the event is EM-Bright or EM-Dark.
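
On the inference side, the corresponding (equally hypothetical) sketch is simply to load that pickle and query the classifier, assuming the same feature order as in training:

```python
# Hypothetical inference-side sketch: load a trained classifier pickle and
# report P(EM-Bright) for a new trigger's recovered parameters.
import pickle

with open("em_bright_classifier.pkl", "rb") as f:
    clf = pickle.load(f)

# Assumed feature order: mass1, mass2, spin1z, spin2z, SNR.
trigger = [[1.4, 1.3, 0.0, 0.0, 12.0]]
p_em_bright = clf.predict_proba(trigger)[0][1]
print(f"P(EM-Bright) = {p_em_bright:.2f}")
```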

Extra Requirement: scikit-learn

Details on how the training process can be reproduced:

I used conda to create an environment that has scikit-learn (since it is not available on the LDG clusters); one plausible way to set this up (the package list here is a guess, not a vetted environment spec) is:
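
conda create --name embright python scikit-learn
conda activate embright

Once this version of LALsuite is installed, running the following command will generate a dag: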

embright_create_train_dag -c conf.ini

where conf.ini is a configuration file, an example of which is attached to this description (conf.ini). The user will have to change the following elements of the file: path and injdir. Submitting the dag should generate the required machine-learned pickle files, as long as injdir contains injection sqlite files of the form H1L1-ALL_LLOID_split_injections_<NUM>-<gps-start-time>-<duration>.sqlite, where NUM is any number. An example of such an sqlite file can be found on NEMO at /work/gstlalcbc/observing/2/offline/C00/chunk_21_1186624818-1187312718_run_3/H1L1-ALL_LLOID_split_injections_0000-1186624818-687900.sqlite.
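
Before submitting the dag, it may help to sanity-check that injdir actually contains files matching that naming convention; a small helper sketch (the injdir value below is a placeholder):

```python
# Hypothetical helper: list injection sqlite files in injdir matching the
# H1L1-ALL_LLOID_split_injections_<NUM>-<gps-start-time>-<duration>.sqlite pattern.
import glob
import os

injdir = "/path/to/injections"  # set to the injdir value from conf.ini
pattern = os.path.join(injdir, "H1L1-ALL_LLOID_split_injections_*-*-*.sqlite")
for path in sorted(glob.glob(pattern)):
    print(path)
```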

This is a work in progress, and the inference part of this project is currently being updated.
