
Updated cbc offline configuration tutorial

Merged Shio Sakon requested to merge update_offline_config_tutorial into master
@@ -12,11 +12,24 @@ stack installed. Other methods of installation will follow a similar
procedure, however, with one caveat that workflows will not work on the
Open Science Grid (OSG).
For a dag on the OSG IGWN grid, you must use a Singularity container on
cvmfs, set the ``profile`` in ``config.yml`` to ``osg``, and make sure
to submit the dag from an OSG node.
Otherwise the workflow is the same.
When running without a Singularity container, the commands below should be
modified accordingly (e.g., running ``gstlal_inspiral_workflow init -c config.yml``
instead of ``singularity exec <image> gstlal_inspiral_workflow init -c config.yml``).
For ICDS gstlalcbc shared accounts, the ``env.sh`` contents must be changed;
instead of running ``$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein``,
run ``source env.sh``. (Details are below.)
Running Workflows
^^^^^^^^^^^^^^^^^^
1. Build Singularity image (optional)
""""""""""""""""""""""""""""""""""""""
NOTE: If you are using a reference Singularity container (suitable in most
cases), you can skip this step. The ``<image>`` throughout this doc refers to
@@ -31,6 +44,9 @@ To pull a container with gstlal installed, run:
$ singularity build --sandbox --fix-perms <image-name> docker://containers.ligo.org/lscsoft/gstlal:master
To use a branch other than master, you can replace `master` in the above command with the name of the desired branch. To use a custom build instead, gstlal will need to be installed into the container from your modified source code. For installation instructions, see the
`installation page <https://docs.ligo.org/lscsoft/gstlal/installation.html>`_.
2. Set up workflow
""""""""""""""""""""
@@ -40,22 +56,50 @@ First, we create a new analysis directory and switch to it:
$ mkdir <analysis-dir>
$ cd <analysis-dir>
$ mkdir bank mass_model idq dtdphi
Default configuration files and environment (``env.sh``) for a
variety of different banks are contained in the
`offline-configuration <https://git.ligo.org/gstlal/offline-configuration>`_
repository.
One can run the commands below to grab the configuration files, or clone the
repository and copy the files as needed into the analysis directory.
To download data files (mass model, template banks) that may be needed for
offline runs, see the
`README <https://git.ligo.org/gstlal/offline-configuration/-/blob/main/README.md>`_
in the offline-configuration repo. Move the template bank(s) into ``bank`` and the mass model into ``mass_model``.
For example, to grab all the relevant files for a small BNS dag:
.. code:: bash
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/configs/bns-small/config.yml
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/env.sh
$ source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
$ conda activate igwn
$ dcc archive --archive-dir=. --files -i T2200318-v2
$ conda deactivate
Then move the template bank, mass model, idq file, and dtdphi file into their corresponding directories.
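That step might look like the following sketch. The filenames are placeholders (created here with ``touch`` only so the sketch runs end to end); substitute the files you actually downloaded:

```shell
# Create the per-file-type directories (same as the mkdir step earlier)
mkdir -p bank mass_model idq dtdphi

# Placeholders standing in for the real downloads; the names are
# assumptions based on the bns-small example
touch gstlal_bank_small.xml.gz mass_model_small.h5

# Move each file into its corresponding directory
mv gstlal_bank_small.xml.gz bank/
mv mass_model_small.h5 mass_model/
```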
When running an analysis on the ICDS cluster in the gstlalcbc shared account,
the contents of ``env.sh`` must be changed to what is given below.
In addition, where the tutorial below says to run ``ligo-proxy-init -p``,
instead run ``source env.sh`` with the modified ``env.sh``.
When running on non-gstlalcbc accounts on ICDS or on other clusters,
``env.sh`` does not need to be modified, and ``ligo-proxy-init -p`` can be run
as in the tutorial.
.. code-block:: bash
export PYTHONUNBUFFERED=1
unset X509_USER_PROXY
export X509_USER_CERT=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
export X509_USER_KEY=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
export GSTLAL_FIR_WHITEN=0
Alternatively, one can clone the repository and copy files as needed into the
analysis directory.
Now, we'll need to modify the configuration as needed to run the analysis. At
the very least, setting the start/end times and the instruments to run over:
@@ -67,18 +111,19 @@ the very least, setting the start/end times and the instruments to run over:
instruments: H1L1
Ensure the template bank, mass model, idq file, and dtdphi file are pointed to
correctly in the configuration:
.. code-block:: yaml
data:
template-bank: bank/gstlal_bank_small.xml.gz
.. code-block:: yaml
prior:
mass-model: mass_model/mass_model_small.h5
idq-timeseries: idq/H1L1-IDQ_TIMESERIES-1239641219-692847.h5
dtdphi: dtdphi/inspiral_dtdphi_pdf.h5
If you're creating a summary page for results, you'll need to point at a
location where they are web-viewable:
@@ -86,7 +131,7 @@ location where they are web-viewable:
.. code-block:: yaml
summary:
webdir: ~/public_html/
If you're running on LIGO compute resources and your username doesn't match your
albert.einstein username, you'll also need to specify the
@@ -97,17 +142,17 @@ accounting group user for condor to track accounting information:
condor:
accounting-group-user: albert.einstein
In addition, update the ``singularity-image`` in the ``condor`` section of your configuration if needed:
.. code-block:: yaml
condor:
singularity-image: /cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master
If not using a reference Singularity image, you can replace this with the
full path to a local Singularity container ``<image>``.
For more detailed configuration options, take a look at the :ref:`configuration
section <analysis-configuration>` below.
If you haven't installed site-specific profiles yet (per-user), you can run:
@@ -124,6 +169,7 @@ You can select which profile to use in the ``condor`` section:
condor:
profile: ldas
For an OSG IGWN grid run, use ``osg``.
To view which profiles are available, you can run:
.. code:: bash
@@ -164,6 +210,14 @@ to ensure you can get access to LIGO data:
Note that we are running this step outside of Singularity. This is because ``ligo-proxy-init``
is not installed within the image currently.
If you are running on the ICDS gstlalcbc shared account, do not run the command
above.
Instead, run:
.. code:: bash
$ source env.sh
Also update the configuration accordingly (if needed):
@@ -178,20 +232,29 @@ Finally, set up the rest of the workflow including the DAG for submission:
$ singularity exec -B $TMPDIR <image> make dag
If running on the OSG IGWN grid, make sure to submit the dags from an OSG node.
This should create condor DAGs for the workflow. Mounting a temporary directory
is important as some of the steps will leverage a temporary space to generate files.
If you want to see detailed error messages, add ``PYTHONUNBUFFERED=1`` to the
``environment`` line in the submit (``*.sub``) files by running:
.. code:: bash
$ sed -i '/^environment = / s/\"$/ PYTHONUNBUFFERED=1\"/' *.sub
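To see what the substitution does, it can be tried on a throwaway submit file first (the contents below are a hypothetical minimal ``.sub`` file, not one generated by the workflow):

```shell
# A minimal stand-in submit file with an environment line
cat > demo.sub <<'EOF'
executable = /usr/bin/env
environment = "GST_DEBUG=1"
queue
EOF

# Same substitution as above: insert PYTHONUNBUFFERED=1 just before the
# closing quote of every environment line
sed -i '/^environment = / s/"$/ PYTHONUNBUFFERED=1"/' demo.sub

# The environment line now reads:
#   environment = "GST_DEBUG=1 PYTHONUNBUFFERED=1"
grep '^environment' demo.sub
```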
3. Launch workflows
"""""""""""""""""""""""""
.. code:: bash
$ source env.sh
$ make launch
This is simply a thin wrapper around ``condor_submit_dag`` launching the DAG in question.
You can monitor the dag with Condor CLI tools such as ``condor_q`` and with
``tail -f full_inspiral_dag.dag.dagman.out``.
4. Generate Summary Page
"""""""""""""""""""""""""
@@ -200,8 +263,13 @@ After the DAG has completed, you can generate the summary page for the analysis:
.. code:: bash
$ singularity exec <image> make summary
To make an open-box page after this, run:
.. code:: bash
$ make unlock
.. _analysis-configuration:
@@ -218,13 +286,13 @@ The top-level configuration consists of the analysis times and detector configur
instruments: H1L1
min-instruments: 1
These set the start and stop gps times of the analysis, plus the detectors to use
(H1=Hanford, L1=Livingston, V1=Virgo). There is a nice online converter for gps
times here: https://www.gw-openscience.org/gps/. You can also use the program
``gpstime``. Note that these start and stop times have no knowledge of
science-quality data; the science-quality data actually analyzed are typically a
subset of the total time. Information about which detectors were on at different
times is available here: https://www.gw-openscience.org/data/.
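As a rough cross-check of those tools: gps time counts seconds since 1980-01-06 00:00:00 UTC and, unlike UTC, does not skip leap seconds. A sketch using GNU ``date`` (it hard-codes the current UTC-to-gps leap-second offset of 18 s, valid for dates from 2017 on; use ``gpstime`` or the converter above for exact values):

```shell
# Seconds between the Unix epoch (1970-01-01) and the gps epoch (1980-01-06)
gps_epoch=$(date -u -d "1980-01-06 00:00:00 UTC" +%s)

# Example UTC time to convert: the start of 2020
utc_secs=$(date -u -d "2020-01-01 00:00:00 UTC" +%s)

# gps time = seconds since the gps epoch + accumulated leap seconds
gps=$(( utc_secs - gps_epoch + 18 ))
echo "$gps"   # prints 1261872018
```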
``min-instruments`` sets the minimum number of instruments we will allow to form
an event, e.g. setting it to 1 means the analysis will consider single detector
@@ -250,12 +318,26 @@ The ``analysis-dir`` option is used if the user wishes to point to an existing
analysis to perform a rerank or an injection-only workflow. This grabs existing files
from this directory to seed the rerank/injection workflows.
One can use multiple sub template banks. In this case, the configuration might look like:
.. code-block:: yaml
data:
template-bank:
bns: bank/sub_bank/bns.xml.gz
nsbh: bank/sub_bank/nsbh.xml.gz
bbh_1: bank/sub_bank/bbh_low_q.xml.gz
bbh_2: bank/sub_bank/other_bbh.xml.gz
imbh: bank/sub_bank/imbh_low_q.xml.gz
Section: Source
""""""""""""""""
.. code-block:: yaml
source:
data-source: frames
data-find-server: datafind.gw-openscience.org
frame-type:
H1: H1_GWOSC_O2_16KHZ_R1
@@ -263,8 +345,10 @@ Section: Source
channel-name:
H1: GWOSC-16KHZ_R1_STRAIN
L1: GWOSC-16KHZ_R1_STRAIN
sample-rate: 4096
frame-segments-file: segments.xml.gz
frame-segments-name: datasegments
x509-proxy: x509_proxy
The ``data-find-server`` option points to a server that is queried to find the
location of frame files. The address shown above is a publicly available server
@@ -277,7 +361,8 @@ available. These files are generalized enough that they could describe different
types of data, so ``frame-segments-name`` is used to specify which segment to
consider. In practice, the segments file we produce will only contain the
segments we want. Users will typically not change any of these options once they
are set for a given instrument and observing run. ``x509-proxy`` is the path to
your x509 proxy file.
Section: Segments
""""""""""""""""""
@@ -295,7 +380,7 @@ An example of configuration with the ``gwosc`` backend looks like:
vetoes:
category: CAT1
Here, the ``backend`` is set to ``gwosc`` so both segments and vetoes are determined
by querying the GWOSC server. There is no additional configuration needed to query
segments, but for vetoes, we also need to specify the ``category`` used for vetoes.
This can be one of ``CAT1``, ``CAT2``, or ``CAT3``. By default, segments are generated
@@ -318,7 +403,7 @@ An example of configuration with the ``dqsegdb`` backend looks like:
version: O3b_CBC_H1L1V1_C01_v1.2
epoch: O3
Here, the ``backend`` is set to ``dqsegdb`` so both segments and vetoes are determined
by querying the DQSEGDB server. To query segments, one needs to specify the flag used
per instrument to query segments from. For vetoes, we need to specify the ``category``
used for vetoes as with the ``dqsegdb`` backend. Additionally, a veto definer file is
@@ -333,6 +418,7 @@ Section: PSD
psd:
fft-length: 8
sample-rate: 4096
The PSD estimation method used by GstLAL is a modified median-Welch method that
is described in detail in Section IIB of Ref [1]. The FFT length sets the length
@@ -355,10 +441,10 @@ Section: SVD
num-chi-bins: 1
sort-by: mchirp
approximant:
- 0:1.73:TaylorF2
- 1.73:1000:SEOBNRv4_ROM
tolerance: 0.9999
max-f-final: 1024.0
num-split-templates: 200
overlap: 30
num-banks: 5
@@ -366,7 +452,8 @@ Section: SVD
samples-max-64: 2048
samples-max-256: 2048
samples-max: 4096
autocorrelation-length: 701
max-duration: 128
manifest: svd_manifest.json
``f-low`` sets the lower frequency cutoff for the analysis in Hz.
@@ -376,7 +463,10 @@ procedure; specifically, sets the number of effective spin parameter bins to use
in the chirp-mass / effective spin binning procedure described in Sec. IID and
Fig. 6 of [1].
``sort-by`` selects the template sort column. This controls how to bin the
bank in sub-banks suitable for the svd decomposition. It can be ``mchirp``
(sorts by chirp mass), ``mu`` (sorts by mu1 and mu2 coordinates), or
``template_duration`` (sorts by template duration).
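For example, to sort by template duration instead (a sketch; it assumes the section's top-level key is ``svd``, matching the other sections' naming):

```yaml
svd:
  sort-by: template_duration
```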
``approximant`` specifies the waveform approximant that should be used along
with chirp mass bounds to use that approximant in. 0:1000:TaylorF2 means use the
@@ -418,10 +508,16 @@ in any time slice with a sample rate greater than 256 Hz.
``autocorrelation-length`` sets the number of samples to use when computing the
autocorrelation-based test-statistic, described in IIIC of Ref [1].
``max-duration`` sets the maximum template duration in seconds. One can choose
not to use ``max-duration``.
``manifest`` sets the name of a file that will contain metadata about the
template bank bins.
If one uses multiple sub template banks, SVD configurations can be specified
for each sub template bank; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Users will typically not change these options.
Section: Filter
""""""""""""""""
@@ -430,14 +526,18 @@ Section: Filter
filter:
fir-stride: 1
min-instruments: 1
coincidence-threshold: 0.01
ht-gate-threshold: 0.8:15.0-45.0:100.0
veto-segments-file: vetoes.xml.gz
time-slide-file: tisi.xml
injection-time-slide-file: inj_tisi.xml
time-slides:
H1: 0:0:0
L1: 0.62831:0.62831:0.62831
injections:
bns:
file: bns_injections.xml
range: 0.01:1000.0
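The ``ht-gate-threshold`` value above encodes a ramp. Assuming it follows the form ``mchirp_min:threshold_min-mchirp_max:threshold_max`` (so the h(t) gate threshold rises linearly from 15.0 at chirp mass 0.8 to 100.0 at chirp mass 45.0), the threshold at a given chirp mass can be sketched as:

```shell
# Linear interpolation of the gate threshold (assumption: the
# 0.8:15.0-45.0:100.0 string means threshold 15.0 at mchirp 0.8
# ramping linearly to 100.0 at mchirp 45.0)
thr=$(awk 'BEGIN {
  m1 = 0.8;  t1 = 15.0     # low end of the ramp
  m2 = 45.0; t2 = 100.0    # high end of the ramp
  mc = 10.0                # example chirp mass to evaluate at
  printf "%.2f", t1 + (t2 - t1) * (mc - m1) / (m2 - m1)
}')
echo "$thr"
```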
``fir-stride`` is a tunable parameter related to the matched-filter procedure,
@@ -475,15 +575,19 @@ with their own label, file, and range.
The only option here that a user will normally interact with is the injections
option.
When using multiple sub template banks, replace ``bns:`` under ``injections:``
with ``inj:``.
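Under that convention, the ``injections`` block of the ``filter`` section might look like the following sketch (file and range values copied from the single-bank example above):

```yaml
filter:
  injections:
    inj:
      file: bns_injections.xml
      range: 0.01:1000.0
```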
Section: Injections
""""""""""""""""""""
.. code-block:: yaml
injections:
  expected-snr:
    f-low: 15.0
  sets:
    bns:
      f-low: 14.0
      seed: 72338
@@ -508,6 +612,7 @@ Section: Injections
distance:
  min: 10000
  max: 80000
spin-aligned: True
file: bns_injections.xml
The ``sets`` subsection is used to create injection sets to be used within the
@@ -516,9 +621,15 @@ injections are grouped by key. In this case, one ``bns`` injection set which
creates the ``bns_injections.xml`` file and used in the ``injections`` section
of the ``filter`` section.
For multiple injection sets, the ``bns:`` chunk should be repeated for each
injection set; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Besides creating injection sets, the ``expected-snr`` subsection is used for the
expected SNR jobs. These settings are used to override defaults as needed.
``spin-aligned`` specifies whether the injections should have aligned spins
(``spin-aligned: True``) or precessing spins (``spin-aligned: False``).
In the case of multiple injection sets that need to be combined, one can add
a few options to create a combined file and reference that within the filter
jobs. This can be useful for large banks with a large set of templates. To
@@ -547,16 +658,24 @@ Section: Prior
.. code-block:: yaml
prior:
mass-model: mass_model/mass_model_small.h5
``mass-model`` is a relative path to the file that contains the mass model. This
model is used to weight templates appropriately when assigning ranking
statistics based on our understanding of the astrophysical distribution of
signals. Users will not typically change this option.
An optional ``dtdphi-file`` and ``idq-timeseries`` can be provided here. If not
given, a default model (included in the standard installation) will be used.
The dtdphi file specifies a probability distribution function for the
probability of measuring a given time shift and phase shift in a
multiple-detector observation. It enters the ranking statistics.
The idq file gives information about the data quality around the time of
coalescence.
If specifying idq files and dtdphi files, create an ``idq`` and a ``dtdphi``
directory in the ``<analysis-dir>`` and put the idq and dtdphi files in the
respective directories; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Section: Rank
""""""""""""""""
@@ -589,7 +708,8 @@ Section: Condor
condor:
profile: osg-public
accounting-group: ligo.dev.o3.cbc.uber.gstlaloffline
accounting-group-user: <albert.einstein>
singularity-image: <image>
``profile`` sets a base level of configuration options for condor.
@@ -598,7 +718,8 @@ the machinery to produce an analysis dag requires this option, but the option is
not actually used by analyses running on non-LDG resources.
``singularity-image`` sets the path of the container on cvmfs that the analysis
should use. Users will not typically change this option
(use ``/cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master``).
.. _install-custom-profiles: