
Updated cbc offline configuration tutorial

Merged Shio Sakon requested to merge update_offline_config_tutorial into master
@@ -12,11 +12,24 @@ stack installed. Other methods of installation will follow a similar
procedure, however, with one caveat that workflows will not work on the
Open Science Grid (OSG).
For a dag on the OSG IGWN grid, you must use a Singularity container on
cvmfs, set the ``profile`` in ``config.yml`` to ``osg``, and make sure
to submit the dag from an OSG node.
Otherwise the workflow is the same.
When running without a Singularity container, the commands below should be
modified accordingly (e.g., running ``gstlal_inspiral_workflow init -c config.yml``
instead of ``singularity exec <image> gstlal_inspiral_workflow init -c config.yml``).
For ICDS gstlalcbc shared accounts, the ``env.sh`` contents must be changed;
instead of running ``$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein``,
run ``source env.sh``. (Details are below.)
Running Workflows
^^^^^^^^^^^^^^^^^^
1. Build Singularity image (optional)
""""""""""""""""""""""""""""""""""""""
NOTE: If you are using a reference Singularity container (suitable in most
cases), you can skip this step. The ``<image>`` throughout this doc refers to
@@ -31,6 +44,9 @@ To pull a container with gstlal installed, run:
$ singularity build --sandbox --fix-perms <image-name> docker://containers.ligo.org/lscsoft/gstlal:master
To use a branch other than master, you can replace `master` in the above command with the name of the desired branch. To use a custom build instead, gstlal will need to be installed into the container from your modified source code. For installation instructions, see the
`installation page <https://docs.ligo.org/lscsoft/gstlal/installation.html>`_.
2. Set up workflow
""""""""""""""""""""
@@ -40,22 +56,50 @@ First, we create a new analysis directory and switch to it:
$ mkdir <analysis-dir>
$ cd <analysis-dir>
$ mkdir bank mass_model idq dtdphi
Default configuration files and environment (``env.sh``) for a
variety of different banks are contained in the
`offline-configuration <https://git.ligo.org/gstlal/offline-configuration>`_
repository.
One can run the commands below to grab the configuration files, or clone the
repository and copy the files as needed into the analysis directory.
To download data files (mass model, template banks) that may be needed for
offline runs, see the
`README <https://git.ligo.org/gstlal/offline-configuration/-/blob/main/README.md>`_
in the offline-configuration repo. Move the template bank(s) into ``bank`` and the mass model into ``mass_model``.
For example, to grab all the relevant files for a small BNS dag:
.. code:: bash
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/configs/bns-small/config.yml
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/env.sh
$ source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
$ conda activate igwn
$ dcc archive --archive-dir=. --files -i T2200318-v2
$ conda deactivate
Then move the template bank, mass model, idq file, and dtdphi file into their corresponding directories.
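That step might look like the following sketch. The filenames are placeholders (created here with ``touch`` only so the sketch runs end to end); substitute the files you actually downloaded:

```shell
# Create the per-file-type directories (same as the mkdir step earlier)
mkdir -p bank mass_model idq dtdphi

# Placeholders standing in for the real downloads; the names are
# assumptions based on the bns-small example
touch gstlal_bank_small.xml.gz mass_model_small.h5

# Move each file into its corresponding directory
mv gstlal_bank_small.xml.gz bank/
mv mass_model_small.h5 mass_model/
```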
When running an analysis on the ICDS cluster in the gstlalcbc shared account,
the contents of ``env.sh`` must be changed to what is given below.
In addition, where the tutorial below says to run ``ligo-proxy-init -p``,
instead run ``source env.sh`` with the modified ``env.sh``.
When running on non-gstlalcbc accounts on ICDS or on other clusters,
``env.sh`` does not need to be modified, and ``ligo-proxy-init -p`` can be run
as in the tutorial.
.. code-block:: bash
export PYTHONUNBUFFERED=1
unset X509_USER_PROXY
export X509_USER_CERT=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
export X509_USER_KEY=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
export GSTLAL_FIR_WHITEN=0
Alternatively, one can clone the repository and copy files as needed into the
analysis directory.
Now, we'll need to modify the configuration as needed to run the analysis. At
the very least, setting the start/end times and the instruments to run over:
@@ -67,18 +111,19 @@ the very least, setting the start/end times and the instruments to run over:
instruments: H1L1
Ensure the template bank, mass model, idq file, and dtdphi file are pointed to
correctly in the configuration:
.. code-block:: yaml
data:
template-bank: bank/gstlal_bank_small.xml.gz
.. code-block:: yaml
prior:
mass-model: mass_model/mass_model_small.h5
idq-timeseries: idq/H1L1-IDQ_TIMESERIES-1239641219-692847.h5
dtdphi: dtdphi/inspiral_dtdphi_pdf.h5
If you're creating a summary page for results, you'll need to point at a
location where they are web-viewable:
@@ -86,7 +131,7 @@ location where they are web-viewable:
.. code-block:: yaml
summary:
webdir: ~/public_html/
If you're running on LIGO compute resources and your username doesn't match your
albert.einstein username, you'll also need to specify the
@@ -97,17 +142,17 @@ accounting group user for condor to track accounting information:
condor:
accounting-group-user: albert.einstein
In addition, update the ``singularity-image`` in the ``condor`` section of your configuration if needed:
.. code-block:: yaml
condor:
singularity-image: /cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master
If not using a reference Singularity image, you can replace this with the
full path to a local Singularity container ``<image>``.
For more detailed configuration options, take a look at the :ref:`configuration
section <analysis-configuration>` below.
If you haven't installed site-specific profiles yet (per-user), you can run:
@@ -124,6 +169,7 @@ You can select which profile to use in the ``condor`` section:
condor:
profile: ldas
For an OSG IGWN grid run, use ``osg``.
To view which profiles are available, you can run:
.. code:: bash
@@ -164,6 +210,14 @@ to ensure you can get access to LIGO data:
Note that we are running this step outside of Singularity. This is because ``ligo-proxy-init``
is not installed within the image currently.
If you are running on the ICDS gstlalcbc shared account, do not run the command
above.
Instead, run:
.. code:: bash
$ source env.sh
Also update the configuration accordingly (if needed):
@@ -178,20 +232,29 @@ Finally, set up the rest of the workflow including the DAG for submission:
$ singularity exec -B $TMPDIR <image> make dag
If running on the OSG IGWN grid, make sure to submit the dags from an OSG node.
This should create condor DAGs for the workflow. Mounting a temporary directory
is important as some of the steps will leverage a temporary space to generate files.
If you want to see detailed error messages, add ``PYTHONUNBUFFERED=1`` to the
``environment`` line in the submit (``*.sub``) files by running:
.. code:: bash
$ sed -i '/^environment = / s/\"$/ PYTHONUNBUFFERED=1\"/' *.sub
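To see what the substitution does, it can be tried on a throwaway submit file first (the contents below are a hypothetical minimal ``.sub`` file, not one generated by the workflow):

```shell
# A minimal stand-in submit file with an environment line
cat > demo.sub <<'EOF'
executable = /usr/bin/env
environment = "GST_DEBUG=1"
queue
EOF

# Same substitution as above: insert PYTHONUNBUFFERED=1 just before the
# closing quote of every environment line
sed -i '/^environment = / s/"$/ PYTHONUNBUFFERED=1"/' demo.sub

# The environment line now reads:
#   environment = "GST_DEBUG=1 PYTHONUNBUFFERED=1"
grep '^environment' demo.sub
```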
3. Launch workflows
"""""""""""""""""""""""""
.. code:: bash
$ source env.sh
$ make launch
This is simply a thin wrapper around ``condor_submit_dag`` launching the DAG in question.
You can monitor the dag with Condor CLI tools such as ``condor_q`` and with
``tail -f full_inspiral_dag.dag.dagman.out``.
4. Generate Summary Page
"""""""""""""""""""""""""
@@ -200,8 +263,13 @@ After the DAG has completed, you can generate the summary page for the analysis:
.. code:: bash
$ singularity exec <image> make summary
To make an open-box page after this, run:
.. code:: bash
$ make unlock
.. _analysis-configuration:
@@ -218,13 +286,13 @@ The top-level configuration consists of the analysis times and detector configur
instruments: H1L1
min-instruments: 1
These set the start and stop gps times of the analysis, plus the detectors to use
(H1=Hanford, L1=Livingston, V1=Virgo). There is a nice online converter for gps
times here: https://www.gw-openscience.org/gps/. You can also use the program
``gpstime``. Note that these start and stop times have no knowledge of
science-quality data; the science-quality data actually analyzed are typically a
subset of the total time. Information about which detectors were on at different
times is available here: https://www.gw-openscience.org/data/.
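As a rough cross-check of those tools: gps time counts seconds since 1980-01-06 00:00:00 UTC and, unlike UTC, does not skip leap seconds. A sketch using GNU ``date`` (it hard-codes the current UTC-to-gps leap-second offset of 18 s, valid for dates from 2017 on; use ``gpstime`` or the converter above for exact values):

```shell
# Seconds between the Unix epoch (1970-01-01) and the gps epoch (1980-01-06)
gps_epoch=$(date -u -d "1980-01-06 00:00:00 UTC" +%s)

# Example UTC time to convert: the start of 2020
utc_secs=$(date -u -d "2020-01-01 00:00:00 UTC" +%s)

# gps time = seconds since the gps epoch + accumulated leap seconds
gps=$(( utc_secs - gps_epoch + 18 ))
echo "$gps"   # prints 1261872018
```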
``min-instruments`` sets the minimum number of instruments we will allow to form
an event, e.g. setting it to 1 means the analysis will consider single detector
@@ -250,12 +318,26 @@ The ``analysis-dir`` option is used if the user wishes to point to an existing
analysis to perform a rerank or an injection-only workflow. This grabs existing files
from this directory to seed the rerank/injection workflows.
One can use multiple sub template banks. In this case, the configuration might look like:
.. code-block:: yaml
data:
template-bank:
bns: bank/sub_bank/bns.xml.gz
nsbh: bank/sub_bank/nsbh.xml.gz
bbh_1: bank/sub_bank/bbh_low_q.xml.gz
bbh_2: bank/sub_bank/other_bbh.xml.gz
imbh: bank/sub_bank/imbh_low_q.xml.gz
Section: Source
""""""""""""""""
.. code-block:: yaml
source:
data-source: frames
data-find-server: datafind.gw-openscience.org
frame-type:
H1: H1_GWOSC_O2_16KHZ_R1
@@ -263,8 +345,10 @@ Section: Source
channel-name:
H1: GWOSC-16KHZ_R1_STRAIN
L1: GWOSC-16KHZ_R1_STRAIN
sample-rate: 4096
frame-segments-file: segments.xml.gz
frame-segments-name: datasegments
x509-proxy: x509_proxy
The ``data-find-server`` option points to a server that is queried to find the
location of frame files. The address shown above is a publicly available server
@@ -277,7 +361,8 @@ available. These files are generalized enough that they could describe different
types of data, so ``frame-segments-name`` is used to specify which segment to
consider. In practice, the segments file we produce will only contain the
segments we want. Users will typically not change any of these options once they
are set for a given instrument and observing run. ``x509-proxy`` is the path to
your x509 proxy file.
Section: Segments
""""""""""""""""""
@@ -295,7 +380,7 @@ An example of configuration with the ``gwosc`` backend looks like:
vetoes:
category: CAT1
Here, the ``backend`` is set to ``gwosc`` so both segments and vetoes are determined
by querying the GWOSC server. There is no additional configuration needed to query
segments, but for vetoes, we also need to specify the ``category`` used for vetoes.
This can be one of ``CAT1``, ``CAT2``, or ``CAT3``. By default, segments are generated
@@ -318,7 +403,7 @@ An example of configuration with the ``dqsegdb`` backend looks like:
version: O3b_CBC_H1L1V1_C01_v1.2
epoch: O3
Here, the ``backend`` is set to ``dqsegdb`` so both segments and vetoes are determined
by querying the DQSEGDB server. To query segments, one needs to specify the flag used
per instrument to query segments from. For vetoes, we need to specify the ``category``
used for vetoes as with the ``dqsegdb`` backend. Additionally, a veto definer file is
@@ -333,6 +418,7 @@ Section: PSD
psd:
fft-length: 8
sample-rate: 4096
The PSD estimation method used by GstLAL is a modified median-Welch method that
is described in detail in Section IIB of Ref [1]. The FFT length sets the length
@@ -355,10 +441,10 @@ Section: SVD
num-chi-bins: 1
sort-by: mchirp
approximant:
- 0:1.73:TaylorF2
- 1.73:1000:SEOBNRv4_ROM
tolerance: 0.9999
max-f-final: 1024.0
num-split-templates: 200
overlap: 30
num-banks: 5
@@ -366,7 +452,8 @@ Section: SVD
samples-max-64: 2048
samples-max-256: 2048
samples-max: 4096
autocorrelation-length: 701
max-duration: 128
manifest: svd_manifest.json
``f-low`` sets the lower frequency cutoff for the analysis in Hz.
@@ -376,7 +463,10 @@ procedure; specifically, sets the number of effective spin parameter bins to use
in the chirp-mass / effective spin binning procedure described in Sec. IID and
Fig. 6 of [1].
``sort-by`` selects the template sort column. This controls how to bin the
bank in sub-banks suitable for the svd decomposition. It can be ``mchirp``
(sorts by chirp mass), ``mu`` (sorts by mu1 and mu2 coordinates), or
``template_duration`` (sorts by template duration).
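For example, to sort by template duration instead (a sketch; it assumes the section's top-level key is ``svd``, matching the other sections' naming):

```yaml
svd:
  sort-by: template_duration
```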
``approximant`` specifies the waveform approximant that should be used along
with chirp mass bounds to use that approximant in. 0:1000:TaylorF2 means use the
@@ -418,10 +508,16 @@ in any time slice with a sample rate greater than 256 Hz.
``autocorrelation-length`` sets the number of samples to use when computing the
autocorrelation-based test-statistic, described in IIIC of Ref [1].
``max-duration`` sets the maximum template duration in seconds. One can choose
not to use ``max-duration``.
``manifest`` sets the name of a file that will contain metadata about the
template bank bins.
If one uses multiple sub template banks, SVD configurations can be specified
for each sub template bank; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Users will typically not change these options.
Section: Filter
""""""""""""""""
@@ -430,14 +526,18 @@ Section: Filter
filter:
fir-stride: 1
min-instruments: 1
coincidence-threshold: 0.01
ht-gate-threshold: 0.8:15.0-45.0:100.0
veto-segments-file: vetoes.xml.gz
time-slide-file: tisi.xml
injection-time-slide-file: inj_tisi.xml
time-slides:
H1: 0:0:0
L1: 0.62831:0.62831:0.62831
injections:
bns:
file: bns_injections.xml
range: 0.01:1000.0
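The ``ht-gate-threshold`` value above encodes a ramp. Assuming it follows the form ``mchirp_min:threshold_min-mchirp_max:threshold_max`` (so the h(t) gate threshold rises linearly from 15.0 at chirp mass 0.8 to 100.0 at chirp mass 45.0), the threshold at a given chirp mass can be sketched as:

```shell
# Linear interpolation of the gate threshold (assumption: the
# 0.8:15.0-45.0:100.0 string means threshold 15.0 at mchirp 0.8
# ramping linearly to 100.0 at mchirp 45.0)
thr=$(awk 'BEGIN {
  m1 = 0.8;  t1 = 15.0     # low end of the ramp
  m2 = 45.0; t2 = 100.0    # high end of the ramp
  mc = 10.0                # example chirp mass to evaluate at
  printf "%.2f", t1 + (t2 - t1) * (mc - m1) / (m2 - m1)
}')
echo "$thr"
```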
``fir-stride`` is a tunable parameter related to the matched-filter procedure,
@@ -475,15 +575,19 @@ with their own label, file, and range.
The only option here that a user will normally interact with is the injections
option.
When using multiple sub template banks, replace ``bns:`` under ``injections:``
with ``inj:``.
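Under that convention, the ``injections`` block of the ``filter`` section might look like the following sketch (file and range values copied from the single-bank example above):

```yaml
filter:
  injections:
    inj:
      file: bns_injections.xml
      range: 0.01:1000.0
```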
Section: Injections
""""""""""""""""""""
.. code-block:: yaml
injections:
  expected-snr:
    f-low: 15.0
  sets:
    bns:
      f-low: 14.0
      seed: 72338
@@ -508,6 +612,7 @@ Section: Injections
distance:
  min: 10000
  max: 80000
spin-aligned: True
file: bns_injections.xml
The ``sets`` subsection is used to create injection sets to be used within the
@@ -516,9 +621,15 @@ injections are grouped by key. In this case, one ``bns`` injection set which
creates the ``bns_injections.xml`` file and used in the ``injections`` section
of the ``filter`` section.
For multiple injection sets, the ``bns:`` chunk should be repeated for each
injection set; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Besides creating injection sets, the ``expected-snr`` subsection is used for the
expected SNR jobs. These settings are used to override defaults as needed.
``spin-aligned`` specifies whether the injections should have aligned spins
(``spin-aligned: True``) or precessing spins (``spin-aligned: False``).
In the case of multiple injection sets that need to be combined, one can add
a few options to create a combined file and reference that within the filter
jobs. This can be useful for large banks with a large set of templates. To
@@ -547,16 +658,24 @@ Section: Prior
.. code-block:: yaml
prior:
mass-model: mass_model/mass_model_small.h5
``mass-model`` is a relative path to the file that contains the mass model. This
model is used to weight templates appropriately when assigning ranking
statistics based on our understanding of the astrophysical distribution of
signals. Users will not typically change this option.
An optional ``dtdphi-file`` and ``idq-timeseries`` can be provided here. If not
given, a default model (included in the standard installation) will be used.
The dtdphi file specifies a probability distribution function for the
probability of measuring a given time shift and phase shift in a
multiple-detector observation. It enters the ranking statistics.
The idq file gives information about the data quality around the time of
coalescence.
If specifying idq files and dtdphi files, create an ``idq`` and a ``dtdphi``
directory in the ``<analysis-dir>`` and put the idq and dtdphi files in the
respective directories; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for reference.
Section: Rank
""""""""""""""""
@@ -589,7 +708,8 @@ Section: Condor
condor:
profile: osg-public
accounting-group: ligo.dev.o3.cbc.uber.gstlaloffline
accounting-group-user: <albert.einstein>
singularity-image: <image>
``profile`` sets a base level of configuration options for condor.
@@ -598,7 +718,8 @@ the machinery to produce an analysis dag requires this option, but the option is
not actually used by analyses running on non-LDG resources.
``singularity-image`` sets the path of the container on cvmfs that the analysis
should use. Users will not typically change this option
(use ``/cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master``).
.. _install-custom-profiles: