{%- set logo = "gstlal.png" %}
{% extends "!layout.html" %}
GstLAL API
============
.. toctree::
   :maxdepth: 2
   :glob:

   gstlal/python-modules/*modules
   gstlal-inspiral/python-modules/*modules
   gstlal-burst/python-modules/*modules
   gstlal-ugly/python-modules/*modules
.. _cbc-analysis:

CBC Analysis (Offline)
========================

To start an offline CBC analysis, you'll need a configuration file that points
at the start/end times to analyze, the input data products (e.g. template bank,
mass model), and any other workflow-related configuration.

All the steps below assume a Singularity container with the GstLAL software
stack installed. Other installation methods follow a similar procedure, with
one caveat: such workflows will not run on the Open Science Grid (OSG).
For a DAG on the OSG IGWN grid, you must use a Singularity container on
CVMFS, set the ``profile`` in ``config.yml`` to ``osg``, and make sure to
submit the DAG from an OSG node. Otherwise the workflow is the same.

When running without a Singularity container, the commands below should be
modified accordingly (e.g. run ``gstlal_inspiral_workflow init -c config.yml``
instead of ``singularity exec <image> gstlal_inspiral_workflow init -c config.yml``).

For ICDS gstlalcbc shared accounts, the contents of ``env.sh`` must be changed,
and instead of running
``$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein``
you should run ``source env.sh``. (Details are below.)
Running Workflows
^^^^^^^^^^^^^^^^^^

1. Build Singularity image (optional)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

NOTE: If you are using a reference Singularity container (suitable in most
cases), you can skip this step. The ``<image>`` throughout this doc refers to
the ``singularity-image`` specified in the ``condor`` section of your configuration.
If you are not using the reference Singularity container, say for local development,
you can specify a path to a local container and use that for the (non-OSG) workflow.

To pull a container with gstlal installed, run:

.. code:: bash

   $ singularity build --sandbox --fix-perms <image-name> docker://containers.ligo.org/lscsoft/gstlal:master

To use a branch other than master, replace ``master`` in the above command with
the name of the desired branch. To use a custom build instead, gstlal will need
to be installed into the container from your modified source code. For
installation instructions, see the
`installation page <https://docs.ligo.org/lscsoft/gstlal/installation.html>`_.
2. Set up workflow
""""""""""""""""""""
First, we create a new analysis directory and switch to it:
.. code:: bash
$ mkdir <analysis-dir>
$ cd <analysis-dir>
$ mkdir bank mass_model idq dtdphi
Default configuration files and environment (``env.sh``) for a
variety of different banks are contained in the
`offline-configuration <https://git.ligo.org/gstlal/offline-configuration>`_
repository.
One can run the commands below to grab the configuration files, or clone the
repository and copy the files as needed into the analysis directory.
To download data files (mass model, template banks) that may be needed for
offline runs, see the
`README <https://git.ligo.org/gstlal/offline-configuration/-/blob/main/README.md>`_
in the offline-configuration repo. Move the template bank(s) into ``bank`` and the mass model into ``mass_model``.
For example, to grab all the relevant files for a small BNS dag:
.. code:: bash
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/configs/bns-small/config.yml
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/env.sh
$ source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
$ conda activate igwn
$ dcc archive --archive-dir=. --files -i T2200318-v2
$ conda deactivate
Then move the template bank, mass model, idq file, and dtdphi file into their corresponding directories.
When running an analysis on the ICDS cluster in the gstlalcbc shared account,
the contents of ``env.sh`` must be changed to what is given below.
In addition, where the tutorial says to run ``ligo-proxy-init -p``, instead run
``source env.sh`` using the modified ``env.sh``.
When running on non-gstlalcbc accounts on ICDS, or when running on other
clusters, ``env.sh`` does not need to be modified and ``ligo-proxy-init -p``
can be run as in the tutorial.
.. code-block:: bash

   export PYTHONUNBUFFERED=1
   unset X509_USER_PROXY
   export X509_USER_CERT=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export X509_USER_KEY=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export GSTLAL_FIR_WHITEN=0
Now, we'll need to modify the configuration as needed to run the analysis. At
the very least, set the start/end times and the instruments to run over:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1
Ensure the template bank, mass model, idq file, and dtdphi file are pointed to in the configuration:

.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz

.. code-block:: yaml

   prior:
     mass-model: bank/mass_model_small.h5
     idq-timeseries: idq/H1L1-IDQ_TIMESERIES-1239641219-692847.h5
     dtdphi: dtdphi/inspiral_dtdphi_pdf.h5
If you're creating a summary page for results, you'll need to point at a
location where they are web-viewable:

.. code-block:: yaml

   summary:
     webdir: ~/public_html/

If you're running on LIGO compute resources and your username doesn't match
your albert.einstein username, you'll also need to specify the accounting group
user so condor can track accounting information:

.. code-block:: yaml

   condor:
     accounting-group-user: albert.einstein

In addition, update the ``singularity-image`` in the ``condor`` section of your configuration if needed:

.. code-block:: yaml

   condor:
     singularity-image: /cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master
If not using a reference Singularity image, you can replace this with the
full path to a local Singularity container ``<image>``.
For more detailed configuration options, take a look at the :ref:`configuration
section <analysis-configuration>` below.

If you haven't installed site-specific profiles yet (they are per-user), you can run:

.. code:: bash

   $ singularity exec <image> gstlal_grid_profile install

which will install site-specific configurations such as ``ldas`` and ``icds``.
You can select which profile to use in the ``condor`` section:

.. code-block:: yaml

   condor:
     profile: ldas

For an OSG IGWN grid run, use ``osg``.
To view which profiles are available, you can run:

.. code:: bash

   $ singularity exec <image> gstlal_grid_profile list

Note that you can install :ref:`custom profiles <install-custom-profiles>` as well.
Once you have the configuration, data products, and grid profiles installed, you
can set up the Makefile using the configuration, which we'll then use for
everything else, including the data file needed for the workflow, the workflow
itself, the summary page, etc.
.. code:: bash
$ singularity exec <image> gstlal_inspiral_workflow init -c config.yml
By default, this will generate the full workflow. If you want to only run the
filtering step, a rerank, or an injection-only workflow, you can instead specify
the workflow as well, e.g.
.. code:: bash
$ singularity exec <image> gstlal_inspiral_workflow init -c config.yml -w injection
for an injection-only workflow.
If you already have a Makefile and need to update it based on an updated
configuration, run ``gstlal_inspiral_workflow`` with ``--force``.
Next, if you are accessing non-public data (i.e. anything other than open GWOSC
data), you'll need to set up your proxy to ensure you can access LIGO data:
.. code:: bash
$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein
Note that we are running this step outside of Singularity. This is because ``ligo-proxy-init``
is not installed within the image currently.
If you are running on the ICDS gstlalcbc shared account, do not run the command
above.
Instead, run:
.. code:: bash
$ source env.sh
Also update the configuration accordingly (if needed):

.. code-block:: yaml

   source:
     x509-proxy: /path/to/x509_proxy
Finally, set up the rest of the workflow including the DAG for submission:
.. code:: bash
$ singularity exec -B $TMPDIR <image> make dag
If running on the OSG IGWN grid, make sure to submit the DAGs from an OSG node.
This should create condor DAGs for the workflow. Mounting a temporary directory
is important, as some of the steps use temporary space to generate files.
If you want to see detailed error messages, add ``PYTHONUNBUFFERED=1`` to
``environment`` in the submit (``*.sub``) files by running:

.. code:: bash

   $ sed -i '/^environment = / s/\"$/ PYTHONUNBUFFERED=1\"/' *.sub
3. Launch workflows
"""""""""""""""""""""""""
.. code:: bash
$ source env.sh
$ make launch
This is simply a thin wrapper around ``condor_submit_dag`` that launches the DAG in question.
You can monitor the DAG with Condor CLI tools such as ``condor_q``, and follow its progress with ``tail -f full_inspiral_dag.dag.dagman.out``.
4. Generate Summary Page
"""""""""""""""""""""""""
After the DAG has completed, you can generate the summary page for the analysis:
.. code:: bash
$ singularity exec <image> make summary
To make an open-box page after this, run:
.. code:: bash
$ make unlock
.. _analysis-configuration:
Configuration
^^^^^^^^^^^^^^
The top-level configuration consists of the analysis times and detector configuration:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1
   min-instruments: 1

These set the start and stop GPS times of the analysis, plus the detectors to use
(H1 = Hanford, L1 = Livingston, V1 = Virgo). There is a nice online converter for
GPS times here: https://www.gw-openscience.org/gps/; you can also use the
``gpstime`` program. Note that these start and stop times have no knowledge of
science-quality data; the science-quality data actually analyzed are typically a
subset of the total time. Information about which detectors were on at different
times is available here: https://www.gw-openscience.org/data/.

``min-instruments`` sets the minimum number of instruments we will allow to form
an event, e.g. setting it to 1 means the analysis will consider single-detector
events, while 2 means we will only consider events that are coincident across at
least 2 detectors.
Section: Data
""""""""""""""
.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz
     analysis-dir: /path/to/analysis/dir

The ``template-bank`` option points to the template bank file. These are XML
files that follow the LIGOLW (LIGO Light Weight) schema. The template bank in
particular contains a table that lists the parameters of all of the templates;
it does not contain the actual waveforms themselves. Metadata such as the
waveform approximant and the frequency cutoffs are also listed in this file.

The ``analysis-dir`` option is used if the user wishes to point to an existing
analysis to perform a rerank or an injection-only workflow. This grabs existing
files from that directory to seed the rerank/injection workflows.
One can use multiple sub template banks. In this case, the configuration might look like:

.. code-block:: yaml

   data:
     template-bank:
       bns: bank/sub_bank/bns.xml.gz
       nsbh: bank/sub_bank/nsbh.xml.gz
       bbh_1: bank/sub_bank/bbh_low_q.xml.gz
       bbh_2: bank/sub_bank/other_bbh.xml.gz
       imbh: bank/sub_bank/imbh_low_q.xml.gz
Section: Source
""""""""""""""""
.. code-block:: yaml

   source:
     data-source: frames
     data-find-server: datafind.gw-openscience.org
     frame-type:
       H1: H1_GWOSC_O2_16KHZ_R1
       L1: L1_GWOSC_O2_16KHZ_R1
     channel-name:
       H1: GWOSC-16KHZ_R1_STRAIN
       L1: GWOSC-16KHZ_R1_STRAIN
     sample-rate: 4096
     frame-segments-file: segments.xml.gz
     frame-segments-name: datasegments
     x509-proxy: x509_proxy

The ``data-find-server`` option points to a server that is queried to find the
location of frame files. The address shown above is a publicly available server
that will return the locations of public frame files on CVMFS. Each frame file
has a type that describes its contents and may contain multiple channels of
data, hence the channel names must also be specified.

``frame-segments-file`` points to a LIGOLW XML file that describes the actual
times to analyze, i.e. it lists the times that science-quality data are
available. These files are general enough that they could describe different
types of data, so ``frame-segments-name`` is used to specify which segments to
consider. In practice, the segments file we produce will only contain the
segments we want. Users will typically not change any of these options once they
are set for a given instrument and observing run. ``x509-proxy`` is the path to
your X.509 proxy credential.
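As a hedged illustration of what this option does (not part of the workflow
itself), the same public datafind server can be queried with the
`gwdatafind <https://gwdatafind.readthedocs.io>`_ Python package to check that
frames of the configured type are discoverable; the exact ``gwdatafind`` call
signature may vary between versions.

.. code-block:: python

   # Sketch: check that frame files of the configured frame-type can be found
   # for the example analysis span. Assumes the gwdatafind package is installed.
   from gwdatafind import find_urls

   urls = find_urls(
       "H",                        # observatory prefix for H1
       "H1_GWOSC_O2_16KHZ_R1",     # frame-type from the source section above
       1187000000,                 # analysis start
       1187100000,                 # analysis stop
       host="datafind.gw-openscience.org",
   )
   print("found %d frame files" % len(urls))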
Section: Segments
""""""""""""""""""
The ``segments`` section specifies how to generate segments and vetoes for the
workflow. There are two backends to determine where to query segments and vetoes
from, ``gwosc`` (public) and ``dqsegdb`` (authenticated).
An example of configuration with the ``gwosc`` backend looks like:
.. code-block:: yaml

   segments:
     backend: gwosc
     vetoes:
       category: CAT1
Here, the ``backend`` is set to ``gwosc`` so both segments and vetoes are determined
by querying the GWOSC server. There is no additional configuration needed to query
segments, but for vetoes, we also need to specify the ``category`` used for vetoes.
This can be one of ``CAT1``, ``CAT2``, or ``CAT3``. By default, segments are generated
by applying ``CAT1`` vetoes as recommended by the Detector Characterization group.
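For reference, the public segment information that the ``gwosc`` backend works
from can be inspected directly with the `gwosc <https://gwosc.readthedocs.io>`_
Python package. The sketch below is only an illustration for the example
analysis span; the flag names (``H1_DATA``, ``H1_CBC_CAT1``) are GWOSC timeline
flags, not gstlal configuration values, and this is not what the workflow runs
internally.

.. code-block:: python

   # Sketch: fetch public H1 data segments and CAT1-passing segments from GWOSC.
   from gwosc.timeline import get_segments

   start, stop = 1187000000, 1187100000
   science = get_segments("H1_DATA", start, stop)      # times with data available
   cat1_ok = get_segments("H1_CBC_CAT1", start, stop)  # times passing CAT1
   print("data segments:", science)
   print("CAT1-passing segments:", cat1_ok)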
An example of configuration with the ``dqsegdb`` backend looks like:
.. code-block:: yaml

   segments:
     backend: dqsegdb
     science:
       H1: DCS-ANALYSIS_READY_C01:1
       L1: DCS-ANALYSIS_READY_C01:1
       V1: ITF_SCIENCE:2
     vetoes:
       category: CAT1
       veto-definer:
         file: H1L1V1-HOFT_C01_V1ONLINE_O3_CBC.xml
         version: O3b_CBC_H1L1V1_C01_v1.2
         epoch: O3

Here, the ``backend`` is set to ``dqsegdb`` so both segments and vetoes are determined
by querying the DQSEGDB server. To query segments, one needs to specify, per
instrument, the flag to query segments from. For vetoes, we need to specify the
``category`` as with the ``gwosc`` backend. Additionally, a veto definer file is
used to determine which flags are used for which veto categories. The file does
not need to be provided locally; the ``file``, ``version`` and ``epoch`` fully
specify how to access the veto definer file used for generating vetoes.
Section: PSD
""""""""""""""
.. code-block:: yaml

   psd:
     fft-length: 8
     sample-rate: 4096

The PSD estimation method used by GstLAL is a modified median-Welch method that
is described in detail in Section IIB of Ref. [1]. The FFT length sets the length
of each section that is Fourier transformed. The default whitener will use
zero-padding of one-fourth the FFT length on either side and will overlap
Fourier-transformed segments by one-fourth the FFT length. For example, an
``fft-length`` of 8 means that each Fourier-transformed segment used in the PSD
estimation (and consequently the whitener) will contain 4 seconds of data with 2
seconds of zero padding on either side, and will overlap the next segment by 2
seconds (i.e. the last two seconds of data in one segment will be the first two
seconds of data in the following window).
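The segment layout implied by ``fft-length`` can be written out explicitly. The
short sketch below simply reproduces the arithmetic from the paragraph above
(quarter-length zero-padding on each side and quarter-length overlap); it is an
illustration only, not code used by the analysis.

.. code-block:: python

   # Sketch: whitener segment layout implied by fft-length.
   def whitener_layout(fft_length):
       """Return (data seconds, zero-padding seconds per side, overlap seconds)."""
       zero_pad = fft_length / 4.0          # one-fourth of the FFT length per side
       data = fft_length - 2 * zero_pad     # data contained in each segment
       overlap = fft_length / 4.0           # overlap with the neighboring segment
       return data, zero_pad, overlap

   print(whitener_layout(8))  # (4.0, 2.0, 2.0): 4 s of data, 2 s padding, 2 s overlap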
Section: SVD
""""""""""""""
.. code-block:: yaml

   svd:
     f-low: 20.0
     num-chi-bins: 1
     sort-by: mchirp
     approximant:
       - 0:1.73:TaylorF2
       - 1.73:1000:SEOBNRv4_ROM
     tolerance: 0.9999
     max-f-final: 1024.0
     num-split-templates: 200
     overlap: 30
     num-banks: 5
     samples-min: 2048
     samples-max-64: 2048
     samples-max-256: 2048
     samples-max: 4096
     autocorrelation-length: 701
     max-duration: 128
     manifest: svd_manifest.json
``f-low`` sets the lower frequency cutoff for the analysis in Hz.
``num-chi-bins`` is a tunable parameter related to the template bank binning
procedure; specifically, it sets the number of effective-spin parameter bins to
use in the chirp-mass / effective-spin binning procedure described in Sec. IID and
Fig. 6 of [1].
``sort-by`` selects the template sort column. This controls how the bank is
binned into sub-banks suitable for the SVD decomposition. It can be ``mchirp``
(sorts by chirp mass), ``mu`` (sorts by the mu1 and mu2 coordinates), or
``template_duration`` (sorts by template duration).
``approximant`` specifies the waveform approximant to use within given
chirp-mass bounds. For example, 0:1000:TaylorF2 means use the TaylorF2
approximant for waveforms from systems with chirp masses between 0 and
1000 solar masses. Multiple approximants and chirp-mass bounds can be provided.
``tolerance`` is a tunable parameter related to the truncation of SVD basis
vectors. A tolerance of 0.9999 means the targeted matched-filter inner-product
of the original waveform and the waveform reconstructed from the SVD is 0.9999.
``max-f-final`` sets the max frequency of the template.
``num-split-templates``, ``overlap``, and ``num-banks`` are tunable parameters
related to the SVD process. ``num-split-templates`` sets the number of templates
to decompose at a time; ``overlap`` sets the number of templates from adjacent
template bank regions to pad to the region being considered in order to actually
compute the SVD (this helps the performance of the SVD, and these pad templates
are not reconstructed); ``num-banks`` sets the number of sets of decomposed
templates to include in a given bin for the analysis. For example,
``num-split-templates`` of 200, ``overlap`` of 30, and ``num-banks`` of 5 means
that each SVD bank file will contain 5 decomposed sets of 200 templates, where
the SVD was computed using an additional 15 templates on either side of the 200
(as defined by the binning procedure).
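As a concrete check of that example, the sketch below works out how many
templates end up in one SVD bank file and how many padding templates enter each
decomposition. It simply restates the numbers quoted above and is not the
gstlal implementation.

.. code-block:: python

   # Sketch: template counts implied by the SVD splitting parameters above.
   num_split_templates = 200   # templates decomposed at a time
   overlap = 30                # padding templates, split across both sides
   num_banks = 5               # decomposed sets per SVD bank file

   templates_per_bank_file = num_split_templates * num_banks
   pad_per_side = overlap // 2

   print(templates_per_bank_file)  # 1000 templates reconstructed per bank file
   print(pad_per_side)             # 15 extra templates on either side of each set of 200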
``samples-min``, ``samples-max-64``, ``samples-max-256``, and ``samples-max``
are tunable parameters related to the template time slicing procedure used by
GstLAL (described in Sec. IID and Fig. 7 of Ref. [1], and references therein).
Templates are sliced in time before the SVD is applied, and only sampled at the
rate necessary for the highest frequency in each time slice (rounded up to a
power of 2). For example, the low frequency part of a waveform may only be
sampled at 32 Hz, while the high frequency part may be sampled at 2048 Hz
(depending on user settings). ``samples-min`` sets the minimum number of samples
to use in any time slice. ``samples-max`` sets the maximum number of samples to
use in any time slice with a sample rate below 64 Hz; ``samples-max-64`` sets
the maximum number of samples to use in any time slice with sample rates between
64 Hz and 256 Hz; ``samples-max-256`` sets the maximum number of samples to use
in any time slice with a sample rate greater than 256 Hz.
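To make the rounding and cap selection concrete, here is a small sketch of the
rule described above: the sample rate needed for a time slice is rounded up to
a power of two, and the applicable ``samples-max*`` cap depends on that rate.
This only illustrates the stated rule; it is not the gstlal implementation, and
the example rate is arbitrary.

.. code-block:: python

   # Sketch: round a required slice sample rate up to a power of two and pick
   # the samples-max cap that applies to it.
   import math

   def round_up_to_power_of_two(rate):
       """Round a required sample rate (Hz) up to the next power of two."""
       return 2 ** math.ceil(math.log2(rate))

   def samples_cap(rate, samples_max=4096, samples_max_64=2048, samples_max_256=2048):
       """Maximum number of samples allowed in a slice at this sample rate."""
       if rate < 64:
           return samples_max
       elif rate <= 256:
           return samples_max_64
       return samples_max_256

   rate = round_up_to_power_of_two(50)   # -> 64 Hz
   print(rate, samples_cap(rate))        # capped by samples-max-64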
``autocorrelation-length`` sets the number of samples to use when computing the
autocorrelation-based test-statistic, described in IIIC of Ref [1].
``max-duration`` sets the maximum template duration in seconds. This option can
be omitted if no maximum duration is desired.
``manifest`` sets the name of a file that will contain metadata about the
template bank bins.
If one uses multiple sub template banks, SVD configurations can be specified
for each sub template bank; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Users will typically not change these options.
Section: Filter
""""""""""""""""
.. code-block:: yaml

   filter:
     fir-stride: 1
     min-instruments: 1
     coincidence-threshold: 0.01
     ht-gate-threshold: 0.8:15.0-45.0:100.0
     veto-segments-file: vetoes.xml.gz
     time-slide-file: tisi.xml
     injection-time-slide-file: inj_tisi.xml
     time-slides:
       H1: 0:0:0
       L1: 0.62831:0.62831:0.62831
     injections:
       bns:
         file: bns_injections.xml
         range: 0.01:1000.0
``fir-stride`` is a tunable parameter related to the matched-filter procedure,
setting the length in seconds of the output of the matched-filter element.
``coincidence-threshold`` is the time in seconds to add to the light-travel time
when searching for coincidences between detectors.
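For example, with ``coincidence-threshold: 0.01`` the H1-L1 coincidence window
is the H1-L1 light-travel time (roughly 10 ms) plus 10 ms, i.e. about 20 ms.
A one-line sketch of that sum follows; the 0.010 s light-travel time is an
approximate figure quoted here for illustration, not a configuration value.

.. code-block:: python

   # Sketch: coincidence window = inter-detector light-travel time + threshold.
   light_travel_time_h1l1 = 0.010   # seconds, approximate H1-L1 light-travel time
   coincidence_threshold = 0.01     # seconds, from the filter section above
   print(light_travel_time_h1l1 + coincidence_threshold)  # ~0.02 s window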
``ht-gate-threshold`` sets the h(t) gate threshold as a function of chirp mass.
The h(t) gate threshold is a value above which the output of the whitener, plus
some padding, will be set to zero (as described in IIC of Ref. [1]).
0.8:15.0-45.0:100.0 means that a template bank bin whose maximum-chirp-mass
template is 0.8 solar masses will use a gate threshold of 15, a bank bin with a
max chirp mass of 100 will use a threshold of 45, and all other thresholds are
given by a linear function between those two points.
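The linear function described above can be written out explicitly. The sketch
below parses the ``0.8:15.0-45.0:100.0`` string and interpolates the threshold
for a bin's maximum chirp mass. The helper is illustrative only, based on the
format as described in this section, and is not the parser gstlal itself uses.

.. code-block:: python

   # Sketch: interpolate the h(t) gate threshold for a bank bin's max chirp mass,
   # assuming the 'mc_lo:thresh_lo-thresh_hi:mc_hi' form described above.
   def ht_gate_threshold(spec, max_mchirp):
       low, high = spec.split("-")
       mc_lo, thresh_lo = (float(x) for x in low.split(":"))
       thresh_hi, mc_hi = (float(x) for x in high.split(":"))
       slope = (thresh_hi - thresh_lo) / (mc_hi - mc_lo)
       return thresh_lo + slope * (max_mchirp - mc_lo)

   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 0.8))    # 15.0
   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 100.0))  # 45.0
   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 1.74))   # threshold for an intermediate bin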
``veto-segments-file`` sets the name of a LIGOLW XML file that contains any
vetoes used for the analysis, even if there are no vetoes.
``time-slide-file`` and ``injection-time-slide-file`` are LIGOLW XML files that
describe any time slides used in the analysis. A typical analysis will only
analyze injections with the zerolag “time slide” (i.e. the data are not slid in
time), and will consider the zerolag and one other time slide for the
non-injection analysis. The time slide is used to perform a blind sanity check
of the noise model.
``injections`` lists a set of injections, each with their own label. In this
example, there is only one injection set, and it is labeled “bns”. ``file`` is a
relative path to the injection file (a LIGOLW XML file that contains the
parameters of the injections, but not the actual waveforms themselves). ``range``
sets the chirp-mass range that should be considered when searching for this
particular set of injections. Multiple injection files can be provided, each
with their own label, file, and range.
The only option here that a user will normally interact with is the
``injections`` option.
When using multiple sub template banks, replace ``bns:`` under ``injections:``
with ``inj:``.
Section: Injections
""""""""""""""""""""
.. code-block:: yaml

   injections:
     sets:
       expected-snr:
         f-low: 15.0
       bns:
         f-low: 14.0
         seed: 72338
         time:
           step: 32
           interval: 1
           shift: 0
         waveform: SpinTaylorT4threePointFivePN
         mass-distr: componentMass
         mass1:
           min: 1.1
           max: 2.8
         mass2:
           min: 1.1
           max: 2.8
         spin1:
           min: 0
           max: 0.05
         spin2:
           min: 0
           max: 0.05
         distance:
           min: 10000
           max: 80000
         spin-aligned: True
         file: bns_injections.xml
The ``sets`` subsection is used to create injection sets to be used within the
analysis and referenced by name in the ``filter`` section. In ``sets``, the
injections are grouped by key. In this case, there is one ``bns`` injection set,
which creates the ``bns_injections.xml`` file used in the ``injections`` section
of the ``filter`` section.
For multiple injection sets, the chunk under ``bns:`` should be repeated for each
set; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Besides creating injection sets, the ``expected-snr`` subsection is used for the
expected SNR jobs. These settings are used to override defaults as needed.
``spin-aligned`` specifies whether the injections should have (mis)aligned spins
(if ``spin-aligned: True``) or precessing spins (if ``spin-aligned: False``).
In the case of multiple injection sets that need to be combined, one can add
a few options to create a combined file and reference that within the filter
jobs. This can be useful for large banks with a large set of templates. To
do this, one can add the following:
.. code-block:: yaml

   injections:
     combine: true
     combined-file: combined_injections.xml
The injections created are generated by the ``lalapps_inspinj`` program, with
the following mapping between configuration and command-line options (a small
sketch of this mapping follows the list):

* ``f-low``: ``--f-lower``
* ``seed``: ``--seed``
* ``time`` section: ``--time-step``, ``--time-interval``; ``shift`` adjusts the
  start time appropriately.
* ``waveform``: ``--waveform``
* ``mass-distr``: ``--m-distr``
* ``mass/spin/distance`` sections: map to options like ``--min-mass1``
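As a hedged sketch of that mapping (and nothing more), the snippet below
assembles a ``lalapps_inspinj``-style argument list from part of the ``bns``
block shown earlier, using only the option names listed above; it is not the
exact command line the workflow constructs.

.. code-block:: python

   # Sketch: build lalapps_inspinj-style arguments from the bns injection config,
   # following the documented mapping. Illustrative only.
   bns = {
       "f-low": 14.0,
       "seed": 72338,
       "time": {"step": 32, "interval": 1, "shift": 0},
       "waveform": "SpinTaylorT4threePointFivePN",
       "mass-distr": "componentMass",
       "mass1": {"min": 1.1, "max": 2.8},
   }

   args = [
       "--f-lower", str(bns["f-low"]),
       "--seed", str(bns["seed"]),
       "--time-step", str(bns["time"]["step"]),
       "--time-interval", str(bns["time"]["interval"]),
       "--waveform", bns["waveform"],
       "--m-distr", bns["mass-distr"],
       "--min-mass1", str(bns["mass1"]["min"]),  # "options like --min-mass1"
   ]
   print(" ".join(args))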
Section: Prior
""""""""""""""""
.. code-block:: yaml

   prior:
     mass-model: mass_model/mass_model_small.h5

``mass-model`` is a relative path to the file that contains the mass model. This
model is used to weight templates appropriately when assigning ranking
statistics, based on our understanding of the astrophysical distribution of
signals. Users will not typically change this option.
An optional ``dtdphi-file`` and ``idq-timeseries`` can be provided here. If not
given, a default model (included in the standard installation) will be used.
The dtdphi file specifies a probability distribution function for measuring a
given time shift and phase shift in a multi-detector observation; it enters into
the ranking statistic.
The idq file gives information about the data quality around the time of
coalescence.
If specifying idq and dtdphi files, create an ``idq`` and a ``dtdphi`` directory
in the ``<analysis-dir>`` and put the idq and dtdphi files in the respective
directories.
See the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Section: Rank
""""""""""""""""
.. code-block:: yaml

   rank:
     ranking-stat-samples: 4194304
``ranking-stat-samples`` sets the number of samples to draw from the noise model
when computing the distribution of log likelihood-ratios (the ranking statistic)
under the noise hypothesis. Users will not typically change this option.
Section: Summary
""""""""""""""""""
.. code-block:: yaml

   summary:
     webdir: /path/to/public_html/folder
``webdir`` sets the path of the output results webpages produced by the
analysis. Users will typically change this option for each analysis.
Section: Condor
""""""""""""""""""
.. code-block:: yaml

   condor:
     profile: osg-public
     accounting-group: ligo.dev.o3.cbc.uber.gstlaloffline
     accounting-group-user: <albert.einstein>
     singularity-image: <image>
``profile`` sets a base level of configuration options for condor.
``accounting-group`` sets accounting group details on LDG resources. Currently
the machinery to produce an analysis dag requires this option, but the option is
not actually used by analyses running on non-LDG resources.
``singularity-image`` sets the path of the container on cvmfs that the analysis
should use. Users will not typically change this option
(use ``/cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master``).
.. _install-custom-profiles:
Installing Custom Site Profiles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can define a site profile as YAML. As an example, we can create a file called ``custom.yml``:
.. code-block:: yaml

   scheduler: condor
   requirements:
     - "(IS_GLIDEIN=?=True)"
Both the directives and requirements sections are optional.
To install one so it's available for use, run:
.. code:: bash
$ singularity exec <image> gstlal_grid_profile install custom.yml
@@ -22,9 +22,11 @@ sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../../gstlal/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-inspiral/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-burst/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-calibration/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-ugly/python'))
# on_rtd is whether we are on readthedocs.org, this line of code grabbed
# from docs.readthedocs.org
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
# -- General configuration ------------------------------------------------
@@ -35,16 +37,25 @@ sys.path.insert(0, os.path.abspath('../../gstlal-ugly/python'))
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.imgmath',
# 'sphinx.ext.imgmath',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode',
'sphinx.ext.githubpages',
'sphinx.ext.graphviz']
'sphinx.ext.graphviz',
'sphinx.ext.mathjax',
'myst_parser',
]
myst_enable_extensions = [
"amsmath",
"dollarmath",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -52,8 +63,8 @@ templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
@@ -61,7 +72,7 @@ master_doc = 'index'
# General information about the project.
# FIXME get from autotools
project = u'GstLAL'
copyright = u'2018, GstLAL developers'
copyright = u'2021, GstLAL developers'
author = u'GstLAL developers'
# The version info for the project you're documenting, acts as replacement for
@@ -69,10 +80,9 @@ author = u'GstLAL developers'
# built documents.
#
# The short X.Y version.
# FIXME get from autotools
version = u'1.x'
#version = u'1.x'
# The full version, including alpha/beta/rc tags.
release = u'1.x'
release = ''
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -98,28 +108,32 @@ todo_include_todos = True
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'#'classic'
html_logo = "gstlal_small.png"
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
'fixed_sidebar': 'true',
'sidebar_width': '200px',
'page_width': '95%',
'show_powered_by': 'false',
'logo_name': 'true',
}
#html_theme_options = {}
def setup(app):
app.add_stylesheet('css/my_theme.css')
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
if not on_rtd: # only import and set the theme if we're building docs locally
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Custom sidebar templates, maps document names to template names.
html_sidebars = { '**': ['navigation.html', 'relations.html', 'searchbox.html'] }
html_last_updated_fmt = None
#html_sidebars = { '**': ['navigation.html', 'relations.html', 'searchbox.html'] }
#html_last_updated_fmt = None
# Add a favicon to doc pages
html_favicon = '_static/favicon.ico'
# -- Options for HTMLHelp output ------------------------------------------
# Container Development Environment
The container development workflow consists of a few key points:
- Build tools provided by and used within a writable gstlal container.
- Editor/git used in or outside of the container as desired.
- Applications are run in the development container.
The benefits of developing in a writable container:
- Your builds do not depend on the software installed on the system, you don't have to worry about behavior changes due to system package updates.
- Your build environment is the same as that of everyone else using the same base container. This makes for easier collaboration.
- Others can run your containers and get the same results. You don't have to worry about environment mis-matches.
## Create a writable container
The base of a development environment is a gstlal container. It is typical to start with the
current master build. However, you can use the build tools to overwrite the install in the container, so the
choice of branch in your gstlal repository matters more than the container that you start with. The job of
the container is to provide a well-defined set of dependencies.
```bash
singularity build --sandbox --fix-perms CONTAINER_NAME docker://containers.ligo.org/lscsoft/gstlal:master
```
This will create a directory named CONTAINER_NAME. That directory is a *singularity container*.
## Check out gstlal
In a directory of your choice, under your home directory, run:
```
git clone https://git.ligo.org/lscsoft/gstlal DIRNAME
```
This will create a git directory named DIRNAME which is referred to in the following as your "gstlal dir". The gstlal dir
contains several directories that contain components that can be built independently (e.g., `gstlal`, `gstlal-inspiral`, `gstlal-ugly`, ...).
A common practice is to run the clone command in the CONTAINER_NAME directory and use `src` as `DIRNAME`. In this case, when you run your
container, your source will be available in the directory `/src`.
## Develop
Edit and make changes under your gstlal dir using editors and git outside of the container (or inside if you prefer).
## Build a component
To build a component:
1. cd to your gstlal directory
2. Run your container:
```
singularity run --writable -B $TMPDIR CONTAINER_NAME /bin/bash
```
3. cd to the component directory under your gstlal dir.
4. Initialize the build system for your component. You only need to do this once per container per component directory:
```
./00init.sh
./configure --prefix=/usr --libdir=/usr/lib64
```
The arguments to configure are required so that you overwrite the build of gstlal in your container.
Some components have dependencies on others. You should build GstLAL components in the following order:
1. `gstlal`
2. `gstlal-ugly`
3. `gstlal-inspiral`, `gstlal-burst`, `gstlal-calibration` (in any order)
For example, if you want to build `gstlal-ugly`, you should build `gstlal` first.
5. Run make and make install
```
make
make install
```
Note that the container is writable, so your installs will persist after you exit the container and run it again.
## Run your code
You can run your code in the following ways:
1. Run your container using singularity and issue commands interactively "inside the container":
```
singularity run --writable -B $TMPDIR PATH_TO_CONTAINER /bin/bash
/bin/gstlal_reference_psd --channel-name=H1=foo --data-source=white --write-psd=out.psd.xml --gps-start-time=1185493488 --gps-end-time=1185493788
```
2. Use `singularity exec` and give your command on the singularity command line:
```
singularity exec --writable -B $TMPDIR PATH_TO_CONTAINER /bin/gstlal_reference_psd --channel-name=H1=foo --data-source=white --write-psd=out.psd.xml --gps-start-time=1185493488 --gps-end-time=1185493788
```
3. Use your container in a new or existing [container-based gstlal workflow](/gstlal/cbc_analysis.html) on a cluster with a shared filesystem where your container resides. For example, you can run on the CIT cluster or on the PSU cluster, but not via the OSG (you can run your container as long as your container is available on the shared filesystem of the cluster where you want to run). In order to run your code on the OSG, you would have to arrange to have your container published to cvmfs.
# Contributing Workflow
## Git Branching
The `gstlal` team uses the standard git-branch-and-merge workflow, which has a brief description
at [GitLab](https://docs.gitlab.com/ee/gitlab-basics/feature_branch_workflow.html) and a full description
at [BitBucket](https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow). As depicted below,
the workflow involves the creation of new branches for changes, the review of those branches through the Merge Request
process, and then the merging of the new changes into the main branch.
![git-flow](_static/img/git-flow.png)
### Git Workflow
In general the steps for working with feature branches are:
1. Create a new branch from master: `git checkout -b feature-short-desc`
1. Edit code (and tests)
1. Commit changes: `git commit . -m "comment"`
1. Push branch: `git push origin feature-short-desc`
1. Create merge request on GitLab
## Merge Requests
### Creating a Merge Request
Once you push a feature branch, GitLab will show a prompt on the gstlal repo home page. Click “Create Merge Request”, or you can
also go to the branches page (Repository > Branches) and select “Merge Request” next to your branch.
![mr-create](_static/img/mr-create.png)
When creating a merge request:
1. Add short, descriptive title
1. Add description
- (Uses markdown .md-file style)
- Summary of additions / changes
- Describe any tests run (other than CI)
1. Click “Create Merge Request”
![mr-create](_static/img/mr-create-steps.png)
### Collaborating on merge requests
The Overview page gives a general summary of the merge request, including:
1. Link to other page to view changes in detail (read below)
1. Code Review Request
1. Test Suite Status
1. Discussion History
1. Commenting
![mr-overview](_static/img/mr-overview.png)
#### Leaving a Review
The View Changes page gives a detailed look at the changes made on the feature branch, including:
1. List of files changed
1. Changes
- Red = removed
- Green = added
1. Click to leave comment on line
1. Choose “Start a review”
![mr-changes](_static/img/mr-changes.png)
After a review has been started:
1. Comments are marked as pending
1. Submit the review
![mr-changes](_static/img/mr-change-submit.png)
#### Responding to Reviews
Reply to code review comments as needed. Use “Start a review” to submit all replies at once.
![mr-changes](_static/img/mr-respond.png)
Resolve threads when discussion on a particular piece of code is complete
![mr-changes](_static/img/mr-resolve.png)
### Merging the Merge Request
Merging:
1. Check all tests passed
1. Check all review comments resolved
1. Check at least one review approval
1. Before clicking “Merge”
- Check “Delete source branch”
- Check “Squash commits” if branch history not tidy
1. Click “Merge”
1. Celebrate
![mr-merge](_static/img/mr-merge.png)
# Contributing Documentation
This guide assumes the reader has read the [Contribution workflow](contributing.md) for details about making changes to
code within the gstlal repo, since the documentation files are updated by a similar workflow.
## Writing Documentation
In general, the gstlal documentation uses [RestructuredText (rst)](https://docutils.sourceforge.io/rst.html) files
ending in `.rst` or [Markdown](https://www.markdownguide.org/basic-syntax/) files ending in `.md`.
The documentation files for gstlal are located under `gstlal/doc/source`. If you add a new page (doc file), make sure to
reference it from the main index page.
Useful Links:
- [MyST Directive Syntax](https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#syntax-directives)
Executables
===============
.. toctree::
   :maxdepth: 2

   gstlal/bin/bin
   gstlal-inspiral/bin/bin
   gstlal-burst/bin/bin
   gstlal-ugly/bin/bin
.. _extrinsic-parameters-generation:
Generating Extrinsic Parameter Distributions
============================================
This tutorial will show you how to regenerate the extrinsic parameter
distributions used to determine the likelihood ratio term that accounts for the
relative times-of-arrival, phases, and amplitudes of a CBC signal at each of
the LVK detectors.
There are two parts described below that represent different terms. Full
documentation can be found here:
https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/python-modules/stats.inspiral_extrinsics.html
Setting up the dt, dphi, dsnr dag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Setup a work area and obtain the necessary input files
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
You will need to create a directory on a cluster running HTCondor, e.g.,
.. code:: bash
$ mkdir dt_dphi_dsnr
$ cd dt_dphi_dsnr
This workflow requires estimates of the power spectral densities for LIGO,
Virgo and KAGRA. For this tutorial we use projected O4 sensitivities from the
LIGO DCC. Feel free to substitute these to suit your needs.
We will use the following files found at: https://dcc.ligo.org/LIGO-T2000012 ::
aligo_O4high.txt
avirgo_O4high_NEW.txt
kagra_3Mpc.txt
Download the above files and place them in the dt_dphi_dsnr directory that you are currently in.
2. Execute commands to generate the HTCondor DAG
"""""""""""""""""""""""""""""""""""""""""""""""""
For this tutorial, we assume that you have a singularity container with the
gstlal software. More details can be found here:
https://lscsoft.docs.ligo.org/gstlal/installation.html
The following Makefile illustrates the sequence of commands required to generate an HTCondor workflow. You can copy this into a file called ``Makefile`` and modify it as you wish.
.. code:: make
SINGULARITY_IMAGE=/ligo/home/ligo.org/chad.hanna/development/gstlal-dev/
sexec=singularity exec $(SINGULARITY_IMAGE)
all: dt_dphi.dag
# 417.6 Mpc Horizon
H1_aligo_O4high_psd.xml.gz: aligo_O4high.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=H1 --output $@ $<
# 417.6 Mpc Horizon
L1_aligo_O4high_psd.xml.gz: aligo_O4high.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=L1 --output $@ $<
# 265.8 Mpc Horizon
V1_avirgo_O4high_NEW_psd.xml.gz: avirgo_O4high_NEW.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=V1 --output $@ $<
# 6.16 Mpc Horizon
K1_kagra_3Mpc_psd.xml.gz: kagra_3Mpc.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=K1 --output $@ $<
O4_projected_psds.xml.gz: H1_aligo_O4high_psd.xml.gz L1_aligo_O4high_psd.xml.gz V1_avirgo_O4high_NEW_psd.xml.gz K1_kagra_3Mpc_psd.xml.gz
$(sexec) ligolw_add --output $@ $^
# SNR ratios according to horizon ratios
dt_dphi.dag: O4_projected_psds.xml.gz
$(sexec) gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs_dag \
--psd-xml $< \
--H-snr 8.00 \
--L-snr 8.00 \
--V-snr 5.09 \
--K-snr 0.12 \
--m1 1.4 \
--m2 1.4 \
--s1 0.0 \
--s2 0.0 \
--flow 15.0 \
--fhigh 1024.0 \
--NSIDE 16 \
--n-inc-angle 33 \
--n-pol-angle 33 \
--singularity-image $(SINGULARITY_IMAGE)
clean:
rm -rf H1_aligo_O4high_psd.xml.gz L1_aligo_O4high_psd.xml.gz V1_avirgo_O4high_NEW_psd.xml.gz logs dt_dphi.dag gstlal_inspiral_compute_dtdphideff_cov_matrix.sub gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs.sub gstlal_inspiral_add_dt_dphi_snr_ratio_pdfs.sub dt_dphi.sh
3. Submit the HTCondor DAG and monitor the output
""""""""""""""""""""""""""""""""""""""""""""""""""
Next run make to generate the HTCondor DAG
.. code:: bash
$ make
Then submit the DAG
.. code:: bash
$ condor_submit_dag dt_dphi.dag
You can check the DAG progress by doing
.. code:: bash
$ tail -f dt_dphi.dag.dagman.out
4. Test the output
"""""""""""""""""""
When the DAG completes successfully, you should have a file called ``inspiral_dtdphi_pdf.h5``. You can verify that this file works with a python terminal, e.g.,
.. code:: bash
$ singularity exec /ligo/home/ligo.org/chad.hanna/development/gstlal-dev/ python3
Python 3.6.8 (default, Nov 10 2020, 07:30:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gstlal.stats.inspiral_extrinsics import InspiralExtrinsics
>>> IE = InspiralExtrinsics(filename='inspiral_dtdphi_pdf.h5')
>>>
Setting up probability of instrument combinations dag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Setup a work area and obtain the necessary input files
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
You will need to create a directory on a cluster running HTCondor, e.g.,
.. code:: bash
$ mkdir p_of_instruments
$ cd p_of_instruments
2. Execute commands to generate the HTCondor DAG
"""""""""""""""""""""""""""""""""""""""""""""""""
Below is a sample Makefile that will work if you are using Singularity:

.. code:: make

   SINGULARITY_IMAGE=/ligo/home/ligo.org/chad.hanna/development/gstlal-dev/

   sexec=singularity exec $(SINGULARITY_IMAGE)

   all:
   	$(sexec) gstlal_inspiral_create_p_of_ifos_given_horizon_dag --instrument=H1 --instrument=L1 --instrument=V1 --instrument=K1 --singularity-image $(SINGULARITY_IMAGE)

   clean:
   	rm -rf gstlal_inspiral_add_p_of_ifos_given_horizon.sub gstlal_inspiral_create_p_of_ifos_given_horizon.sub logs p_of_I_H1K1L1V1.dag p_of_I_H1K1L1V1.sh

3. Submit the HTCondor DAG
"""""""""""""""""""""""""""

.. code:: bash

   $ condor_submit_dag p_of_I_H1K1L1V1.dag
See Also
^^^^^^^^
* https://arxiv.org/abs/1901.02227
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/python-modules/stats.inspiral_extrinsics.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs_dag.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_compute_dtdphideff_cov_matrix.html
####################################################################################################
Overview
####################################################################################################
.. _burst-overview-feature_extraction:
.. _feature_extraction:
Feature Extraction
====================================================================================================
SNAX (Signal-based Noise Acquisition and eXtraction), the `snax` module and related SNAX executables
contain relevant libraries to identify glitches in low-latency using auxiliary channel data.
SNAX functions as a modeled search for data quality by applying matched filtering
on auxiliary channel timeseries using waveforms that model a large number of glitch classes. Its primary
purpose is to whiten incoming auxiliary channels and extract relevant features in low-latency.

.. _feature_extraction-intro:

Introduction
------------

There are two different modes of feature generation:

1. **Timeseries:** Production of regularly-spaced feature rows, containing the SNR, waveform parameters,
   and the time of the loudest event in a sampling time interval.
2. **ETG:** This produces output that resembles that of a traditional event trigger generator (ETG), in
   which only feature rows above an SNR threshold will be produced.

One useful feature of using a matched filter approach to detect glitches is the ability to switch between
different glitch templates or generate a heterogeneous bank of templates. Currently, there are Sine-Gaussian,
half-Sine-Gaussian, and tapered Sine-Gaussian waveforms implemented for use in detecting glitches, but the feature
extractor is designed to be fairly modular and so it isn't difficult to design and add new waveforms for use.
Since SNAX uses time-domain convolution to matched filter auxiliary channel timeseries
with glitch waveforms, this allows latencies to be much lower than in traditional ETGs. The latency upon writing
features to disk is O(5 s) in the current layout when using waveforms where the peak occurs at the edge of the
template (zero-latency templates). Otherwise, there is extra latency incurred due to the non-causal nature of
@@ -36,7 +42,7 @@ the waveform itself.
digraph llpipe {
labeljust = "r";
label="gstlal_snax_extract"
rankdir=LR;
graph [fontname="Roman", fontsize=24];
edge [ fontname="Roman", fontsize=10 ];
@@ -132,17 +138,19 @@ the waveform itself.
}
.. _feature_extraction-highlights:
Highlights
----------

* Launch SNAX jobs in online or offline mode:

  * Online: Using /shm or framexmit protocol
  * Offline: Read frames off disk

* Online/Offline DAGs available for launching jobs.
* Offline DAG parallelizes by time; channels are processed sequentially by subsets to reduce I/O concurrency issues.
* On-the-fly PSD generation (or take in a prespecified PSD)
@@ -165,3 +173,73 @@ the waveform itself.
* Waveform type (currently Sine-Gaussian and half-Sine-Gaussian only)
* Specify parameter ranges (frequency, Q for Sine-Gaussian based)
* Min mismatch between templates
.. _feature_extraction-online:
Online Operation
----------------
An online DAG is provided in /gstlal-burst/share/snax/Makefile.gstlal_feature_extractor_online
in order to provide a convenient way to launch online feature extraction jobs as well as auxiliary jobs as
needed (synchronizer/hdf5 file sinks). A condensed list of instructions for use is also provided within the Makefile itself.
There are four separate modes that can be used to launch online jobs:
1. Auxiliary channel ingestion:
a. Reading from framexmit protocol (DATA_SOURCE=framexmit).
This mode is recommended when reading in live data from LHO/LLO.
b. Reading from shared memory (DATA_SOURCE=lvshm).
This mode is recommended for reading in data for O2 replay (e.g. UWM).
2. Data transfer of features:
a. Saving features directly to disk, e.g. no data transfer.
This will save features to disk directly from the feature extractor,
and saves features periodically via hdf5.
b. Transfer of features via Kafka topics.
This requires a Kafka/Zookeeper service to be running (either an existing LDG
service or your own). Features get transferred via Kafka from the feature extractor,
parallel instances of the extractor get synchronized, and the features are then sent
downstream where they can be read by other processes (e.g. iDQ). In addition, a streaming
hdf5 file sink is launched which periodically dumps features to disk.
In order to start up online runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To run, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_online
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
.. _feature_extraction-offline:
Offline Operation
-----------------
An offline DAG is provided in /gstlal-burst/share/snax/Makefile.gstlal_feature_extractor_offline
in order to provide a convenient way to launch offline feature extraction jobs. A condensed list of
instructions for use is also provided within the Makefile itself.
For general use cases, the only configuration options that need to be changed are:
* User/Accounting tags: GROUP_USER, ACCOUNTING_TAG
* Analysis times: START, STOP
* Data ingestion: IFO, CHANNEL_LIST
* Waveform parameters: WAVEFORM, MISMATCH, QHIGH
In order to start up offline runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To generate a DAG, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_offline
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Getting started
===============
You can get a development copy of the gstlal software suite from git. Doing this at minimum will require a development copy of lalsuite.
* https://git.ligo.org/lscsoft/gstlal
* https://git.ligo.org/lscsoft/lalsuite
Source tarballs for GstLAL packages and all the LIGO/Virgo software dependencies are available here: http://software.ligo.org/lscsoft/source/
Limited binary packages are available here: https://wiki.ligo.org/Computing/DASWG/SoftwareDownloads
Building and installing from source follows the normal GNU build procedures
involving:
1. ./00init.sh
2. ./configure
3. make
4. make install.
You should build the packages in order of gstlal, gstlal-ugly,
gstlal-calibration, gstlal-inspiral. If you are building to a non-FHS location
(e.g., your home directory) you will need to ensure some environment variables
are set so that your installation will function. The following five variables
must be set. As **just an example**::
GI_TYPELIB_PATH="/path/to/your/installation/lib/girepository-1.0:${GI_TYPELIB_PATH}"
GST_PLUGIN_PATH="/path/to/your/installation/lib/gstreamer-0.10:${GST_PLUGIN_PATH}"
PATH="/path/to/your/installation/bin:${PATH}"
# Debian systems need lib, RH systems need lib64, including both doesn't hurt
PKG_CONFIG_PATH="/path/to/your/installation/lib/pkgconfig:/path/to/your/installation/lib64/pkgconfig:${PKG_CONFIG_PATH}"
# Debian systems need lib, RH systems need lib and lib64
PYTHONPATH="/path/to/your/installation/lib64/python2.7/site-packages:/path/to/your/installation/lib/python2.7/site-packages:$PYTHONPATH"
GstLAL burst code
=================
.. toctree::
   :maxdepth: 2

   bin/bin
   python-modules/modules
####################################################################################################
GstLAL burst
####################################################################################################
`GstLAL burst` contains several projects targeting a variety of different searches. These include:
* **Feature extraction:** Identify noise transient bursts (glitches) in auxiliary channel data.
* **Cosmic string search**
* **Excess power**
Contents
-------------------------
.. toctree::
   :maxdepth: 2

   overview
   tutorials/tutorials
   code
####################################################################################################
Running Offline Jobs
####################################################################################################
An offline DAG is provided in /gstlal-burst/share/feature_extractor/Makefile.gstlal_feature_extractor_offline
in order to provide a convenient way to launch offline feature extraction jobs. A condensed list of
instructions for use is also provided within the Makefile itself.
For general use cases, the only configuration options that need to be changed are:
* User/Accounting tags: GROUP_USER, ACCOUNTING_TAG
* Analysis times: START, STOP
* Data ingestion: IFO, CHANNEL_LIST
* Waveform parameters: WAVEFORM, MISMATCH, QHIGH
Launching DAGs
====================================================================================================
In order to start up offline runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To generate a DAG, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_offline
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Configuration options
====================================================================================================
Analysis times:
* START: set the analysis gps start time
* STOP: set the analysis gps stop time
Data ingestion:
* IFO: select the IFO for auxiliary channels to be ingested (H1/L1).
* CHANNEL_LIST: a list of channels for the feature extractor to process. Lists
for O1/O2 and for H1/L1 are provided in gstlal/gstlal-burst/share/feature_extractor.
* MAX_SERIAL_STREAMS: Maximum # of streams that a single gstlal_feature_extractor job will
process at once. This is determined by sum_i(channel_i * # rates_i). The number of rates for a
given channel is determined by log2(max_rate/min_rate) + 1.
* MAX_PARALLEL_STREAMS: Maximum # of streams that a single job will run in the lifespan of a job.
This is distinct from serial streams since when a job is first launched, it will cache
auxiliary channel frames containing all channels that meet the criterion here, and then process
each channel subset sequentially determined by the serial streams. This is to save on input I/O.
* CONCURRENCY: determines the maximum # of concurrent reads from the same frame file. For most
purposes, it will be set to 1. Use this at your own risk.
Waveform parameters:
* WAVEFORM: type of waveform used to perform matched filtering (sine_gaussian/half_sine_gaussian).
* MISMATCH: maximum mismatch between templates (corresponding to Omicron's mismatch definition).
* QHIGH: maximum value of Q
Data transfer/saving:
* OUTPATH: directory in which to save features.
* SAVE_CADENCE: span of a typical dataset within an hdf5 file.
* PERSIST_CADENCE: span of a typical hdf5 file.
Setting the number of streams (ADVANCED USAGE)
====================================================================================================
NOTE: This won't need to be changed for most use cases, and the current configuration has been
optimized to aim for short run times.
Definition: Target number of streams (N_channels x N_rates_per_channel) that each cpu will process.
* if max_serial_streams > max_parallel_streams, all jobs will be parallelized by channel
* if max_parallel_streams > num_channels in channel list, all jobs will be processed serially,
with processing driven by max_serial_streams.
* any other combination will produce a mix of parallelization by channels and processing channels serially per job.
Playing around with combinations of MAX_SERIAL_STREAMS, MAX_PARALLEL_STREAMS, and CONCURRENCY will entirely
determine the structure of the offline DAG. Doing so will also change the memory usage for each job, so you'll
need to tread lightly. Changing CONCURRENCY in particular may cause I/O locks due to jobs fighting to read from the same
frame file.
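The stream-counting rule quoted above (the number of rates for a channel is
log2(max_rate/min_rate) + 1, and a job's streams are the sum of that over its
channels) can be sketched as follows. The channel names and rates below are
made-up placeholders, purely to illustrate the arithmetic.

.. code-block:: python

   # Sketch: count streams per the rule quoted above. Channels/rates are placeholders.
   import math

   channels = {
       "H1:EXAMPLE-CHANNEL_A": (32, 2048),   # (min_rate, max_rate) in Hz, hypothetical
       "H1:EXAMPLE-CHANNEL_B": (32, 512),
   }

   def n_rates(min_rate, max_rate):
       return int(math.log2(max_rate / min_rate)) + 1

   total_streams = sum(n_rates(lo, hi) for lo, hi in channels.values())
   print(total_streams)  # 12 = (6 + 1) + (4 + 1)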
####################################################################################################
Running Online Jobs
####################################################################################################
An online DAG is provided in /gstlal-burst/share/feature_extractor/Makefile.gstlal_feature_extractor_online
in order to provide a convenient way to launch online feature extraction jobs as well as auxiliary jobs as
needed (synchronizer/hdf5 file sinks). A condensed list of instructions for use is also provided within the Makefile itself.
There are four separate modes that can be used to launch online jobs:
1. Auxiliary channel ingestion:
a. Reading from framexmit protocol (DATA_SOURCE=framexmit).
This mode is recommended when reading in live data from LHO/LLO.
b. Reading from shared memory (DATA_SOURCE=lvshm).
This mode is recommended for reading in data for O2 replay (e.g. UWM).
2. Data transfer of features:
a. Saving features directly to disk, e.g. no data transfer.
This will save features to disk directly from the feature extractor,
and saves features periodically via hdf5.
b. Transfer of features via Kafka topics.
This requires a Kafka/Zookeeper service to be running (either an existing LDG
service or your own). Features get transferred via Kafka from the feature extractor,
parallel instances of the extractor get synchronized, and the features are then sent
downstream where they can be read by other processes (e.g. iDQ). In addition, a streaming
hdf5 file sink is launched which periodically dumps features to disk.
Launching DAGs
====================================================================================================
In order to start up online runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To run, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_online
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Configuration options
====================================================================================================
General:
* TAG: sets the name used for logging purposes, Kafka topic naming, etc.
Data ingestion:
* IFO: select the IFO for auxiliary channels to be ingested.
* CHANNEL_LIST: a list of channels for the feature extractor to process. Lists
for O1/O2 and for H1/L1 are provided in gstlal/gstlal-burst/share/feature_extractor.
* DATA_SOURCE: Protocol for reading in auxiliary channels (framexmit/lvshm).
* MAX_STREAMS: Maximum # of streams that a single gstlal_feature_extractor process will
process. This is determined by sum_i(channel_i * # rates_i). The number of rates for a
given channel is determined by log2(max_rate/min_rate) + 1.
Waveform parameters:
* WAVEFORM: type of waveform used to perform matched filtering (sine_gaussian/half_sine_gaussian).
* MISMATCH: maximum mismatch between templates (corresponding to Omicron's mismatch definition).
* QHIGH: maximum value of Q
Data transfer/saving:
* OUTPATH: directory in which to save features.
* SAVE_FORMAT: determines whether to transfer features downstream or save directly (kafka/hdf5).
* SAVE_CADENCE: span of a typical dataset within an hdf5 file.
* PERSIST_CADENCE: span of a typical hdf5 file.
Kafka options:
* KAFKA_TOPIC: basename of topic for features generated from feature_extractor
* KAFKA_SERVER: Kafka server address where Kafka is hosted. If features are run in same location,
as in condor's local universe, setting localhost:port is fine. Otherwise you'll need to determine
the IP address where your Kafka server is running (using 'ip addr show' or equivalent).
* KAFKA_GROUP: group for which Kafka producers for feature_extractor jobs report to.
Synchronizer/File sink options:
* PROCESSING_CADENCE: cadence at which incoming features are processed, so as to limit polling
of topics repeatedly, etc. Default value of 0.1s is fine.
* REQUEST_TIMEOUT: timeout for waiting for a single poll from a Kafka consumer.
* LATENCY_TIMEOUT: timeout for the feature synchronizer before older features are dropped. This
is to prevent a single feature extractor job from holding up the online pipeline. This will
also depend on the latency induced by the feature extractor, especially when using templates
that have latencies associated with them such as Sine-Gaussians.
####################################################################################################
Tutorials
####################################################################################################
.. toctree::
   :maxdepth: 2

   running_online_jobs
   running_offline_jobs
GstLAL calibration code
=======================
.. toctree::
   :maxdepth: 2

   bin/bin
   python-modules/modules
GstLAL calibration
==========================
.. toctree::
   :maxdepth: 2

   code