{%- set logo = "gstlal.png" %}
{% extends "!layout.html" %}
GstLAL API
============
.. toctree::
   :maxdepth: 2
   :glob:

   gstlal/python-modules/*modules
   gstlal-inspiral/python-modules/*modules
   gstlal-burst/python-modules/*modules
   gstlal-ugly/python-modules/*modules
.. _cbc-analysis:

CBC Analysis (Offline)
========================

To start an offline CBC analysis, you'll need a configuration file that points
at the start/end times to analyze, the input data products (e.g. template bank,
mass model), and any other workflow-related configuration.

All the steps below assume a Singularity container with the GstLAL software
stack installed. Other installation methods follow a similar procedure, with
one caveat: such workflows will not run on the Open Science Grid (OSG).
For a DAG on the OSG IGWN grid, you must use a Singularity container on
CVMFS, set the ``profile`` in ``config.yml`` to ``osg``, and make sure to
submit the DAG from an OSG node. Otherwise the workflow is the same.

When running without a Singularity container, the commands below should be
modified accordingly (e.g. run ``gstlal_inspiral_workflow init -c config.yml``
instead of ``singularity exec <image> gstlal_inspiral_workflow init -c config.yml``).

For ICDS gstlalcbc shared accounts, the contents of ``env.sh`` must be changed,
and instead of running
``$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein``
you should run ``source env.sh``. (Details are below.)
Running Workflows
^^^^^^^^^^^^^^^^^^

1. Build Singularity image (optional)
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""""""

NOTE: If you are using a reference Singularity container (suitable in most
cases), you can skip this step. The ``<image>`` throughout this doc refers to
the ``singularity-image`` specified in the ``condor`` section of your configuration.
If you are not using the reference Singularity container, say for local development,
you can specify a path to a local container and use that for the (non-OSG) workflow.

To pull a container with gstlal installed, run:

.. code:: bash

   $ singularity build --sandbox --fix-perms <image-name> docker://containers.ligo.org/lscsoft/gstlal:master

To use a branch other than master, replace ``master`` in the above command with
the name of the desired branch. To use a custom build instead, gstlal will need
to be installed into the container from your modified source code. For
installation instructions, see the
`installation page <https://docs.ligo.org/lscsoft/gstlal/installation.html>`_.
2. Set up workflow
""""""""""""""""""""
First, we create a new analysis directory and switch to it:
.. code:: bash
$ mkdir <analysis-dir>
$ cd <analysis-dir>
$ mkdir bank mass_model idq dtdphi
Default configuration files and environment (``env.sh``) for a
variety of different banks are contained in the
`offline-configuration <https://git.ligo.org/gstlal/offline-configuration>`_
repository.
One can run the commands below to grab the configuration files, or clone the
repository and copy the files as needed into the analysis directory.
To download data files (mass model, template banks) that may be needed for
offline runs, see the
`README <https://git.ligo.org/gstlal/offline-configuration/-/blob/main/README.md>`_
in the offline-configuration repo. Move the template bank(s) into ``bank`` and the mass model into ``mass_model``.
For example, to grab all the relevant files for a small BNS dag:
.. code:: bash
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/configs/bns-small/config.yml
$ curl -O https://git.ligo.org/gstlal/offline-configuration/-/raw/main/env.sh
$ source /cvmfs/oasis.opensciencegrid.org/ligo/sw/conda/etc/profile.d/conda.sh
$ conda activate igwn
$ dcc archive --archive-dir=. --files -i T2200318-v2
$ conda deactivate
Then move the template bank, mass model, idq file, and dtdphi file into their corresponding directories.
When running an analysis on the ICDS cluster in the gstlalcbc shared account,
the contents of ``env.sh`` must be changed to what is given below.
In addition, where the tutorial says to run ``ligo-proxy-init -p``, instead run
``source env.sh`` using the modified ``env.sh``.
When running on non-gstlalcbc accounts on ICDS, or when running on other
clusters, ``env.sh`` does not need to be modified and ``ligo-proxy-init -p``
can be run as in the tutorial.
.. code-block:: bash

   export PYTHONUNBUFFERED=1
   unset X509_USER_PROXY
   export X509_USER_CERT=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export X509_USER_KEY=/ligo/home/ligo.org/gstlalcbc/.cert/gstlalcbc_icds_robot.key.pem
   export GSTLAL_FIR_WHITEN=0
Now, we'll need to modify the configuration as needed to run the analysis. At
the very least, set the start/end times and the instruments to run over:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1
Ensure the template bank, mass model, idq file, and dtdphi file are pointed to in the configuration:

.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz

.. code-block:: yaml

   prior:
     mass-model: bank/mass_model_small.h5
     idq-timeseries: idq/H1L1-IDQ_TIMESERIES-1239641219-692847.h5
     dtdphi: dtdphi/inspiral_dtdphi_pdf.h5
If you're creating a summary page for results, you'll need to point at a
location where they are web-viewable:

.. code-block:: yaml

   summary:
     webdir: ~/public_html/

If you're running on LIGO compute resources and your username doesn't match
your albert.einstein username, you'll also need to specify the accounting group
user so condor can track accounting information:

.. code-block:: yaml

   condor:
     accounting-group-user: albert.einstein

In addition, update the ``singularity-image`` in the ``condor`` section of your configuration if needed:

.. code-block:: yaml

   condor:
     singularity-image: /cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master
If not using a reference Singularity image, you can replace this with the
full path to a local Singularity container ``<image>``.
For more detailed configuration options, take a look at the :ref:`configuration
section <analysis-configuration>` below.

If you haven't installed site-specific profiles yet (they are per-user), you can run:

.. code:: bash

   $ singularity exec <image> gstlal_grid_profile install

which will install site-specific configurations such as ``ldas`` and ``icds``.
You can select which profile to use in the ``condor`` section:

.. code-block:: yaml

   condor:
     profile: ldas

For an OSG IGWN grid run, use ``osg``.
To view which profiles are available, you can run:

.. code:: bash

   $ singularity exec <image> gstlal_grid_profile list

Note that you can install :ref:`custom profiles <install-custom-profiles>` as well.
Once you have the configuration, data products, and grid profiles installed, you
can set up the Makefile using the configuration, which we'll then use for
everything else, including the data file needed for the workflow, the workflow
itself, the summary page, etc.
.. code:: bash
$ singularity exec <image> gstlal_inspiral_workflow init -c config.yml
By default, this will generate the full workflow. If you want to only run the
filtering step, a rerank, or an injection-only workflow, you can instead specify
the workflow as well, e.g.
.. code:: bash
$ singularity exec <image> gstlal_inspiral_workflow init -c config.yml -w injection
for an injection-only workflow.
If you already have a Makefile and need to update it based on an updated
configuration, run ``gstlal_inspiral_workflow`` with ``--force``.
Next, if you are accessing non-public data (i.e. anything other than open GWOSC
data), you'll need to set up your proxy to ensure you can access LIGO data:
.. code:: bash
$ X509_USER_PROXY=/path/to/x509_proxy ligo-proxy-init -p albert.einstein
Note that we are running this step outside of Singularity. This is because ``ligo-proxy-init``
is not installed within the image currently.
If you are running on the ICDS gstlalcbc shared account, do not run the command
above.
Instead, run:
.. code:: bash
$ source env.sh
Also update the configuration accordingly (if needed):

.. code-block:: yaml

   source:
     x509-proxy: /path/to/x509_proxy
Finally, set up the rest of the workflow including the DAG for submission:
.. code:: bash
$ singularity exec -B $TMPDIR <image> make dag
If running on the OSG IGWN grid, make sure to submit the DAGs from an OSG node.
This should create condor DAGs for the workflow. Mounting a temporary directory
is important, as some of the steps use temporary space to generate files.
If you want to see detailed error messages, add ``PYTHONUNBUFFERED=1`` to
``environment`` in the submit (``*.sub``) files by running:

.. code:: bash

   $ sed -i '/^environment = / s/\"$/ PYTHONUNBUFFERED=1\"/' *.sub
3. Launch workflows
"""""""""""""""""""""""""
.. code:: bash
$ source env.sh
$ make launch
This is simply a thin wrapper around ``condor_submit_dag`` that launches the DAG in question.
You can monitor the DAG with Condor CLI tools such as ``condor_q``, and follow its progress with ``tail -f full_inspiral_dag.dag.dagman.out``.
4. Generate Summary Page
"""""""""""""""""""""""""
After the DAG has completed, you can generate the summary page for the analysis:
.. code:: bash
$ singularity exec <image> make summary
To make an open-box page after this, run:
.. code:: bash
$ make unlock
.. _analysis-configuration:
Configuration
^^^^^^^^^^^^^^
The top-level configuration consists of the analysis times and detector configuration:

.. code-block:: yaml

   start: 1187000000
   stop: 1187100000
   instruments: H1L1
   min-instruments: 1

These set the start and stop GPS times of the analysis, plus the detectors to use
(H1 = Hanford, L1 = Livingston, V1 = Virgo). There is a nice online converter for
GPS times here: https://www.gw-openscience.org/gps/; you can also use the
``gpstime`` program. Note that these start and stop times have no knowledge of
science-quality data; the science-quality data actually analyzed are typically a
subset of the total time. Information about which detectors were on at different
times is available here: https://www.gw-openscience.org/data/.

``min-instruments`` sets the minimum number of instruments we will allow to form
an event, e.g. setting it to 1 means the analysis will consider single-detector
events, while 2 means we will only consider events that are coincident across at
least 2 detectors.
Section: Data
""""""""""""""
.. code-block:: yaml

   data:
     template-bank: bank/gstlal_bank_small.xml.gz
     analysis-dir: /path/to/analysis/dir

The ``template-bank`` option points to the template bank file. These are XML
files that follow the LIGOLW (LIGO Light Weight) schema. The template bank in
particular contains a table that lists the parameters of all of the templates;
it does not contain the actual waveforms themselves. Metadata such as the
waveform approximant and the frequency cutoffs are also listed in this file.

The ``analysis-dir`` option is used if the user wishes to point to an existing
analysis to perform a rerank or an injection-only workflow. This grabs existing
files from that directory to seed the rerank/injection workflows.
One can use multiple sub template banks. In this case, the configuration might look like:

.. code-block:: yaml

   data:
     template-bank:
       bns: bank/sub_bank/bns.xml.gz
       nsbh: bank/sub_bank/nsbh.xml.gz
       bbh_1: bank/sub_bank/bbh_low_q.xml.gz
       bbh_2: bank/sub_bank/other_bbh.xml.gz
       imbh: bank/sub_bank/imbh_low_q.xml.gz
Section: Source
""""""""""""""""
.. code-block:: yaml

   source:
     data-source: frames
     data-find-server: datafind.gw-openscience.org
     frame-type:
       H1: H1_GWOSC_O2_16KHZ_R1
       L1: L1_GWOSC_O2_16KHZ_R1
     channel-name:
       H1: GWOSC-16KHZ_R1_STRAIN
       L1: GWOSC-16KHZ_R1_STRAIN
     sample-rate: 4096
     frame-segments-file: segments.xml.gz
     frame-segments-name: datasegments
     x509-proxy: x509_proxy

The ``data-find-server`` option points to a server that is queried to find the
location of frame files. The address shown above is a publicly available server
that will return the locations of public frame files on CVMFS. Each frame file
has a type that describes its contents and may contain multiple channels of
data, hence the channel names must also be specified.

``frame-segments-file`` points to a LIGOLW XML file that describes the actual
times to analyze, i.e. it lists the times that science-quality data are
available. These files are general enough that they could describe different
types of data, so ``frame-segments-name`` is used to specify which segments to
consider. In practice, the segments file we produce will only contain the
segments we want. Users will typically not change any of these options once they
are set for a given instrument and observing run. ``x509-proxy`` is the path to
your X.509 proxy credential.
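As a hedged illustration of what this option does (not part of the workflow
itself), the same public datafind server can be queried with the
`gwdatafind <https://gwdatafind.readthedocs.io>`_ Python package to check that
frames of the configured type are discoverable; the exact ``gwdatafind`` call
signature may vary between versions.

.. code-block:: python

   # Sketch: check that frame files of the configured frame-type can be found
   # for the example analysis span. Assumes the gwdatafind package is installed.
   from gwdatafind import find_urls

   urls = find_urls(
       "H",                        # observatory prefix for H1
       "H1_GWOSC_O2_16KHZ_R1",     # frame-type from the source section above
       1187000000,                 # analysis start
       1187100000,                 # analysis stop
       host="datafind.gw-openscience.org",
   )
   print("found %d frame files" % len(urls))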
Section: Segments
""""""""""""""""""
The ``segments`` section specifies how to generate segments and vetoes for the
workflow. There are two backends to determine where to query segments and vetoes
from, ``gwosc`` (public) and ``dqsegdb`` (authenticated).
An example of configuration with the ``gwosc`` backend looks like:
.. code-block:: yaml

   segments:
     backend: gwosc
     vetoes:
       category: CAT1
Here, the ``backend`` is set to ``gwosc`` so both segments and vetoes are determined
by querying the GWOSC server. There is no additional configuration needed to query
segments, but for vetoes, we also need to specify the ``category`` used for vetoes.
This can be one of ``CAT1``, ``CAT2``, or ``CAT3``. By default, segments are generated
by applying ``CAT1`` vetoes as recommended by the Detector Characterization group.
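For reference, the public segment information that the ``gwosc`` backend works
from can be inspected directly with the `gwosc <https://gwosc.readthedocs.io>`_
Python package. The sketch below is only an illustration for the example
analysis span; the flag names (``H1_DATA``, ``H1_CBC_CAT1``) are GWOSC timeline
flags, not gstlal configuration values, and this is not what the workflow runs
internally.

.. code-block:: python

   # Sketch: fetch public H1 data segments and CAT1-passing segments from GWOSC.
   from gwosc.timeline import get_segments

   start, stop = 1187000000, 1187100000
   science = get_segments("H1_DATA", start, stop)      # times with data available
   cat1_ok = get_segments("H1_CBC_CAT1", start, stop)  # times passing CAT1
   print("data segments:", science)
   print("CAT1-passing segments:", cat1_ok)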
An example of configuration with the ``dqsegdb`` backend looks like:
.. code-block:: yaml

   segments:
     backend: dqsegdb
     science:
       H1: DCS-ANALYSIS_READY_C01:1
       L1: DCS-ANALYSIS_READY_C01:1
       V1: ITF_SCIENCE:2
     vetoes:
       category: CAT1
       veto-definer:
         file: H1L1V1-HOFT_C01_V1ONLINE_O3_CBC.xml
         version: O3b_CBC_H1L1V1_C01_v1.2
         epoch: O3

Here, the ``backend`` is set to ``dqsegdb`` so both segments and vetoes are determined
by querying the DQSEGDB server. To query segments, one needs to specify, per
instrument, the flag to query segments from. For vetoes, we need to specify the
``category`` as with the ``gwosc`` backend. Additionally, a veto definer file is
used to determine which flags are used for which veto categories. The file does
not need to be provided locally; the ``file``, ``version`` and ``epoch`` fully
specify how to access the veto definer file used for generating vetoes.
Section: PSD
""""""""""""""
.. code-block:: yaml

   psd:
     fft-length: 8
     sample-rate: 4096

The PSD estimation method used by GstLAL is a modified median-Welch method that
is described in detail in Section IIB of Ref. [1]. The FFT length sets the length
of each section that is Fourier transformed. The default whitener will use
zero-padding of one-fourth the FFT length on either side and will overlap
Fourier-transformed segments by one-fourth the FFT length. For example, an
``fft-length`` of 8 means that each Fourier-transformed segment used in the PSD
estimation (and consequently the whitener) will contain 4 seconds of data with 2
seconds of zero padding on either side, and will overlap the next segment by 2
seconds (i.e. the last two seconds of data in one segment will be the first two
seconds of data in the following window).
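The segment layout implied by ``fft-length`` can be written out explicitly. The
short sketch below simply reproduces the arithmetic from the paragraph above
(quarter-length zero-padding on each side and quarter-length overlap); it is an
illustration only, not code used by the analysis.

.. code-block:: python

   # Sketch: whitener segment layout implied by fft-length.
   def whitener_layout(fft_length):
       """Return (data seconds, zero-padding seconds per side, overlap seconds)."""
       zero_pad = fft_length / 4.0          # one-fourth of the FFT length per side
       data = fft_length - 2 * zero_pad     # data contained in each segment
       overlap = fft_length / 4.0           # overlap with the neighboring segment
       return data, zero_pad, overlap

   print(whitener_layout(8))  # (4.0, 2.0, 2.0): 4 s of data, 2 s padding, 2 s overlap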
Section: SVD
""""""""""""""
.. code-block:: yaml

   svd:
     f-low: 20.0
     num-chi-bins: 1
     sort-by: mchirp
     approximant:
       - 0:1.73:TaylorF2
       - 1.73:1000:SEOBNRv4_ROM
     tolerance: 0.9999
     max-f-final: 1024.0
     num-split-templates: 200
     overlap: 30
     num-banks: 5
     samples-min: 2048
     samples-max-64: 2048
     samples-max-256: 2048
     samples-max: 4096
     autocorrelation-length: 701
     max-duration: 128
     manifest: svd_manifest.json
``f-low`` sets the lower frequency cutoff for the analysis in Hz.
``num-chi-bins`` is a tunable parameter related to the template bank binning
procedure; specifically, it sets the number of effective-spin parameter bins to
use in the chirp-mass / effective-spin binning procedure described in Sec. IID and
Fig. 6 of [1].
``sort-by`` selects the template sort column. This controls how the bank is
binned into sub-banks suitable for the SVD decomposition. It can be ``mchirp``
(sorts by chirp mass), ``mu`` (sorts by the mu1 and mu2 coordinates), or
``template_duration`` (sorts by template duration).
``approximant`` specifies the waveform approximant to use within given
chirp-mass bounds. For example, 0:1000:TaylorF2 means use the TaylorF2
approximant for waveforms from systems with chirp masses between 0 and
1000 solar masses. Multiple approximants and chirp-mass bounds can be provided.
``tolerance`` is a tunable parameter related to the truncation of SVD basis
vectors. A tolerance of 0.9999 means the targeted matched-filter inner-product
of the original waveform and the waveform reconstructed from the SVD is 0.9999.
``max-f-final`` sets the max frequency of the template.
``num-split-templates``, ``overlap``, and ``num-banks`` are tunable parameters
related to the SVD process. ``num-split-templates`` sets the number of templates
to decompose at a time; ``overlap`` sets the number of templates from adjacent
template bank regions to pad to the region being considered in order to actually
compute the SVD (this helps the performance of the SVD, and these pad templates
are not reconstructed); ``num-banks`` sets the number of sets of decomposed
templates to include in a given bin for the analysis. For example,
``num-split-templates`` of 200, ``overlap`` of 30, and ``num-banks`` of 5 means
that each SVD bank file will contain 5 decomposed sets of 200 templates, where
the SVD was computed using an additional 15 templates on either side of the 200
(as defined by the binning procedure).
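As a concrete check of that example, the sketch below works out how many
templates end up in one SVD bank file and how many padding templates enter each
decomposition. It simply restates the numbers quoted above and is not the
gstlal implementation.

.. code-block:: python

   # Sketch: template counts implied by the SVD splitting parameters above.
   num_split_templates = 200   # templates decomposed at a time
   overlap = 30                # padding templates, split across both sides
   num_banks = 5               # decomposed sets per SVD bank file

   templates_per_bank_file = num_split_templates * num_banks
   pad_per_side = overlap // 2

   print(templates_per_bank_file)  # 1000 templates reconstructed per bank file
   print(pad_per_side)             # 15 extra templates on either side of each set of 200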
``samples-min``, ``samples-max-64``, ``samples-max-256``, and ``samples-max``
are tunable parameters related to the template time slicing procedure used by
GstLAL (described in Sec. IID and Fig. 7 of Ref. [1], and references therein).
Templates are sliced in time before the SVD is applied, and only sampled at the
rate necessary for the highest frequency in each time slice (rounded up to a
power of 2). For example, the low frequency part of a waveform may only be
sampled at 32 Hz, while the high frequency part may be sampled at 2048 Hz
(depending on user settings). ``samples-min`` sets the minimum number of samples
to use in any time slice. ``samples-max`` sets the maximum number of samples to
use in any time slice with a sample rate below 64 Hz; ``samples-max-64`` sets
the maximum number of samples to use in any time slice with sample rates between
64 Hz and 256 Hz; ``samples-max-256`` sets the maximum number of samples to use
in any time slice with a sample rate greater than 256 Hz.
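To make the rounding and cap selection concrete, here is a small sketch of the
rule described above: the sample rate needed for a time slice is rounded up to
a power of two, and the applicable ``samples-max*`` cap depends on that rate.
This only illustrates the stated rule; it is not the gstlal implementation, and
the example rate is arbitrary.

.. code-block:: python

   # Sketch: round a required slice sample rate up to a power of two and pick
   # the samples-max cap that applies to it.
   import math

   def round_up_to_power_of_two(rate):
       """Round a required sample rate (Hz) up to the next power of two."""
       return 2 ** math.ceil(math.log2(rate))

   def samples_cap(rate, samples_max=4096, samples_max_64=2048, samples_max_256=2048):
       """Maximum number of samples allowed in a slice at this sample rate."""
       if rate < 64:
           return samples_max
       elif rate <= 256:
           return samples_max_64
       return samples_max_256

   rate = round_up_to_power_of_two(50)   # -> 64 Hz
   print(rate, samples_cap(rate))        # capped by samples-max-64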
``autocorrelation-length`` sets the number of samples to use when computing the
autocorrelation-based test-statistic, described in IIIC of Ref [1].
``max-duration`` sets the maximum template duration in seconds. This option can
be omitted if no maximum duration is desired.
``manifest`` sets the name of a file that will contain metadata about the
template bank bins.
If one uses multiple sub template banks, SVD configurations can be specified
for each sub template bank; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Users will typically not change these options.
Section: Filter
""""""""""""""""
.. code-block:: yaml

   filter:
     fir-stride: 1
     min-instruments: 1
     coincidence-threshold: 0.01
     ht-gate-threshold: 0.8:15.0-45.0:100.0
     veto-segments-file: vetoes.xml.gz
     time-slide-file: tisi.xml
     injection-time-slide-file: inj_tisi.xml
     time-slides:
       H1: 0:0:0
       L1: 0.62831:0.62831:0.62831
     injections:
       bns:
         file: bns_injections.xml
         range: 0.01:1000.0
``fir-stride`` is a tunable parameter related to the matched-filter procedure,
setting the length in seconds of the output of the matched-filter element.
``coincidence-threshold`` is the time in seconds to add to the light-travel time
when searching for coincidences between detectors.
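For example, with ``coincidence-threshold: 0.01`` the H1-L1 coincidence window
is the H1-L1 light-travel time (roughly 10 ms) plus 10 ms, i.e. about 20 ms.
A one-line sketch of that sum follows; the 0.010 s light-travel time is an
approximate figure quoted here for illustration, not a configuration value.

.. code-block:: python

   # Sketch: coincidence window = inter-detector light-travel time + threshold.
   light_travel_time_h1l1 = 0.010   # seconds, approximate H1-L1 light-travel time
   coincidence_threshold = 0.01     # seconds, from the filter section above
   print(light_travel_time_h1l1 + coincidence_threshold)  # ~0.02 s window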
``ht-gate-threshold`` sets the h(t) gate threshold as a function of chirp mass.
The h(t) gate threshold is a value above which the output of the whitener, plus
some padding, will be set to zero (as described in IIC of Ref. [1]).
0.8:15.0-45.0:100.0 means that a template bank bin whose maximum-chirp-mass
template is 0.8 solar masses will use a gate threshold of 15, a bank bin with a
max chirp mass of 100 will use a threshold of 45, and all other thresholds are
given by a linear function between those two points.
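The linear function described above can be written out explicitly. The sketch
below parses the ``0.8:15.0-45.0:100.0`` string and interpolates the threshold
for a bin's maximum chirp mass. The helper is illustrative only, based on the
format as described in this section, and is not the parser gstlal itself uses.

.. code-block:: python

   # Sketch: interpolate the h(t) gate threshold for a bank bin's max chirp mass,
   # assuming the 'mc_lo:thresh_lo-thresh_hi:mc_hi' form described above.
   def ht_gate_threshold(spec, max_mchirp):
       low, high = spec.split("-")
       mc_lo, thresh_lo = (float(x) for x in low.split(":"))
       thresh_hi, mc_hi = (float(x) for x in high.split(":"))
       slope = (thresh_hi - thresh_lo) / (mc_hi - mc_lo)
       return thresh_lo + slope * (max_mchirp - mc_lo)

   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 0.8))    # 15.0
   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 100.0))  # 45.0
   print(ht_gate_threshold("0.8:15.0-45.0:100.0", 1.74))   # threshold for an intermediate bin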
``veto-segments-file`` sets the name of a LIGOLW XML file that contains any
vetoes used for the analysis, even if there are no vetoes.
``time-slide-file`` and ``injection-time-slide-file`` are LIGOLW XML files that
describe any time slides used in the analysis. A typical analysis will only
analyze injections with the zerolag “time slide” (i.e. the data are not slid in
time), and will consider the zerolag and one other time slide for the
non-injection analysis. The time slide is used to perform a blind sanity check
of the noise model.
``injections`` lists a set of injections, each with their own label. In this
example, there is only one injection set, and it is labeled “bns”. ``file`` is a
relative path to the injection file (a LIGOLW XML file that contains the
parameters of the injections, but not the actual waveforms themselves). ``range``
sets the chirp-mass range that should be considered when searching for this
particular set of injections. Multiple injection files can be provided, each
with their own label, file, and range.
The only option here that a user will normally interact with is the
``injections`` option.
When using multiple sub template banks, replace ``bns:`` under ``injections:``
with ``inj:``.
Section: Injections
""""""""""""""""""""
.. code-block:: yaml

   injections:
     sets:
       expected-snr:
         f-low: 15.0
       bns:
         f-low: 14.0
         seed: 72338
         time:
           step: 32
           interval: 1
           shift: 0
         waveform: SpinTaylorT4threePointFivePN
         mass-distr: componentMass
         mass1:
           min: 1.1
           max: 2.8
         mass2:
           min: 1.1
           max: 2.8
         spin1:
           min: 0
           max: 0.05
         spin2:
           min: 0
           max: 0.05
         distance:
           min: 10000
           max: 80000
         spin-aligned: True
         file: bns_injections.xml
The ``sets`` subsection is used to create injection sets to be used within the
analysis and referenced by name in the ``filter`` section. In ``sets``, the
injections are grouped by key. In this case, there is one ``bns`` injection set,
which creates the ``bns_injections.xml`` file used in the ``injections`` section
of the ``filter`` section.
For multiple injection sets, the chunk under ``bns:`` should be repeated for each
set; see the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Besides creating injection sets, the ``expected-snr`` subsection is used for the
expected SNR jobs. These settings are used to override defaults as needed.
``spin-aligned`` specifies whether the injections should have (mis)aligned spins
(if ``spin-aligned: True``) or precessing spins (if ``spin-aligned: False``).
In the case of multiple injection sets that need to be combined, one can add
a few options to create a combined file and reference that within the filter
jobs. This can be useful for large banks with a large set of templates. To
do this, one can add the following:
.. code-block:: yaml

   injections:
     combine: true
     combined-file: combined_injections.xml
The injections created are generated by the ``lalapps_inspinj`` program, with
the following mapping between configuration and command-line options (a small
sketch of this mapping follows the list):

* ``f-low``: ``--f-lower``
* ``seed``: ``--seed``
* ``time`` section: ``--time-step``, ``--time-interval``; ``shift`` adjusts the
  start time appropriately.
* ``waveform``: ``--waveform``
* ``mass-distr``: ``--m-distr``
* ``mass/spin/distance`` sections: map to options like ``--min-mass1``
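As a hedged sketch of that mapping (and nothing more), the snippet below
assembles a ``lalapps_inspinj``-style argument list from part of the ``bns``
block shown earlier, using only the option names listed above; it is not the
exact command line the workflow constructs.

.. code-block:: python

   # Sketch: build lalapps_inspinj-style arguments from the bns injection config,
   # following the documented mapping. Illustrative only.
   bns = {
       "f-low": 14.0,
       "seed": 72338,
       "time": {"step": 32, "interval": 1, "shift": 0},
       "waveform": "SpinTaylorT4threePointFivePN",
       "mass-distr": "componentMass",
       "mass1": {"min": 1.1, "max": 2.8},
   }

   args = [
       "--f-lower", str(bns["f-low"]),
       "--seed", str(bns["seed"]),
       "--time-step", str(bns["time"]["step"]),
       "--time-interval", str(bns["time"]["interval"]),
       "--waveform", bns["waveform"],
       "--m-distr", bns["mass-distr"],
       "--min-mass1", str(bns["mass1"]["min"]),  # "options like --min-mass1"
   ]
   print(" ".join(args))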
Section: Prior
""""""""""""""""
.. code-block:: yaml

   prior:
     mass-model: mass_model/mass_model_small.h5

``mass-model`` is a relative path to the file that contains the mass model. This
model is used to weight templates appropriately when assigning ranking
statistics, based on our understanding of the astrophysical distribution of
signals. Users will not typically change this option.
An optional ``dtdphi-file`` and ``idq-timeseries`` can be provided here. If not
given, a default model (included in the standard installation) will be used.
The dtdphi file specifies a probability distribution function for measuring a
given time shift and phase shift in a multi-detector observation; it enters into
the ranking statistic.
The idq file gives information about the data quality around the time of
coalescence.
If specifying idq and dtdphi files, create an ``idq`` and a ``dtdphi`` directory
in the ``<analysis-dir>`` and put the idq and dtdphi files in the respective
directories.
See the `mario config <https://git.ligo.org/gstlal/offline-configuration/configs/mario/config.yml>`_ for an example.
Section: Rank
""""""""""""""""
.. code-block:: yaml

   rank:
     ranking-stat-samples: 4194304
``ranking-stat-samples`` sets the number of samples to draw from the noise model
when computing the distribution of log likelihood-ratios (the ranking statistic)
under the noise hypothesis. Users will not typically change this option.
Section: Summary
""""""""""""""""""
.. code-block:: yaml

   summary:
     webdir: /path/to/public_html/folder
``webdir`` sets the path of the output results webpages produced by the
analysis. Users will typically change this option for each analysis.
Section: Condor
""""""""""""""""""
.. code-block:: yaml

   condor:
     profile: osg-public
     accounting-group: ligo.dev.o3.cbc.uber.gstlaloffline
     accounting-group-user: <albert.einstein>
     singularity-image: <image>
``profile`` sets a base level of configuration options for condor.
``accounting-group`` sets accounting group details on LDG resources. Currently
the machinery to produce an analysis dag requires this option, but the option is
not actually used by analyses running on non-LDG resources.
``singularity-image`` sets the path of the container on cvmfs that the analysis
should use. Users will not typically change this option
(use ``/cvmfs/singularity.opensciencegrid.org/lscsoft/gstlal:master``).
.. _install-custom-profiles:
Installing Custom Site Profiles
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
You can define a site profile as YAML. As an example, we can create a file called ``custom.yml``:
.. code-block:: yaml

   scheduler: condor
   requirements:
     - "(IS_GLIDEIN=?=True)"
Both the directives and requirements sections are optional.
To install one so it's available for use, run:
.. code:: bash
$ singularity exec <image> gstlal_grid_profile install custom.yml
@@ -22,9 +22,11 @@ sys.path.insert(0, os.path.abspath('.'))
sys.path.insert(0, os.path.abspath('../../gstlal/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-inspiral/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-burst/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-calibration/python'))
sys.path.insert(0, os.path.abspath('../../gstlal-ugly/python'))
# on_rtd is whether we are on readthedocs.org, this line of code grabbed
# from docs.readthedocs.org
on_rtd = os.environ.get('READTHEDOCS', None) == 'True'
# -- General configuration ------------------------------------------------
@@ -35,16 +37,25 @@ sys.path.insert(0, os.path.abspath('../../gstlal-ugly/python'))
# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ['sphinx.ext.autodoc',
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.autosummary',
'sphinx.ext.intersphinx',
'sphinx.ext.todo',
'sphinx.ext.coverage',
'sphinx.ext.imgmath',
# 'sphinx.ext.imgmath',
'sphinx.ext.ifconfig',
'sphinx.ext.viewcode',
'sphinx.ext.githubpages',
'sphinx.ext.graphviz']
'sphinx.ext.graphviz',
'sphinx.ext.mathjax',
'myst_parser',
]
myst_enable_extensions = [
"amsmath",
"dollarmath",
]
# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']
@@ -52,8 +63,8 @@ templates_path = ['_templates']
# The suffix(es) of source filenames.
# You can specify multiple suffix as a list of string:
#
# source_suffix = ['.rst', '.md']
source_suffix = '.rst'
source_suffix = ['.rst', '.md']
# source_suffix = '.rst'
# The master toctree document.
master_doc = 'index'
@@ -61,7 +72,7 @@ master_doc = 'index'
# General information about the project.
# FIXME get from autotools
project = u'GstLAL'
copyright = u'2018, GstLAL developers'
copyright = u'2021, GstLAL developers'
author = u'GstLAL developers'
# The version info for the project you're documenting, acts as replacement for
@@ -69,10 +80,9 @@ author = u'GstLAL developers'
# built documents.
#
# The short X.Y version.
# FIXME get from autotools
version = u'1.x'
#version = u'1.x'
# The full version, including alpha/beta/rc tags.
release = u'1.x'
release = ''
# The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages.
@@ -98,28 +108,32 @@ todo_include_todos = True
# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'#'classic'
html_logo = "gstlal_small.png"
html_theme = 'default'
# Theme options are theme-specific and customize the look and feel of a theme
# further. For a list of options available for each theme, see the
# documentation.
html_theme_options = {
'fixed_sidebar': 'true',
'sidebar_width': '200px',
'page_width': '95%',
'show_powered_by': 'false',
'logo_name': 'true',
}
#html_theme_options = {}
def setup(app):
app.add_stylesheet('css/my_theme.css')
# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
if not on_rtd: # only import and set the theme if we're building docs locally
import sphinx_rtd_theme
html_theme = 'sphinx_rtd_theme'
html_theme_path = [sphinx_rtd_theme.get_html_theme_path()]
# Custom sidebar templates, maps document names to template names.
html_sidebars = { '**': ['navigation.html', 'relations.html', 'searchbox.html'] }
html_last_updated_fmt = None
#html_sidebars = { '**': ['navigation.html', 'relations.html', 'searchbox.html'] }
#html_last_updated_fmt = None
# Add a favicon to doc pages
html_favicon = '_static/favicon.ico'
# -- Options for HTMLHelp output ------------------------------------------
# Container Development Environment
The container development workflow consists of a few key points:
- Build tools provided by and used within a writable gstlal container.
- Editor/git used in or outside of the container as desired.
- Applications are run in the development container.
The benefits of developing in a writable container:
- Your builds do not depend on the software installed on the system, you don't have to worry about behavior changes due to system package updates.
- Your build environment is the same as that of everyone else using the same base container. This makes for easier collaboration.
- Others can run your containers and get the same results. You don't have to worry about environment mis-matches.
## Create a writable container
The base of a development environment is a gstlal container. It is typical to start with the
current master build. However, you can use the build tools to overwrite the install in the container, so the
choice of branch in your gstlal repository matters more than the container that you start with. The job of
the container is to provide a well-defined set of dependencies.
```bash
singularity build --sandbox --fix-perms CONTAINER_NAME docker://containers.ligo.org/lscsoft/gstlal:master
```
This will create a directory named CONTAINER_NAME. That directory is a *singularity container*.
## Check out gstlal
In a directory of your choice, under your home directory, run:
```
git clone https://git.ligo.org/lscsoft/gstlal DIRNAME
```
This will create a git directory named DIRNAME which is referred to in the following as your "gstlal dir". The gstlal dir
contains several directories that contain components that can be built independently (e.g., `gstlal`, `gstlal-inspiral`, `gstlal-ugly`, ...).
A common practice is to run the clone command in the CONTAINER_NAME directory and use `src` as `DIRNAME`. In this case, when you run your
container, your source will be available in the directory `/src`.
## Develop
Edit and make changes under your gstlal dir using editors and git outside of the container (or inside if you prefer).
## Build a component
To build a component:
1. cd to your gstlal directory
2. Run your container:
```
singularity run --writable -B $TMPDIR CONTAINER_NAME /bin/bash
```
3. cd to the component directory under your gstlal dir.
4. Initialize the build system for your component. You only need to do this once per container per component directory:
```
./00init.sh
./configure --prefix=/usr --libdir=/usr/lib64
```
The arguments to configure are required so that you overwrite the build of gstlal in your container.
Some components have dependencies on others. You should build GstLAL components in the following order:
1. `gstlal`
2. `gstlal-ugly`
3. `gstlal-inspiral`, `gstlal-burst`, `gstlal-calibration` (in any order)
For example, if you want to build `gstlal-ugly`, you should build `gstlal` first.
5. Run make and make install
```
make
make install
```
Note that the container is writable, so your installs will persist after you exit the container and run it again.
## Run your code
You can run your code in the following ways:
1. Run your container using singularity and issue commands interactively "inside the container":
```
singularity run --writable -B $TMPDIR PATH_TO_CONTAINER /bin/bash
/bin/gstlal_reference_psd --channel-name=H1=foo --data-source=white --write-psd=out.psd.xml --gps-start-time=1185493488 --gps-end-time=1185493788
```
2. Use `singularity exec` and give your command on the singularity command line:
```
singularity exec --writable -B $TMPDIR PATH_TO_CONTAINER /bin/gstlal_reference_psd --channel-name=H1=foo --data-source=white --write-psd=out.psd.xml --gps-start-time=1185493488 --gps-end-time=1185493788
```
3. Use your container in a new or existing [container-based gstlal workflow](/gstlal/cbc_analysis.html) on a cluster with a shared filesystem where your container resides. For example, you can run on the CIT cluster or on the PSU cluster, but not via the OSG (you can run your container as long as your container is available on the shared filesystem of the cluster where you want to run). In order to run your code on the OSG, you would have to arrange to have your container published to cvmfs.
# Contributing Workflow
## Git Branching
The `gstlal` team uses the standard git-branch-and-merge workflow, which has a brief description
at [GitLab](https://docs.gitlab.com/ee/gitlab-basics/feature_branch_workflow.html) and a full description
at [BitBucket](https://www.atlassian.com/git/tutorials/comparing-workflows/feature-branch-workflow). As depicted below,
the workflow involves the creation of new branches for changes, the review of those branches through the Merge Request
process, and then the merging of the new changes into the main branch.
![git-flow](_static/img/git-flow.png)
### Git Workflow
In general the steps for working with feature branches are:
1. Create a new branch from master: `git checkout -b feature-short-desc`
1. Edit code (and tests)
1. Commit changes: `git commit . -m "comment"`
1. Push branch: `git push origin feature-short-desc`
1. Create merge request on GitLab
## Merge Requests
### Creating a Merge Request
Once you push a feature branch, GitLab will show a prompt on the gstlal repo home page. Click “Create Merge Request”, or you can
also go to the branches page (Repository > Branches) and select “Merge Request” next to your branch.
![mr-create](_static/img/mr-create.png)
When creating a merge request:
1. Add short, descriptive title
1. Add description
- (Uses markdown .md-file style)
- Summary of additions / changes
- Describe any tests run (other than CI)
1. Click “Create Merge Request”
![mr-create](_static/img/mr-create-steps.png)
### Collaborating on merge requests
The Overview page gives a general summary of the merge request, including:
1. Link to other page to view changes in detail (read below)
1. Code Review Request
1. Test Suite Status
1. Discussion History
1. Commenting
![mr-overview](_static/img/mr-overview.png)
#### Leaving a Review
The View Changes page gives a detailed look at the changes made on the feature branch, including:
1. List of files changed
1. Changes
- Red = removed
- Green = added
1. Click to leave comment on line
1. Choose “Start a review”
![mr-changes](_static/img/mr-changes.png)
After a review has been started:
1. Comments are marked as pending
1. Submit the review
![mr-changes](_static/img/mr-change-submit.png)
#### Responding to Reviews
Reply to code review comments as needed. Use “Start a review” to submit all replies at once.
![mr-changes](_static/img/mr-respond.png)
Resolve threads when discussion on a particular piece of code is complete
![mr-changes](_static/img/mr-resolve.png)
### Merging the Merge Request
Merging:
1. Check all tests passed
1. Check all review comments resolved
1. Check at least one review approval
1. Before clicking “Merge”
- Check “Delete source branch”
- Check “Squash commits” if branch history not tidy
1. Click “Merge”
1. Celebrate
![mr-merge](_static/img/mr-merge.png)
# Contributing Documentation
This guide assumes the reader has read the [Contribution workflow](contributing.md) for details about making changes to
code within the gstlal repo, since the documentation files are updated by a similar workflow.
## Writing Documentation
In general, the gstlal documentation uses [RestructuredText (rst)](https://docutils.sourceforge.io/rst.html) files
ending in `.rst` or [Markdown](https://www.markdownguide.org/basic-syntax/) files ending in `.md`.
The documentation files for gstlal are located under `gstlal/doc/source`. If you add a new page (doc file), make sure to
reference it from the main index page.
Useful Links:
- [MyST Directive Syntax](https://myst-parser.readthedocs.io/en/latest/syntax/syntax.html#syntax-directives)
Executables
===============
.. toctree::
   :maxdepth: 2

   gstlal/bin/bin
   gstlal-inspiral/bin/bin
   gstlal-burst/bin/bin
   gstlal-ugly/bin/bin
.. _extrinsic-parameters-generation:
Generating Extrinsic Parameter Distributions
============================================
This tutorial will show you how to regenerate the extrinsic parameter
distributions used to determine the likelihood ratio term that accounts for the
relative times-of-arrival, phases, and amplitudes of a CBC signal at each of
the LVK detectors.
There are two parts described below that represent different terms. Full
documentation can be found here:
https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/python-modules/stats.inspiral_extrinsics.html
Setting up the dt, dphi, dsnr dag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Setup a work area and obtain the necessary input files
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
You will need to create a directory on a cluster running HTCondor, e.g.,
.. code:: bash
$ mkdir dt_dphi_dsnr
$ cd dt_dphi_dsnr
This workflow requires estimates of the power spectral densities for LIGO,
Virgo and KAGRA. For this tutorial we use projected O4 sensitivities from the
LIGO DCC. Feel free to substitute these to suit your needs.
We will use the following files found at: https://dcc.ligo.org/LIGO-T2000012 ::
aligo_O4high.txt
avirgo_O4high_NEW.txt
kagra_3Mpc.txt
Download the above files and place them in the dt_dphi_dsnr directory that you are currently in.
2. Execute commands to generate the HTCondor DAG
"""""""""""""""""""""""""""""""""""""""""""""""""
For this tutorial, we assume that you have a singularity container with the
gstlal software. More details can be found here:
https://lscsoft.docs.ligo.org/gstlal/installation.html
The following Makefile illustrates the sequence of commands required to generate an HTCondor workflow. You can copy this into a file called ``Makefile`` and modify it as you wish.
.. code:: make
SINGULARITY_IMAGE=/ligo/home/ligo.org/chad.hanna/development/gstlal-dev/
sexec=singularity exec $(SINGULARITY_IMAGE)
all: dt_dphi.dag
# 417.6 Mpc Horizon
H1_aligo_O4high_psd.xml.gz: aligo_O4high.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=H1 --output $@ $<
# 417.6 Mpc Horizon
L1_aligo_O4high_psd.xml.gz: aligo_O4high.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=L1 --output $@ $<
# 265.8 Mpc Horizon
V1_avirgo_O4high_NEW_psd.xml.gz: avirgo_O4high_NEW.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=V1 --output $@ $<
# 6.16 Mpc Horizon
K1_kagra_3Mpc_psd.xml.gz: kagra_3Mpc.txt
$(sexec) gstlal_psd_xml_from_asd_txt --instrument=K1 --output $@ $<
O4_projected_psds.xml.gz: H1_aligo_O4high_psd.xml.gz L1_aligo_O4high_psd.xml.gz V1_avirgo_O4high_NEW_psd.xml.gz K1_kagra_3Mpc_psd.xml.gz
$(sexec) ligolw_add --output $@ $^
# SNR ratios according to horizon ratios
dt_dphi.dag: O4_projected_psds.xml.gz
$(sexec) gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs_dag \
--psd-xml $< \
--H-snr 8.00 \
--L-snr 8.00 \
--V-snr 5.09 \
--K-snr 0.12 \
--m1 1.4 \
--m2 1.4 \
--s1 0.0 \
--s2 0.0 \
--flow 15.0 \
--fhigh 1024.0 \
--NSIDE 16 \
--n-inc-angle 33 \
--n-pol-angle 33 \
--singularity-image $(SINGULARITY_IMAGE)
clean:
rm -rf H1_aligo_O4high_psd.xml.gz L1_aligo_O4high_psd.xml.gz V1_avirgo_O4high_NEW_psd.xml.gz logs dt_dphi.dag gstlal_inspiral_compute_dtdphideff_cov_matrix.sub gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs.sub gstlal_inspiral_add_dt_dphi_snr_ratio_pdfs.sub dt_dphi.sh
3. Submit the HTCondor DAG and monitor the output
""""""""""""""""""""""""""""""""""""""""""""""""""
Next run make to generate the HTCondor DAG
.. code:: bash
$ make
Then submit the DAG
.. code:: bash
$ condor_submit_dag dt_dphi.dag
You can check the DAG progress by doing
.. code:: bash
$ tail -f dt_dphi.dag.dagman.out
4. Test the output
"""""""""""""""""""
When the DAG completes successfully, you should have a file called ``inspiral_dtdphi_pdf.h5``. You can verify that this file works with a python terminal, e.g.,
.. code:: bash
$ singularity exec /ligo/home/ligo.org/chad.hanna/development/gstlal-dev/ python3
Python 3.6.8 (default, Nov 10 2020, 07:30:01)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from gstlal.stats.inspiral_extrinsics import InspiralExtrinsics
>>> IE = InspiralExtrinsics(filename='inspiral_dtdphi_pdf.h5')
>>>
Setting up probability of instrument combinations dag
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
1. Setup a work area and obtain the necessary input files
"""""""""""""""""""""""""""""""""""""""""""""""""""""""""
You will need to create a directory on a cluster running HTCondor, e.g.,
.. code:: bash
$ mkdir p_of_instruments
$ cd p_of_instruments
2. Execute commands to generate the HTCondor DAG
"""""""""""""""""""""""""""""""""""""""""""""""""
Below is a sample Makefile that will work if you are using Singularity:

.. code:: make

   SINGULARITY_IMAGE=/ligo/home/ligo.org/chad.hanna/development/gstlal-dev/

   sexec=singularity exec $(SINGULARITY_IMAGE)

   all:
   	$(sexec) gstlal_inspiral_create_p_of_ifos_given_horizon_dag --instrument=H1 --instrument=L1 --instrument=V1 --instrument=K1 --singularity-image $(SINGULARITY_IMAGE)

   clean:
   	rm -rf gstlal_inspiral_add_p_of_ifos_given_horizon.sub gstlal_inspiral_create_p_of_ifos_given_horizon.sub logs p_of_I_H1K1L1V1.dag p_of_I_H1K1L1V1.sh

3. Submit the HTCondor DAG
"""""""""""""""""""""""""""

.. code:: bash

   $ condor_submit_dag p_of_I_H1K1L1V1.dag
See Also
^^^^^^^^
* https://arxiv.org/abs/1901.02227
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/python-modules/stats.inspiral_extrinsics.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_create_dt_dphi_snr_ratio_pdfs_dag.html
* https://lscsoft.docs.ligo.org/gstlal/gstlal-inspiral/bin/gstlal_inspiral_compute_dtdphideff_cov_matrix.html
####################################################################################################
Overview
####################################################################################################
.. _burst-overview-feature_extraction:
.. _feature_extraction:
Feature Extraction
====================================================================================================
SNAX (Signal-based Noise Acquisition and eXtraction), the `snax` module and related SNAX executables
contain relevant libraries to identify glitches in low-latency using auxiliary channel data.
SNAX functions as a modeled search for data quality by applying matched filtering
on auxiliary channel timeseries using waveforms that model a large number of glitch classes. Its primary
purpose is to whiten incoming auxiliary channels and extract relevant features in low-latency.

.. _feature_extraction-intro:

Introduction
------------

There are two different modes of feature generation:

1. **Timeseries:** Production of regularly-spaced feature rows, containing the SNR, waveform parameters,
   and the time of the loudest event in a sampling time interval.
2. **ETG:** This produces output that resembles that of a traditional event trigger generator (ETG), in
   which only feature rows above an SNR threshold will be produced.

One useful feature of using a matched filter approach to detect glitches is the ability to switch between
different glitch templates or generate a heterogeneous bank of templates. Currently, there are Sine-Gaussian,
half-Sine-Gaussian, and tapered Sine-Gaussian waveforms implemented for use in detecting glitches, but the feature
extractor is designed to be fairly modular and so it isn't difficult to design and add new waveforms for use.
Since SNAX uses time-domain convolution to matched filter auxiliary channel timeseries
with glitch waveforms, this allows latencies to be much lower than in traditional ETGs. The latency upon writing
features to disk is O(5 s) in the current layout when using waveforms where the peak occurs at the edge of the
template (zero-latency templates). Otherwise, there is extra latency incurred due to the non-causal nature of
@@ -36,7 +42,7 @@ the waveform itself.
digraph llpipe {
labeljust = "r";
label="gstlal_snax_extract"
rankdir=LR;
graph [fontname="Roman", fontsize=24];
edge [ fontname="Roman", fontsize=10 ];
@@ -132,17 +138,19 @@ the waveform itself.
}
.. _feature_extraction-highlights:
Highlights
----------

* Launch SNAX jobs in online or offline mode:

  * Online: Using /shm or framexmit protocol
  * Offline: Read frames off disk

* Online/Offline DAGs available for launching jobs.
* Offline DAG parallelizes by time; channels are processed sequentially by subsets to reduce I/O concurrency issues.
* On-the-fly PSD generation (or take in a prespecified PSD)
@@ -165,3 +173,73 @@ the waveform itself.
* Waveform type (currently Sine-Gaussian and half-Sine-Gaussian only)
* Specify parameter ranges (frequency, Q for Sine-Gaussian based)
* Min mismatch between templates
.. _feature_extraction-online:
Online Operation
----------------
An online DAG is provided in /gstlal-burst/share/snax/Makefile.gstlal_feature_extractor_online
in order to provide a convenient way to launch online feature extraction jobs as well as auxiliary jobs as
needed (synchronizer/hdf5 file sinks). A condensed list of instructions for use is also provided within the Makefile itself.
There are four separate modes that can be used to launch online jobs:
1. Auxiliary channel ingestion:
a. Reading from framexmit protocol (DATA_SOURCE=framexmit).
This mode is recommended when reading in live data from LHO/LLO.
b. Reading from shared memory (DATA_SOURCE=lvshm).
This mode is recommended for reading in data for O2 replay (e.g. UWM).
2. Data transfer of features:
a. Saving features directly to disk, e.g. no data transfer.
This will save features to disk directly from the feature extractor,
and saves features periodically via hdf5.
b. Transfer of features via Kafka topics.
This requires a Kafka/Zookeeper service to be running (either an existing LDG
service or your own). Features get transferred via Kafka from the feature extractor,
parallel instances of the extractor get synchronized, and the features are then sent
downstream where they can be read by other processes (e.g. iDQ). In addition, a streaming
hdf5 file sink is launched which periodically dumps features to disk.
In order to start up online runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To run, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_online
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
.. _feature_extraction-offline:
Offline Operation
-----------------
An offline DAG is provided in /gstlal-burst/share/snax/Makefile.gstlal_feature_extractor_offline
in order to provide a convenient way to launch offline feature extraction jobs. A condensed list of
instructions for use is also provided within the Makefile itself.
For general use cases, the only configuration options that need to be changed are:
* User/Accounting tags: GROUP_USER, ACCOUNTING_TAG
* Analysis times: START, STOP
* Data ingestion: IFO, CHANNEL_LIST
* Waveform parameters: WAVEFORM, MISMATCH, QHIGH
In order to start up offline runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To generate a DAG, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_offline
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Getting started
===============
You can get a development copy of the gstlal software suite from git. Doing this at minimum will require a development copy of lalsuite.
* https://git.ligo.org/lscsoft/gstlal
* https://git.ligo.org/lscsoft/lalsuite
Source tarballs for GstLAL packages and all the LIGO/Virgo software dependencies are available here: http://software.ligo.org/lscsoft/source/
Limited binary packages are available here: https://wiki.ligo.org/Computing/DASWG/SoftwareDownloads
Building and installing from source follows the normal GNU build procedures
involving:
1. ./00init.sh
2. ./configure
3. make
4. make install.
You should build the packages in order of gstlal, gstlal-ugly,
gstlal-calibration, gstlal-inspiral. If you are building to a non-FHS location
(e.g., your home directory) you will need to ensure some environment variables
are set so that your installation will function. The following five variables
must be set. As **just an example**::
GI_TYPELIB_PATH="/path/to/your/installation/lib/girepository-1.0:${GI_TYPELIB_PATH}"
GST_PLUGIN_PATH="/path/to/your/installation/lib/gstreamer-0.10:${GST_PLUGIN_PATH}"
PATH="/path/to/your/installation/bin:${PATH}"
# Debian systems need lib, RH systems need lib64, including both doesn't hurt
PKG_CONFIG_PATH="/path/to/your/installation/lib/pkgconfig:/path/to/your/installation/lib64/pkgconfig:${PKG_CONFIG_PATH}"
# Debian systems need lib, RH systems need lib and lib64
PYTHONPATH="/path/to/your/installation/lib64/python2.7/site-packages:/path/to/your/installation/lib/python2.7/site-packages:$PYTHONPATH"
GstLAL burst code
=================
.. toctree::
   :maxdepth: 2

   bin/bin
   python-modules/modules
####################################################################################################
GstLAL burst
####################################################################################################
`GstLAL burst` contains several projects targeting a variety of different searches. These include:
* **Feature extraction:** Identify noise transient bursts (glitches) in auxiliary channel data.
* **Cosmic string search**
* **Excess power**
Contents
-------------------------
.. toctree::
   :maxdepth: 2

   overview
   tutorials/tutorials
   code
####################################################################################################
Running Offline Jobs
####################################################################################################
An offline DAG is provided in /gstlal-burst/share/feature_extractor/Makefile.gstlal_feature_extractor_offline
in order to provide a convenient way to launch offline feature extraction jobs. A condensed list of
instructions for use is also provided within the Makefile itself.
For general use cases, the only configuration options that need to be changed are:
* User/Accounting tags: GROUP_USER, ACCOUNTING_TAG
* Analysis times: START, STOP
* Data ingestion: IFO, CHANNEL_LIST
* Waveform parameters: WAVEFORM, MISMATCH, QHIGH
Launching DAGs
====================================================================================================
In order to start up offline runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To generate a DAG, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_offline
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Configuration options
====================================================================================================
Analysis times:
* START: set the analysis gps start time
* STOP: set the analysis gps stop time
Data ingestion:
* IFO: select the IFO for auxiliary channels to be ingested (H1/L1).
* CHANNEL_LIST: a list of channels for the feature extractor to process. Lists
for O1/O2 and for H1/L1 are provided in gstlal/gstlal-burst/share/feature_extractor.
* MAX_SERIAL_STREAMS: Maximum # of streams that a single gstlal_feature_extractor job will
process at once. This is determined by sum_i(channel_i * # rates_i). The number of rates for a
given channel is determined by log2(max_rate/min_rate) + 1.
* MAX_PARALLEL_STREAMS: Maximum # of streams that a single job will run in the lifespan of a job.
This is distinct from serial streams since when a job is first launched, it will cache
auxiliary channel frames containing all channels that meet the criterion here, and then process
each channel subset sequentially determined by the serial streams. This is to save on input I/O.
* CONCURRENCY: determines the maximum # of concurrent reads from the same frame file. For most
purposes, it will be set to 1. Use this at your own risk.
Waveform parameters:
* WAVEFORM: type of waveform used to perform matched filtering (sine_gaussian/half_sine_gaussian).
* MISMATCH: maximum mismatch between templates (corresponding to Omicron's mismatch definition).
* QHIGH: maximum value of Q
Data transfer/saving:
* OUTPATH: directory in which to save features.
* SAVE_CADENCE: span of a typical dataset within an hdf5 file.
* PERSIST_CADENCE: span of a typical hdf5 file.
Setting the number of streams (ADVANCED USAGE)
====================================================================================================
NOTE: This won't need to be changed for most use cases, and the current configuration has been
optimized to aim for short run times.
Definition: Target number of streams (N_channels x N_rates_per_channel) that each cpu will process.
* if max_serial_streams > max_parallel_streams, all jobs will be parallelized by channel
* if max_parallel_streams > num_channels in channel list, all jobs will be processed serially,
with processing driven by max_serial_streams.
* any other combination will produce a mix of parallelization by channels and processing channels serially per job.
Playing around with combinations of MAX_SERIAL_STREAMS, MAX_PARALLEL_STREAMS, and CONCURRENCY will entirely
determine the structure of the offline DAG. Doing so will also change the memory usage for each job, so you'll
need to tread lightly. Changing CONCURRENCY in particular may cause I/O locks due to jobs fighting to read from the same
frame file.
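The stream-counting rule quoted above (the number of rates for a channel is
log2(max_rate/min_rate) + 1, and a job's streams are the sum of that over its
channels) can be sketched as follows. The channel names and rates below are
made-up placeholders, purely to illustrate the arithmetic.

.. code-block:: python

   # Sketch: count streams per the rule quoted above. Channels/rates are placeholders.
   import math

   channels = {
       "H1:EXAMPLE-CHANNEL_A": (32, 2048),   # (min_rate, max_rate) in Hz, hypothetical
       "H1:EXAMPLE-CHANNEL_B": (32, 512),
   }

   def n_rates(min_rate, max_rate):
       return int(math.log2(max_rate / min_rate)) + 1

   total_streams = sum(n_rates(lo, hi) for lo, hi in channels.values())
   print(total_streams)  # 12 = (6 + 1) + (4 + 1)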
####################################################################################################
Running Online Jobs
####################################################################################################
An online DAG is provided in /gstlal-burst/share/feature_extractor/Makefile.gstlal_feature_extractor_online
in order to provide a convenient way to launch online feature extraction jobs as well as auxiliary jobs as
needed (synchronizer/hdf5 file sinks). A condensed list of instructions for use is also provided within the Makefile itself.
There are four separate modes that can be used to launch online jobs:
1. Auxiliary channel ingestion:
a. Reading from framexmit protocol (DATA_SOURCE=framexmit).
This mode is recommended when reading in live data from LHO/LLO.
b. Reading from shared memory (DATA_SOURCE=lvshm).
This mode is recommended for reading in data for O2 replay (e.g. UWM).
2. Data transfer of features:
a. Saving features directly to disk, e.g. no data transfer.
This will save features to disk directly from the feature extractor,
and saves features periodically via hdf5.
b. Transfer of features via Kafka topics.
This requires a Kafka/Zookeeper service to be running (either an existing LDG
service or your own). Features get transferred via Kafka from the feature extractor,
parallel instances of the extractor get synchronized, and the features are then sent
downstream where they can be read by other processes (e.g. iDQ). In addition, a streaming
hdf5 file sink is launched which periodically dumps features to disk.
Launching DAGs
====================================================================================================
In order to start up online runs, you'll need an installation of gstlal. An installation Makefile that
includes Kafka dependencies is located at: gstlal/gstlal-burst/share/feature_extractor/Makefile.gstlal_idq_icc
To run, make sure that the correct environment is sourced, then:
$ make -f Makefile.gstlal_feature_extractor_online
Then launch the DAG with:
$ condor_submit_dag feature_extractor_pipe.dag
Configuration options
====================================================================================================
General:
* TAG: sets the name used for logging purposes, Kafka topic naming, etc.
Data ingestion:
* IFO: select the IFO for auxiliary channels to be ingested.
* CHANNEL_LIST: a list of channels for the feature extractor to process. Lists
for O1/O2 and for H1/L1 are provided in gstlal/gstlal-burst/share/feature_extractor.
* DATA_SOURCE: Protocol for reading in auxiliary channels (framexmit/lvshm).
* MAX_STREAMS: Maximum # of streams that a single gstlal_feature_extractor process will
process. This is determined by sum_i(channel_i * # rates_i). The number of rates for a
given channel is determined by log2(max_rate/min_rate) + 1.
Waveform parameters:
* WAVEFORM: type of waveform used to perform matched filtering (sine_gaussian/half_sine_gaussian).
* MISMATCH: maximum mismatch between templates (corresponding to Omicron's mismatch definition).
* QHIGH: maximum value of Q
Data transfer/saving:
* OUTPATH: directory in which to save features.
* SAVE_FORMAT: determines whether to transfer features downstream or save directly (kafka/hdf5).
* SAVE_CADENCE: span of a typical dataset within an hdf5 file.
* PERSIST_CADENCE: span of a typical hdf5 file.
Kafka options:
* KAFKA_TOPIC: basename of topic for features generated from feature_extractor
* KAFKA_SERVER: Kafka server address where Kafka is hosted. If features are run in same location,
as in condor's local universe, setting localhost:port is fine. Otherwise you'll need to determine
the IP address where your Kafka server is running (using 'ip addr show' or equivalent).
* KAFKA_GROUP: group for which Kafka producers for feature_extractor jobs report to.
Synchronizer/File sink options:
* PROCESSING_CADENCE: cadence at which incoming features are processed, so as to limit polling
of topics repeatedly, etc. Default value of 0.1s is fine.
* REQUEST_TIMEOUT: timeout for waiting for a single poll from a Kafka consumer.
* LATENCY_TIMEOUT: timeout for the feature synchronizer before older features are dropped. This
is to prevent a single feature extractor job from holding up the online pipeline. This will
also depend on the latency induced by the feature extractor, especially when using templates
that have latencies associated with them such as Sine-Gaussians.
####################################################################################################
Tutorials
####################################################################################################
.. toctree::
   :maxdepth: 2

   running_online_jobs
   running_offline_jobs
GstLAL calibration code
=======================
.. toctree::
   :maxdepth: 2

   bin/bin
   python-modules/modules
GstLAL calibration
==========================
.. toctree::
   :maxdepth: 2

   code