DAG Workflow Overhaul + OSG DAG support
This merge request adds everything needed to run GstLAL-based DAG workflows on the Open Science Grid (OSG). DAG generation has been entirely reworked to move away from pipeline.py in favor of the native htcondor.dags Condor bindings.
All commonly used workflows have been ported to the new DAG generation, each with its own program (gstlal_*_pipe -> gstlal_*_workflow). The legacy programs have been kept for backwards compatibility with a deprecation notice and will be removed in the future.
Motivation
- Workflows could be more streamlined, both for users and for maintainers (e.g. inspiral_pipe.py)
- Move away from pipeline.py towards the native HTCondor API
- Want to run on OSG to take advantage of opportunistic resources
- Current workflow not feasible for OSG submission
  - Not suited for file transfer (heavy I/O movement)
  - Not suited for job runtime targets (runtimes may span from O(10 s) to O(1 day))
- Redesign workflow to address both points + simplify/optimize where possible
- Side benefits:
  - Configuration decoupled from workflow (e.g. Makefiles)
  - Data product discovery utilities
High level changes to existing programs (for OSG + new workflow generation)
- gstlal_compute_far_from_snr_chisq_histograms
  - Add --output-background-bins-file option instead of hard-coding the output name
- gstlal_inspiral
  - --svd-bank option:
    - Before: --svd-bank H1:H1-SVD.xml.gz,L1:L1-SVD.xml.gz
    - After: --svd-bank H1-SVD.xml.gz --svd-bank L1-SVD.xml.gz
  - Calculate expected SNR from injections as part of the job if a reference PSD is provided
- Injection simplify / cluster SQL files
  - Instead of dropping the sim_inspiral table, de-duplicate columns
- gstlal_inspiral_plot_sensitivity, gstlal_inspiral_plot_background, gstlal_inspiral_plot_summary:
  - Use T050017 naming convention for generated output
  - Create output directory if it doesn't exist
- gstlal_inspiral_svd_checkerboard
  - Add --in-place option to modify SVD bank in place
- datasource.py
  - In GWDataSourceInfo, allow frame caches to be generated on the fly if a data find server URL and frame type are provided instead of a frame cache
- gstlal_inspiral_calc_rank_pdfs
  - Expose --num-cores to pass into RankingStatPDF
- gstlal_inspiral_calc_likelihood
  - Allow triggers to be copied rather than modified in place by specifying --copy, and optionally --copy-dir to copy them to a new location. This avoids an expensive copy step within the Condor workflow and makes the workflow more efficient when Condor file transfer is used.
- gstlal_injsplitter
  - Create new tables for split injections rather than appending to or re-using old tables. This avoids bloat in the split injection files.
  - Standardize the output name based on the input injection file to allow better data discovery
- gstlal_plot_psd_horizon
  - Use an argument parser rather than sys for the CLI.
  - Modify the command line to follow <psd_1> ... <psd_N> --output <out_psd> rather than <out_psd> <psd_1> ... <psd_N>, to more closely match the other PSD CLI tools.
- gstlal_inspiral_make_mcvt_plot
  - Always treat unnamed positional files as lnlr_cdf files, rather than making this depend on command-line options, to avoid confusion. This also allows them to be passed in without the use of a cache.
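
The change to the gstlal_inspiral --svd-bank option listed above, from one comma-separated list of instrument:file pairs to one repeated flag per file, can be sketched with argparse. This is a minimal illustration and not the actual gstlal_inspiral parser: the helper names are hypothetical, and since this description does not say how the instrument is determined in the new form, the sketch only collects the file names.

```python
import argparse


def parse_legacy_svd_bank(value):
    """Split the old 'IFO:file,IFO:file' form into a dict (hypothetical helper)."""
    banks = {}
    for item in value.split(","):
        ifo, filename = item.split(":", 1)
        banks[ifo] = filename
    return banks


def build_parser():
    """Hypothetical parser illustrating the new repeated-flag form."""
    parser = argparse.ArgumentParser()
    # action="append" collects every occurrence of --svd-bank into a list
    parser.add_argument("--svd-bank", action="append", default=[], metavar="filename")
    return parser


# Old form: one option carrying instrument:file pairs
legacy = parse_legacy_svd_bank("H1:H1-SVD.xml.gz,L1:L1-SVD.xml.gz")

# New form: the option is given once per SVD bank file
args = build_parser().parse_args(
    ["--svd-bank", "H1-SVD.xml.gz", "--svd-bank", "L1-SVD.xml.gz"]
)

print(legacy)
print(args.svd_bank)
```

Repeating the flag keeps each file as its own argument, which tends to be easier to template from a workflow generator than a single joined string.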
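
For the plotting programs listed above, the two changes (T050017-style output names and creating the output directory when it is missing) might look roughly like the following. The T050017 convention names files as IFOS-DESCRIPTION-GPSSTART-DURATION.ext; the function names, the description string, and the GPS times below are assumptions made for illustration and are not taken from the GstLAL code.

```python
import os
import tempfile


def t050017_filename(instruments, description, gps_start, gps_end, extension):
    """Build 'IFOS-DESCRIPTION-GPSSTART-DURATION.ext' (hypothetical helper)."""
    ifos = "".join(sorted(instruments))
    # Hyphens separate the fields, so keep them out of the description
    desc = description.replace("-", "_")
    duration = gps_end - gps_start
    return f"{ifos}-{desc}-{gps_start}-{duration}.{extension.lstrip('.')}"


def output_path(output_dir, filename):
    """Create the output directory if it does not exist and return the full path."""
    os.makedirs(output_dir, exist_ok=True)
    return os.path.join(output_dir, filename)


# Illustrative values only
name = t050017_filename(
    {"L1", "H1"}, "GSTLAL_INSPIRAL_PLOT_SUMMARY", 1187000000, 1187004096, "png"
)

with tempfile.TemporaryDirectory() as tmp:
    path = output_path(os.path.join(tmp, "plots", "summary"), name)
    created = os.path.isdir(os.path.dirname(path))

print(name)
```

Because the name encodes instruments, content, start time, and duration, downstream jobs can locate data products by pattern rather than by hard-coded paths.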
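
The reordered gstlal_plot_psd_horizon command line described above (<psd_1> ... <psd_N> --output <out_psd>) can be expressed with argparse as in this sketch. It is not the program's actual parser: the destination names and the example file names are assumptions, and only the argument layout follows the description.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring '<psd_1> ... <psd_N> --output <out_psd>'."""
    parser = argparse.ArgumentParser(description="Sketch of the reordered CLI")
    # One or more input PSD files given as positional arguments
    parser.add_argument("psds", metavar="psd", nargs="+", help="input PSD files")
    # The output is now a named option instead of the first positional argument
    parser.add_argument("--output", required=True, metavar="out_psd", help="output file")
    return parser


# Example file names are illustrative only
args = build_parser().parse_args(
    ["H1-PSD.xml.gz", "L1-PSD.xml.gz", "--output", "horizon.png"]
)

print(args.psds, args.output)
```

Making the output a named option removes the ambiguity of which positional argument is the output when the number of inputs varies.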
Edited by Patrick Godwin