
DAG Workflow Overhaul + OSG DAG support

Patrick Godwin requested to merge osg_dag_workflow into master

This merge request adds everything needed to run GstLAL-based DAG workflows on the Open Science Grid (OSG). DAG generation has been reworked entirely: it no longer relies on pipeline.py and instead uses htcondor.dags from the native HTCondor Python bindings.

All commonly used workflows have been ported to the new DAG generation, each with its own program (gstlal_*_pipe -> gstlal_*_workflow). The legacy programs have been kept for backwards compatibility, now carry a deprecation notice, and will be removed in the future.

Motivation

  • Workflows could be more streamlined for both users and maintainers (e.g. inspiral_pipe.py)
  • Move away from pipeline.py towards native HTCondor API
  • Want to run on OSG to take advantage of opportunistic resources
  • Current workflow not feasible for OSG submission
    • Not suited for file transfer (heavy I/O movement)
    • Not suited for job runtime targets (job runtimes may span from O(10s) to O(day))
  • Redesign workflow to address both points + simplify/optimize where possible
  • Side benefits:
    • Configuration decoupled from workflow (e.g. Makefiles)
    • Data product discovery utilities

High level changes to existing programs (for OSG + new workflow generation)

  • gstlal_compute_far_from_snr_chisq_histograms
    • Add --output-background-bins-file instead of hard-coding output name
  • gstlal_inspiral
    • --svd-bank option:
      • Before: --svd-bank H1:H1-SVD.xml.gz,L1:L1-SVD.xml.gz
      • After: --svd-bank H1-SVD.xml.gz --svd-bank L1-SVD.xml.gz
    • Calculate expected SNR from injections as part of the job if a reference PSD is provided
  • Injection simplify / cluster SQL files
    • Instead of dropping sim_inspiral table, de-duplicate columns
  • gstlal_inspiral_plot_sensitivity, gstlal_inspiral_plot_background, gstlal_inspiral_plot_summary:
    • Use T050017 naming convention for generated output
    • Create output directory if it doesn’t exist
  • gstlal_inspiral_svd_checkerboard
    • Add --in-place option to modify SVD bank in place
  • datasource.py
    • Allow frame caches to be generated on-the-fly if a data find server URL and frame type are provided instead of a frame cache in GWDataSourceInfo
  • gstlal_inspiral_calc_rank_pdfs
    • Expose --num-cores to pass into RankingStatPDF
  • gstlal_inspiral_calc_likelihood
    • Allow triggers to be copied rather than modified in place by specifying --copy, optionally with --copy-dir to copy them to a new location. This avoids an expensive copy step within the workflow in Condor and, combined with Condor file transfer, makes the workflow more efficient.
  • gstlal_injsplitter
    • Create new tables for split injections rather than appending to or reusing old tables. This avoids bloat in the split injection files.
    • Standardize the output name based on the input injection file to allow better data discovery
  • gstlal_plot_psd_horizon
    • Use an argument parser for the CLI rather than using sys.
    • Modify the command line to follow <psd_1> ... <psd_N> --output <out_psd> rather than <out_psd> <psd_1> ... <psd_N>, to more closely match the other PSD CLI tools.
  • gstlal_inspiral_make_mcvt_plot
    • Always treat unnamed positional files as lnlr_cdf files rather than making this depend on other command line options. This avoids confusion and also allows the files to be passed in without a cache.
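
To illustrate the two command-line changes listed above (the repeated --svd-bank option and the reordered gstlal_plot_psd_horizon arguments), here is a minimal sketch using Python's argparse. It is not the code from this merge request: the helper names svd_bank_parser and psd_horizon_parser, the psds destination name, and the PSD and output file names are assumptions made for the example.

```python
import argparse


def svd_bank_parser():
    """Hypothetical parser: --svd-bank may be given several times, one file each."""
    parser = argparse.ArgumentParser(prog="gstlal_inspiral")
    parser.add_argument(
        "--svd-bank",
        action="append",  # every occurrence appends one filename to the list
        default=[],
        metavar="filename",
        help="SVD bank file; repeat the option for each bank",
    )
    return parser


def psd_horizon_parser():
    """Hypothetical parser: input PSDs are positional, the output is named."""
    parser = argparse.ArgumentParser(prog="gstlal_plot_psd_horizon")
    parser.add_argument("psds", nargs="+", metavar="psd", help="input PSD files")
    parser.add_argument("--output", required=True, metavar="out_psd")
    return parser


# New-style invocation: one --svd-bank per file, no "H1:" / "L1:" prefixes.
svd_args = svd_bank_parser().parse_args(
    ["--svd-bank", "H1-SVD.xml.gz", "--svd-bank", "L1-SVD.xml.gz"]
)
print(svd_args.svd_bank)

# New-style invocation: <psd_1> ... <psd_N> --output <out_psd>
# (file names below are made up for illustration).
psd_args = psd_horizon_parser().parse_args(
    ["H1-PSD.xml.gz", "L1-PSD.xml.gz", "--output", "horizon.png"]
)
print(psd_args.psds, psd_args.output)
```

With action="append", the order of the banks on the command line is preserved, and the named --output option means the list of inputs can grow without shifting the position of the output argument.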
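
The plotting changes above (T050017 output names and creating the output directory when it is missing) can be sketched as follows. This is an illustrative sketch, not the implementation in this merge request: the function names, the description string, and the GPS times are all hypothetical. It assumes the usual T050017 layout of <instruments>-<description>-<GPS start>-<duration>.<extension>.

```python
import os
import tempfile


def t050017_filename(instruments, description, gps_start, gps_end, extension):
    """Build a T050017-style name (hypothetical helper, not the gstlal API)."""
    if "-" in description:
        # Hyphens separate the fields, so the description must not contain any.
        raise ValueError("description must not contain '-'")
    ifos = "".join(sorted(instruments))
    duration = gps_end - gps_start
    return f"{ifos}-{description}-{gps_start}-{duration}.{extension}"


def ensure_output_path(directory, filename):
    """Create the output directory if needed and return the full file path."""
    os.makedirs(directory, exist_ok=True)  # no error if it already exists
    return os.path.join(directory, filename)


# Example values are made up for illustration.
name = t050017_filename(
    {"L1", "H1"}, "GSTLAL_INSPIRAL_PLOT_SUMMARY", 1187000000, 1187100000, "png"
)
print(name)

with tempfile.TemporaryDirectory() as tmp:
    path = ensure_output_path(os.path.join(tmp, "plots"), name)
    created = os.path.isdir(os.path.dirname(path))
    print(os.path.basename(path), created)
```

Encoding the instruments, GPS start, and duration in the file name is what lets downstream tools discover data products from names alone, without opening the files.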
Edited by Patrick Godwin
