
Change inspiral DAG architecture regarding expected SNR calc, simplify generation of workflows, add documentation

Patrick Godwin requested to merge feature-simplify_osg_workflow into master

This PR fills the gap in user documentation for the new OSG workflow. It also makes workflow generation self-contained (and simpler): the Makefiles, etc. now live within gstlal instead of a separate repository (e.g. https://git.ligo.org/gstlal/osg-workflow).

Additionally, the DAG architecture was changed so that the expected SNR calculation runs as a separate process again (as it did previously). There were two motivations. (1) Computing expected SNRs on the fly in gstlal_inspiral assumed the calculation would be a drop in the bucket compared to the filtering. That held for the BNS and IMBH DAGs run previously, but running a full production DAG in the BBH region required a higher sampling rate. At that rate the calculation takes a few minutes rather than 10-20 seconds, which is problematic. (2) The on-the-fly design predates the analytic VT jobs, which rely on the split injection/expected SNR workflow, so retrofitting them onto the on-the-fly configuration was sub-optimal.
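To make the architectural change concrete, here is a simplified sketch of the resulting dependency structure. The job names (reference_psd, injections, expected_snr, inspiral, analytic_vt) are hypothetical, and the real DAG built by gstlal's workflow code has more jobs and may wire them differently. The sketch uses Python's standard graphlib module (3.9+) to list which jobs can run in parallel.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Simplified, hypothetical job graph: each job maps to the jobs it depends on.
# The expected SNR job is its own node instead of running inside inspiral.
JOBS = {
    "reference_psd": set(),
    "injections": set(),
    "expected_snr": {"injections", "reference_psd"},
    "inspiral": {"injections", "reference_psd"},
    "analytic_vt": {"expected_snr"},
}


def parallel_batches(graph):
    """Group jobs into waves; every job in a wave can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


batches = parallel_batches(JOBS)
```

In this graph, expected_snr runs alongside inspiral instead of inside it, so a slow expected SNR calculation no longer lengthens every filtering job, and analytic_vt waits only on expected_snr.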

Changes needed to support the expected SNR architecture change:

  • gstlal_inspiral: Remove the on-the-fly expected SNR calculation, but still trim injections for efficiency and a smaller I/O footprint in trigger files.
  • gstlal_inspiral_expected_snr: Support passing in single PSD files in addition to PSD caches, which were previously the only option. The caches are incompatible with Condor file transfer and with OSG support in general, since they add a level of indirection to point at the files needed.
  • gstlal_inspiral_expected_snr: The f_low used for calculating expected SNR was being taken from the wrong place; use the right variable instead. See https://git.ligo.org/lscsoft/gstlal/-/blob/9777d42c54627a98a5e63a100ffeff31c38443d7/gstlal-inspiral/bin/gstlal_inspiral_injection_snr#L108-111.
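Since f_low is the lower limit of the SNR integral, taking it from the wrong place changes the result directly. The sketch below evaluates the standard optimal-SNR expression, rho = sqrt(4 ∫ |h(f)|^2 / S_n(f) df) from f_low upward, on toy inputs: a Newtonian-chirp-like amplitude |h| ∝ f^(-7/6) in white noise, in arbitrary units. It is an illustration, not gstlal_inspiral_expected_snr's code, which works with real waveforms and PSDs; the function name expected_snr here is hypothetical.

```python
import math
from typing import Sequence


def expected_snr(freqs: Sequence[float], h_abs: Sequence[float],
                 psd: Sequence[float], f_low: float) -> float:
    """Optimal SNR: sqrt(4 * integral of |h(f)|^2 / S_n(f) df for f >= f_low).

    Uses the trapezoidal rule on the samples at or above f_low. freqs must be
    increasing, and psd is a one-sided noise PSD sampled on the same grid.
    """
    pts = [(f, h * h / s) for f, h, s in zip(freqs, h_abs, psd) if f >= f_low]
    integral = sum(0.5 * (y0 + y1) * (f1 - f0)
                   for (f0, y0), (f1, y1) in zip(pts, pts[1:]))
    return math.sqrt(4.0 * integral)


# Toy inputs (arbitrary units) on a 0.25 Hz grid from 1 Hz to 1024 Hz.
freqs = [0.25 * k for k in range(4, 4097)]
h_abs = [f ** (-7.0 / 6.0) for f in freqs]
psd = [1.0] * len(freqs)

snr_10 = expected_snr(freqs, h_abs, psd, f_low=10.0)
snr_30 = expected_snr(freqs, h_abs, psd, f_low=30.0)
```

With these toy inputs, raising f_low from 10 Hz to 30 Hz roughly halves the SNR, which shows how sensitive the expected SNR is to using the correct f_low.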

Changes to support the simplified workflow generation:

  • Add suitable documentation in the CBC analysis section in the user guide for using the new workflow. See https://lscsoft.docs.ligo.org/-/gstlal/-/jobs/1640939/artifacts/docs/cbc_analysis.html for an example.
  • Embed the Makefile workflow logic within the repository itself using Makefile templates. This decouples the configuration from the workflow logic in a way that lets users easily inspect the generated Makefiles. It also removes the need to carry around many very similar Makefiles, where fixes commonly get forgotten in some copies. Makefile generation is defined within the gstlal.workflows module and relies on jinja2 for templating. jinja2 is a very light dependency (its only requirement is MarkupSafe), with packages already available for SL7/Debian, and it is used by the Flask library among others.
  • gstlal_inspiral_workflow: Add an init subcommand to generate a Makefile based on a configuration.
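As a rough illustration of template-based generation behind an init subcommand, the sketch below renders a Makefile from a configuration path. It is hypothetical throughout: the template content, the prog name, and the -o/--output option are made up, and it uses the standard library's string.Template to stay dependency-free, whereas gstlal.workflows uses jinja2, which adds loops, conditionals, and template inheritance that string.Template lacks.

```python
import argparse
from pathlib import Path
from string import Template

# Hypothetical template. gstlal renders its real Makefile templates with jinja2
# (gstlal.workflows); string.Template is used here only to stay stdlib-only.
# In string.Template, "$$" yields a literal "$", which make needs for "$<".
MAKEFILE_TEMPLATE = Template(
    "# Generated from $config; edit the config and re-run init instead.\n"
    "all : dag\n"
    "\n"
    "dag : $config\n"
    "\t@echo \"building DAG from $$<\"\n"  # make recipes must start with a tab
)


def render_makefile(config_path: str) -> str:
    """Fill the template for one configuration file."""
    return MAKEFILE_TEMPLATE.substitute(config=config_path)


def main(argv=None) -> int:
    """Hypothetical CLI with a single 'init' subcommand that writes a Makefile."""
    parser = argparse.ArgumentParser(prog="workflow")
    sub = parser.add_subparsers(dest="command", required=True)
    init = sub.add_parser("init", help="write a Makefile for a configuration")
    init.add_argument("config", help="path to the configuration file")
    init.add_argument("-o", "--output", default="Makefile")
    args = parser.parse_args(argv)

    if args.command == "init":
        Path(args.output).write_text(render_makefile(args.config))
    return 0
```

Keeping the template separate from the configuration means the generated Makefile is an ordinary file users can read, while fixes land once in the template instead of in many hand-maintained copies.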

Other changes made while testing the OSG workflow on production-sized DAGs:

  • Fix issues with generating injection-only DAGs with the new DAG workflows.
  • Fix scalability issues with combining triggers/dist stats across bins due to "too many files".
  • Fix the calculation of the number of SVD bins to process per inspiral job, which was incorrect and caused far fewer SVD bins to be grouped together.
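Both scalability fixes come down to partitioning a list into bounded groups. The PR does not say how they were implemented, so the following is a generic sketch, not gstlal's code: chunked() groups SVD bins into jobs (the job count is a ceiling division), and merge_plan() shows one common way to avoid "too many files" errors by combining files in a tree of merges, so no single job opens more than a fixed number of inputs. All names, including the intermediate file names, are hypothetical.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> List[List[T]]:
    """Split items into consecutive groups of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


def merge_plan(files: Sequence[str], max_inputs: int) -> List[List[List[str]]]:
    """Plan a tree of merge jobs in which no job opens more than max_inputs files.

    Returns one entry per level; each entry lists the input groups of that
    level's jobs. The (hypothetically named) outputs of one level are the
    inputs of the next, until a single file remains.
    """
    if max_inputs < 2:
        raise ValueError("max_inputs must be at least 2")
    levels = []
    current = list(files)
    while len(current) > 1:
        groups = chunked(current, max_inputs)
        levels.append(groups)
        current = [f"merged_level{len(levels)}_{i:04d}" for i in range(len(groups))]
    return levels


# Grouping 1000 hypothetical SVD bins into inspiral jobs of 30 bins each.
svd_bins = [f"{i:04d}" for i in range(1000)]
bins_per_job = 30
jobs = chunked(svd_bins, bins_per_job)
expected_jobs = math.ceil(len(svd_bins) / bins_per_job)

# Merging 1000 trigger files with at most 100 inputs per merge job.
plan = merge_plan([f"triggers_{i:04d}" for i in range(1000)], max_inputs=100)
```

Here the 1000 files are merged in two levels (10 jobs, then 1), so each job's open-file count stays bounded regardless of how many bins the analysis has.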
Edited by Patrick Godwin
