Skip to content

lalpulsar_MakeSFTs refactoring

Karl Wette requested to merge ANU-CGA/lalsuite:MakeSFTs-refactor into master

Description

Note: this MR requires !2027 (merged), so until !2027 (merged) is merged the diff for this MR will also show changes from !2027 (merged).

This MR refactors lalpulsar_MakeSFTs. Aim is to:

  • Remove deprecated options that are no longer needed.
  • Upgrade the code (which was primarily written 2005--2007) to use modern XLAL interfaces.
  • Implement the new public SFT naming convention using functions from !2007 (merged).
  • Hopefully simplify the code so that it is easier to understand, maintain, and potentially add new features in future.

Main changes:

  • The following options of lalpulsar_MakeSFTs (and the equivalent options to lalpulsar_MakeSFTDag) are no longer supported:
    • -u, --frame-struct-type, -H, --use-hot: no longer required; the modern XLAL frame reading interface determines the channel type automatically. (I believe this will also allow lalpulsar_MakeSFTs to run on Virgo frames, which wasn't the case before.)
    • -i, --ifo: no longer required; the detector prefix is deduced from the channel name (except that e.g. if the channel name is L0:... the detector prefix will be L1)
    • -D, --make-gps-dirs: no longer supported; lalpulsar_MakeSFTs now writes all SFTs into the same directory, leaving the organising into a directory structure to other scripts
    • -Z, --make-tmp-file: this is now the default behaviour; lalpulsar_MakeSFTs write each SFT to a file with extension .sft_TO_BE_VALIDATED, runs XLALCheckSFTFileIsValid() (which call the same code as lalpulsar_SFTvalidate) on the result, and if successful renames the SFT to a .sft extension
    • -X, --misc-desc: the public SFT filename specification doesn't support arbitrary comments added to SFT filenames
    • -v, --sft-version: SFTs are always written with the latest version of the specification (currently 3; !2027 (merged))
    • -S, --use-single: this option would process the SFT time series (filtering, windowing, FFTing) in single precision instead of double precision. It has never been used (AFAICT) for production SFTs. Removing this option greatly simplifies the code (where previously all operations were copy-n-pasted for both single and double precision.)
  • lalpulsar_MakeSFTs now uses the standard LAL window functions provided by lal/Window.h instead of its own implementations. This should allow better consistency with e.g. SFTs produced with MFDv5. The --window-type option no longer accepts integer options (0, 1, 2, 3) but must take a string window name (rectangular, tukey, hann). Passing the (0, 1, 2, 3) window options gives detailed error messages as to how the window options have changed (e.g. that the Matlab style Tukey window used previously is not identical to the lal/Window.h implementation.
  • lalpulsar_MakeSFTs now writes SFTs with the public filename convention documentation in the SFT specification. This requires 3 new options (which are also passed through from lalpulsar_MakeSFTDag):
    • -O OBSERVING_RUN, --observing-run OBSERVING_RUN: Observing run data the SFTs are generated from, or (in the case of mock data challenge data) the observing run on which the data is most closely based
    • (Note that the -O option to lalpulsar_MakeSFTs used to specify a log file directory. I've merged that option with -o, which also specifies a log file directory, as lalpulsar_MakeSFTs is running out of letters for options.)
    • -K {RUN,AUX,SIM,DEV}, --observing-kind {RUN,AUX,SIM,DEV}: RUN for production SFTs of h(t) channels; AUX for SFTs of non-h(t) channels; SIM for mock data challenge or other simulated data; or DEV for development/testing purposes
    • -V OBSERVING_VERSION, --observing-version OBSERVING_VERSION: Version number starts at 1, and should be incremented once SFTs have been widely distributed across clusters, advertised as being ready for use, etc. For example, if mistakes are found in the initial SFT production run after they have been published, regenerated SFTs should have a version number of at least 2.
  • Otherwise, lalpulsar_MakeSFTs retains the same options as before, and should generate the same output. This has been confirmed at least by testMakeSFTs.sh but may require further testing.

Notes:

  • Another benefit of porting to modern XLAL frame reading interface is that lalpulsar_MakeSFTs can now handle gaps in the frame data itself, rather than having to be given a set of contiguous frames. This feature currently isn't needed as lalpulsar_MakeSFTDag takes care of the gaps, but it could in future be used to simplify the SFT generation pipeline if desired; with lalpulsar_MakeSFTs doing the work of handling gaps itself, lalpulsar_MakeSFTDag could probably be simplified quite a bit.
  • In using the modern XLAL frame reading interface I've turned on checksum validation of input frames using the LAL_FR_STREAM_CHECKSUM_MODE mode. I'm not sure whether the old legacy LAL interface enabled the same option. In any case this should definitely address #404.
  • As noted above, lalpulsar_MakeSFTs now essentially runs lalpulsar_SFTvalidate on every SFT before it is renamed to its final filename. This should mean SFT production pipelines are less likely to write corrupted SFTs; if e.g. jobs in the HTCondor DAG crash, any partially-written SFTs will be clearly identifiable as having .sft_TO_BE_VALIDATED extensions.
  • MakeSFTs.c itself is reduced from ~2100 lines to ~500 lines without losing any essential functionality. With various deprecated/debugging code removed, readability is IMHO much approved. So hopefully this should main future maintenance and improvements a lot easier.

lalpulsar_CopySFTs

I've also written a new script lalpulsar_CopySFTs for copying SFTs between directories. This is similar in spirit to Greg's CopySFTs.tclsh script but is a complete rewrite in Python. The script does the following:

  • copies SFTs from a source directory (recursing into subdirectories) to a destination directory. The destination directory structure follows the SFT directory naming convention in the SFT specification.
  • each SFT is copied to a temporary .sft_TO_BE_VALIDATED extension, ValidateSFTFile() is run on each file, and finally the SFT moved to its final `.sft. extension
  • the script can parallelise SFT copying across multiple processes to speed up copying large numbers of SFTs

I have tested the script works correctly, but I haven't tested its performance on a large number of SFTs as yet.

lalpulsar_MakeSFTDag

I've also made a few changes to lalpulsar_MakeSFTDag:

  • The writeToDag() function now takes the user command-line options args as a variable, instead of passing in each command-line option individually, which makes it much easier to add/remove options
  • I've also updated writeToDag() to use modern Python f-strings for constructing the DAG output
  • The options lalpulsar_MakeSFTDag passes on to lalpulsar_MakeSFTs have been synchronised with the new/deprecated options to lalpulsar_MakeSFTs
  • The -O and -o options to lalpulsar_MakeSFTDag, which both specified log file directories with the same default ('logs') have been conflated into a single -o option, as I wanted to re-purpose the -O option for the observing run number for the public SFT filename spec.
  • The new -O, -K, and -V options for the public SFT filename spec. I've made these mandatory so that there aren't forgotten to be set correctly for h(t) SFT production. For non h(t) usage, e.g. for Fscans I suggest using -O 100 -K AUX -V 1, i.e. observing run number not relevant, "auxilliary" non h(t) data, version 1.

API Changes and Justification

Backwards Compatible Changes

  • This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions
  • This change adds new classes/functions/structs/types to a public C header file or Python module

No library API changes, but users of lalpulsar_MakeSFTDag (or lalpulsar_MakeSFTs if anyone uses that code directly) will need to update their scripts.

Backwards Incompatible Changes

  • This change modifies an existing class/function/struct/type definition in a public C header file or Python module
  • This change removes an existing class/function/struct/type from a public C header file or Python module

Review Status

In addition to the CI builds I ran make && cd lalpulsar/bin/MakeData/ && make check after every commit in this MR to check for any intermediate breakages.

cc @david-keitel @evan-goetz @keith-riles

Merge request reports