lalpulsar_MakeSFTs refactoring
Description
Note: this MR requires !2027 (merged), so until !2027 (merged) is merged the diff for this MR will also show changes from !2027 (merged).
This MR refactors lalpulsar_MakeSFTs. Aim is to:
- Remove deprecated options that are no longer needed.
- Upgrade the code (which was primarily written 2005--2007) to use modern XLAL interfaces.
- Implement the new public SFT naming convention using functions from !2007 (merged).
- Hopefully simplify the code so that it is easier to understand, maintain, and potentially add new features in future.
Main changes:
- The following options of
lalpulsar_MakeSFTs(and the equivalent options tolalpulsar_MakeSFTDag) are no longer supported:-
-u, --frame-struct-type,-H, --use-hot: no longer required; the modern XLAL frame reading interface determines the channel type automatically. (I believe this will also allowlalpulsar_MakeSFTsto run on Virgo frames, which wasn't the case before.) -
-i, --ifo: no longer required; the detector prefix is deduced from the channel name (except that e.g. if the channel name isL0:...the detector prefix will beL1) -
-D, --make-gps-dirs: no longer supported;lalpulsar_MakeSFTsnow writes all SFTs into the same directory, leaving the organising into a directory structure to other scripts -
-Z, --make-tmp-file: this is now the default behaviour;lalpulsar_MakeSFTswrite each SFT to a file with extension.sft_TO_BE_VALIDATED, runsXLALCheckSFTFileIsValid()(which call the same code aslalpulsar_SFTvalidate) on the result, and if successful renames the SFT to a.sftextension -
-X, --misc-desc: the public SFT filename specification doesn't support arbitrary comments added to SFT filenames -
-v, --sft-version: SFTs are always written with the latest version of the specification (currently 3; !2027 (merged)) -
-S, --use-single: this option would process the SFT time series (filtering, windowing, FFTing) in single precision instead of double precision. It has never been used (AFAICT) for production SFTs. Removing this option greatly simplifies the code (where previously all operations were copy-n-pasted for both single and double precision.)
-
-
lalpulsar_MakeSFTsnow uses the standard LAL window functions provided bylal/Window.hinstead of its own implementations. This should allow better consistency with e.g. SFTs produced with MFDv5. The--window-typeoption no longer accepts integer options (0,1,2,3) but must take a string window name (rectangular,tukey,hann). Passing the (0,1,2,3) window options gives detailed error messages as to how the window options have changed (e.g. that the Matlab style Tukey window used previously is not identical to thelal/Window.himplementation. -
lalpulsar_MakeSFTsnow writes SFTs with the public filename convention documentation in the SFT specification. This requires 3 new options (which are also passed through fromlalpulsar_MakeSFTDag):-
-O OBSERVING_RUN, --observing-run OBSERVING_RUN: Observing run data the SFTs are generated from, or (in the case of mock data challenge data) the observing run on which the data is most closely based - (Note that the
-Ooption tolalpulsar_MakeSFTsused to specify a log file directory. I've merged that option with-o, which also specifies a log file directory, aslalpulsar_MakeSFTsis running out of letters for options.) -
-K {RUN,AUX,SIM,DEV}, --observing-kind {RUN,AUX,SIM,DEV}:RUNfor production SFTs of h(t) channels;AUXfor SFTs of non-h(t) channels;SIMfor mock data challenge or other simulated data; orDEVfor development/testing purposes -
-V OBSERVING_VERSION, --observing-version OBSERVING_VERSION: Version number starts at 1, and should be incremented once SFTs have been widely distributed across clusters, advertised as being ready for use, etc. For example, if mistakes are found in the initial SFT production run after they have been published, regenerated SFTs should have a version number of at least 2.
-
- Otherwise,
lalpulsar_MakeSFTsretains the same options as before, and should generate the same output. This has been confirmed at least bytestMakeSFTs.shbut may require further testing.
Notes:
- Another benefit of porting to modern XLAL frame reading interface is that
lalpulsar_MakeSFTscan now handle gaps in the frame data itself, rather than having to be given a set of contiguous frames. This feature currently isn't needed aslalpulsar_MakeSFTDagtakes care of the gaps, but it could in future be used to simplify the SFT generation pipeline if desired; withlalpulsar_MakeSFTsdoing the work of handling gaps itself,lalpulsar_MakeSFTDagcould probably be simplified quite a bit. - In using the modern XLAL frame reading interface I've turned on checksum validation of input frames using the
LAL_FR_STREAM_CHECKSUM_MODEmode. I'm not sure whether the old legacy LAL interface enabled the same option. In any case this should definitely address #404. - As noted above,
lalpulsar_MakeSFTsnow essentially runslalpulsar_SFTvalidateon every SFT before it is renamed to its final filename. This should mean SFT production pipelines are less likely to write corrupted SFTs; if e.g. jobs in the HTCondor DAG crash, any partially-written SFTs will be clearly identifiable as having.sft_TO_BE_VALIDATEDextensions. -
MakeSFTs.citself is reduced from ~2100 lines to ~500 lines without losing any essential functionality. With various deprecated/debugging code removed, readability is IMHO much approved. So hopefully this should main future maintenance and improvements a lot easier.
lalpulsar_CopySFTs
I've also written a new script lalpulsar_CopySFTs for copying SFTs between directories. This is similar in spirit to Greg's CopySFTs.tclsh script but is a complete rewrite in Python. The script does the following:
- copies SFTs from a source directory (recursing into subdirectories) to a destination directory. The destination directory structure follows the SFT directory naming convention in the SFT specification.
- each SFT is copied to a temporary
.sft_TO_BE_VALIDATEDextension,ValidateSFTFile()is run on each file, and finally the SFT moved to its final `.sft. extension - the script can parallelise SFT copying across multiple processes to speed up copying large numbers of SFTs
I have tested the script works correctly, but I haven't tested its performance on a large number of SFTs as yet.
lalpulsar_MakeSFTDag
I've also made a few changes to lalpulsar_MakeSFTDag:
- The
writeToDag()function now takes the user command-line optionsargsas a variable, instead of passing in each command-line option individually, which makes it much easier to add/remove options - I've also updated
writeToDag()to use modern Python f-strings for constructing the DAG output - The options
lalpulsar_MakeSFTDagpasses on tolalpulsar_MakeSFTshave been synchronised with the new/deprecated options tolalpulsar_MakeSFTs - The
-Oand-ooptions tolalpulsar_MakeSFTDag, which both specified log file directories with the same default ('logs') have been conflated into a single-ooption, as I wanted to re-purpose the-Ooption for the observing run number for the public SFT filename spec. - The new
-O,-K, and-Voptions for the public SFT filename spec. I've made these mandatory so that there aren't forgotten to be set correctly for h(t) SFT production. For non h(t) usage, e.g. for Fscans I suggest using-O 100 -K AUX -V 1, i.e. observing run number not relevant, "auxilliary" non h(t) data, version 1.
API Changes and Justification
Backwards Compatible Changes
-
This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions -
This change adds new classes/functions/structs/types to a public C header file or Python module
No library API changes, but users of lalpulsar_MakeSFTDag (or lalpulsar_MakeSFTs if anyone uses that code directly) will need to update their scripts.
Backwards Incompatible Changes
-
This change modifies an existing class/function/struct/type definition in a public C header file or Python module -
This change removes an existing class/function/struct/type from a public C header file or Python module
Review Status
In addition to the CI builds I ran make && cd lalpulsar/bin/MakeData/ && make check after every commit in this MR to check for any intermediate breakages.