lalpulsar_MakeSFTs refactoring
Description
Note: this MR requires !2027 (merged), so until !2027 (merged) is merged the diff for this MR will also show changes from !2027 (merged).
This MR refactors lalpulsar_MakeSFTs
. Aim is to:
- Remove deprecated options that are no longer needed.
- Upgrade the code (which was primarily written 2005--2007) to use modern XLAL interfaces.
- Implement the new public SFT naming convention using functions from !2007 (merged).
- Hopefully simplify the code so that it is easier to understand, maintain, and potentially add new features in future.
Main changes:
- The following options of
lalpulsar_MakeSFTs
(and the equivalent options tolalpulsar_MakeSFTDag
) are no longer supported:-
-u, --frame-struct-type
,-H, --use-hot
: no longer required; the modern XLAL frame reading interface determines the channel type automatically. (I believe this will also allowlalpulsar_MakeSFTs
to run on Virgo frames, which wasn't the case before.) -
-i, --ifo
: no longer required; the detector prefix is deduced from the channel name (except that e.g. if the channel name isL0:...
the detector prefix will beL1
) -
-D, --make-gps-dirs
: no longer supported;lalpulsar_MakeSFTs
now writes all SFTs into the same directory, leaving the organising into a directory structure to other scripts -
-Z, --make-tmp-file
: this is now the default behaviour;lalpulsar_MakeSFTs
write each SFT to a file with extension.sft_TO_BE_VALIDATED
, runsXLALCheckSFTFileIsValid()
(which call the same code aslalpulsar_SFTvalidate
) on the result, and if successful renames the SFT to a.sft
extension -
-X, --misc-desc
: the public SFT filename specification doesn't support arbitrary comments added to SFT filenames -
-v, --sft-version
: SFTs are always written with the latest version of the specification (currently 3; !2027 (merged)) -
-S, --use-single
: this option would process the SFT time series (filtering, windowing, FFTing) in single precision instead of double precision. It has never been used (AFAICT) for production SFTs. Removing this option greatly simplifies the code (where previously all operations were copy-n-pasted for both single and double precision.)
-
-
lalpulsar_MakeSFTs
now uses the standard LAL window functions provided bylal/Window.h
instead of its own implementations. This should allow better consistency with e.g. SFTs produced with MFDv5. The--window-type
option no longer accepts integer options (0
,1
,2
,3
) but must take a string window name (rectangular
,tukey
,hann
). Passing the (0
,1
,2
,3
) window options gives detailed error messages as to how the window options have changed (e.g. that the Matlab style Tukey window used previously is not identical to thelal/Window.h
implementation. -
lalpulsar_MakeSFTs
now writes SFTs with the public filename convention documentation in the SFT specification. This requires 3 new options (which are also passed through fromlalpulsar_MakeSFTDag
):-
-O OBSERVING_RUN, --observing-run OBSERVING_RUN
: Observing run data the SFTs are generated from, or (in the case of mock data challenge data) the observing run on which the data is most closely based - (Note that the
-O
option tolalpulsar_MakeSFTs
used to specify a log file directory. I've merged that option with-o
, which also specifies a log file directory, aslalpulsar_MakeSFTs
is running out of letters for options.) -
-K {RUN,AUX,SIM,DEV}, --observing-kind {RUN,AUX,SIM,DEV}
:RUN
for production SFTs of h(t) channels;AUX
for SFTs of non-h(t) channels;SIM
for mock data challenge or other simulated data; orDEV
for development/testing purposes -
-V OBSERVING_VERSION, --observing-version OBSERVING_VERSION
: Version number starts at 1, and should be incremented once SFTs have been widely distributed across clusters, advertised as being ready for use, etc. For example, if mistakes are found in the initial SFT production run after they have been published, regenerated SFTs should have a version number of at least 2.
-
- Otherwise,
lalpulsar_MakeSFTs
retains the same options as before, and should generate the same output. This has been confirmed at least bytestMakeSFTs.sh
but may require further testing.
Notes:
- Another benefit of porting to modern XLAL frame reading interface is that
lalpulsar_MakeSFTs
can now handle gaps in the frame data itself, rather than having to be given a set of contiguous frames. This feature currently isn't needed aslalpulsar_MakeSFTDag
takes care of the gaps, but it could in future be used to simplify the SFT generation pipeline if desired; withlalpulsar_MakeSFTs
doing the work of handling gaps itself,lalpulsar_MakeSFTDag
could probably be simplified quite a bit. - In using the modern XLAL frame reading interface I've turned on checksum validation of input frames using the
LAL_FR_STREAM_CHECKSUM_MODE
mode. I'm not sure whether the old legacy LAL interface enabled the same option. In any case this should definitely address #404. - As noted above,
lalpulsar_MakeSFTs
now essentially runslalpulsar_SFTvalidate
on every SFT before it is renamed to its final filename. This should mean SFT production pipelines are less likely to write corrupted SFTs; if e.g. jobs in the HTCondor DAG crash, any partially-written SFTs will be clearly identifiable as having.sft_TO_BE_VALIDATED
extensions. -
MakeSFTs.c
itself is reduced from ~2100 lines to ~500 lines without losing any essential functionality. With various deprecated/debugging code removed, readability is IMHO much approved. So hopefully this should main future maintenance and improvements a lot easier.
lalpulsar_CopySFTs
I've also written a new script lalpulsar_CopySFTs
for copying SFTs between directories. This is similar in spirit to Greg's CopySFTs.tclsh
script but is a complete rewrite in Python. The script does the following:
- copies SFTs from a source directory (recursing into subdirectories) to a destination directory. The destination directory structure follows the SFT directory naming convention in the SFT specification.
- each SFT is copied to a temporary
.sft_TO_BE_VALIDATED
extension,ValidateSFTFile()
is run on each file, and finally the SFT moved to its final `.sft. extension - the script can parallelise SFT copying across multiple processes to speed up copying large numbers of SFTs
I have tested the script works correctly, but I haven't tested its performance on a large number of SFTs as yet.
lalpulsar_MakeSFTDag
I've also made a few changes to lalpulsar_MakeSFTDag
:
- The
writeToDag()
function now takes the user command-line optionsargs
as a variable, instead of passing in each command-line option individually, which makes it much easier to add/remove options - I've also updated
writeToDag()
to use modern Python f-strings for constructing the DAG output - The options
lalpulsar_MakeSFTDag
passes on tolalpulsar_MakeSFTs
have been synchronised with the new/deprecated options tolalpulsar_MakeSFTs
- The
-O
and-o
options tolalpulsar_MakeSFTDag
, which both specified log file directories with the same default ('logs'
) have been conflated into a single-o
option, as I wanted to re-purpose the-O
option for the observing run number for the public SFT filename spec. - The new
-O
,-K
, and-V
options for the public SFT filename spec. I've made these mandatory so that there aren't forgotten to be set correctly for h(t) SFT production. For non h(t) usage, e.g. for Fscans I suggest using-O 100 -K AUX -V 1
, i.e. observing run number not relevant, "auxilliary" non h(t) data, version 1.
API Changes and Justification
Backwards Compatible Changes
-
This change does not modify any class/function/struct/type definitions in a public C header file or any Python class/function definitions -
This change adds new classes/functions/structs/types to a public C header file or Python module
No library API changes, but users of lalpulsar_MakeSFTDag
(or lalpulsar_MakeSFTs
if anyone uses that code directly) will need to update their scripts.
Backwards Incompatible Changes
-
This change modifies an existing class/function/struct/type definition in a public C header file or Python module -
This change removes an existing class/function/struct/type from a public C header file or Python module
Review Status
In addition to the CI builds I ran make && cd lalpulsar/bin/MakeData/ && make check
after every commit in this MR to check for any intermediate breakages.