
Adding files delete_corruption.py and cp_files.py

Updates to how condor runs and restarts bayeswave. To enable these changes, add --smart-restart to the bayeswave_pipe call.

Silly example:

bayeswave_pipe --trigger-time 1000 --smart-restart config.ini

Implementation Details

setupdirs.py in the condor pre-command now runs:

cp_files.py, which copies the trigdir on the submit node (usually CIT) and compares it with the trigdir on the remote node. It decides which copy is further along by looking at the checkpoint files (primarily temperature.dat and state.dat), keeps that directory, and deletes the less progressed one. Note: this only works on machines with shared file systems; it will fail on the OSG.
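As a rough illustration of the "keep the further-along copy" logic, here is a minimal Python sketch. It is not the code in cp_files.py. The checkpoint/state.dat path, the assumption that the first token of that file is an iteration counter, and the helper names checkpoint_progress and keep_further_along are all hypothetical, and the sketch ignores temperature.dat entirely.

```python
import os
import shutil

# HYPOTHETICAL layout: <trigdir>/checkpoint/state.dat, whose first
# whitespace-separated token is assumed to be the current iteration.
CHECKPOINT_FILE = os.path.join("checkpoint", "state.dat")


def checkpoint_progress(trigdir):
    """Return the checkpointed iteration for trigdir, or -1 if there is none."""
    path = os.path.join(trigdir, CHECKPOINT_FILE)
    try:
        with open(path) as f:
            tokens = f.read().split()
        return int(tokens[0])
    except (OSError, IndexError, ValueError):
        # Missing, empty, or unparseable checkpoint: treat as no progress.
        return -1


def keep_further_along(submit_trigdir, remote_trigdir):
    """Keep whichever trigdir is further along, delete the other, return the kept path."""
    submit_iter = checkpoint_progress(submit_trigdir)
    remote_iter = checkpoint_progress(remote_trigdir)
    if submit_iter > remote_iter:
        keep, drop = submit_trigdir, remote_trigdir
    else:
        # Ties go to the remote copy (an arbitrary choice in this sketch).
        keep, drop = remote_trigdir, submit_trigdir
    shutil.rmtree(drop, ignore_errors=True)
    return keep
```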

After cp_files.py runs, delete_corruption.py checks how long each trigdir/chains/MODEL_*.dat file should be and crops it to its correct length (i.e. the length the checkpoint says it should be). This is necessary because when runs crash with signal errors, the printed lines are often corrupted.
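The cropping step can be sketched the same way. This is a hypothetical Python illustration, not delete_corruption.py itself: it assumes the caller already knows how many lines the checkpoint implies (expected_lines, one value for every file), and the helper names crop_chain_file and crop_chains are made up for this example. Files are read in binary mode so that corrupted bytes do not raise decoding errors.

```python
import glob
import os


def crop_chain_file(path, expected_lines):
    """Truncate path in place to its first expected_lines lines; return how many were removed."""
    with open(path, "rb") as f:
        lines = f.readlines()
    extra = len(lines) - expected_lines
    if extra <= 0:
        return 0  # Not longer than the checkpoint says, so nothing to crop.
    with open(path, "wb") as f:
        f.writelines(lines[:expected_lines])
    return extra


def crop_chains(trigdir, expected_lines):
    """Crop every trigdir/chains/MODEL_*.dat file to expected_lines lines.

    ASSUMPTION: a single expected line count applies to all files; how the
    real script derives it from the checkpoint is not shown here.
    """
    cropped = {}
    pattern = os.path.join(trigdir, "chains", "MODEL_*.dat")
    for path in sorted(glob.glob(pattern)):
        cropped[os.path.basename(path)] = crop_chain_file(path, expected_lines)
    return cropped
```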

With these changes, the bayeswave dag files for a failed run can be resubmitted without starting from the very beginning.
