Restarting on exits and evictions
NOTE: All of these updates apply to runs under condor.
When BayesWave (BW) is restarted (by resubmitting a DAG on condor), it no longer starts from the very beginning. Instead, it checks whether there are checkpoints that BW should resume from. To do this, it looks at two directories: the one available on the remote node (empty if BW exited, non-empty if BW was evicted) and the one available locally (on CIT).
By looking inside the checkpoint files, cp_files.py decides which of the two directories is more advanced, and the run continues from that one.
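The comparison between the two directories can be sketched as follows. This is only an illustration, not the code in cp_files.py: the file name `state.dat`, the assumption that its first token is an iteration count, and the helper names `checkpoint_progress` and `pick_more_advanced` are all hypothetical, since the checkpoint file layout is not described in this merge request.

```python
import os

# Hypothetical file name and layout: this sketch assumes a text file whose
# first whitespace-separated token is the iteration count. The real
# BayesWave checkpoint format may differ.
STATE_FILE = "state.dat"


def checkpoint_progress(directory):
    """Return the iteration recorded in `directory`, or -1 if there is none."""
    path = os.path.join(directory, STATE_FILE)
    try:
        with open(path) as handle:
            tokens = handle.read().split()
        return int(tokens[0])
    except (OSError, IndexError, ValueError):
        # A missing directory, an empty file, or unreadable contents all
        # count as "no usable checkpoint".
        return -1


def pick_more_advanced(remote_dir, local_dir):
    """Return the directory to resume from, or None to start from scratch."""
    remote = checkpoint_progress(remote_dir)
    local = checkpoint_progress(local_dir)
    if remote < 0 and local < 0:
        return None
    # Ties go to the local copy (an arbitrary choice for this sketch).
    return remote_dir if remote > local else local_dir
```

Returning `None` when neither directory holds a usable checkpoint keeps the old behaviour (start from the beginning) as the fallback.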
Then delete_corruption.py deletes any excess lines in the chain files that might be left over after BW exited with a signal error.
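A minimal sketch of trimming a chain file is shown below. It is not the code in delete_corruption.py: the function name `truncate_chain` and the `n_keep` parameter (the number of lines consistent with the last checkpoint) are hypothetical, and how the real script determines which lines are excess is not stated here.

```python
def truncate_chain(path, n_keep):
    """Keep only the first `n_keep` complete lines of the file at `path`.

    Returns the number of lines removed; a partial trailing line counts
    as one removed line. `n_keep` is a hypothetical input for this sketch.
    """
    with open(path) as handle:
        lines = handle.readlines()
    # A line without a trailing newline was cut off mid-write; drop it.
    complete = [line for line in lines if line.endswith("\n")]
    kept = complete[:n_keep]
    with open(path, "w") as handle:
        handle.writelines(kept)
    return len(lines) - len(kept)
```

For example, a chain file with three complete lines and one half-written line, trimmed with `n_keep=2`, ends up with its first two lines only.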
This changes the default exit strategy (which was simply to start from the beginning) and should save a substantial amount of computation time.
It also allows the user to resubmit DAG files that failed because of cluster errors rather than BayesWave errors.
Activity
assigned to @sophie.hourihane and @marcella.wijngaarden
added 15 commits
- 28179442 - Updates to megaplot
- d7629c53 - Updates to BW_Flags so that spin is not assumed to be zero.
- fb858a31 - Adding multi_type noise residuals to megaplot
- ce2cc6d0 - Minor changes so that cleanOnly runs finish successfully
- 8b170c23 - Changing BW_Flags modelList from a tuple to a list
- 7d4d9b73 - 1) Added support for chirplets flag
- e956556d - 1) Changed amplitude to log amplitude in glitch and signal cornerplots
- 1294875c - 1) Moved Anderson Darling statistics to last diagnostic page
- 1b48572a - 1) Linking to CBC cornerplots correctly
- 112d926d - 1) Fixed bug that stopped logA from being plotted on glitch and signal plot
- ab68240c - Minor fixes
- 6e92ebc3 - Adding injection to cleaning phase plot
- 881ccac8 - Fixing cbc_cornerplot on cbc moments page
- e3bf5414 - Merge branch 'master_plot' into 'master'
- e4c5a7f5 - Merge branch 'master' of git.ligo.org:sophie.hourihane/bayeswave into CBC_master
added 1 commit
- 8e7454f2 - Fixing local_model variable that was out of bounds in residual plot names
added 6 commits
- 12d3716d - Adding files delete_corruption.py and cp_files.py
- 1bd527df - setupdirs.py when run as a condor precommand will now run cp_files.py and delete_corruption.py
- 23a19797 - Changed name of set_cache to set_cache_from_checkpoint_asd
- a17545ea - Making restarting from checkpointing by running cp_files.py and delete_corruption.py non default.
- 61e345ae - Fixing bayeswave_pipe --smart-restart so that it actually defaults to False
- b5e0ea4b - Merge branch 'checkpointing' into CBC_master
added 1 commit
- 3547cb1e - Added cbc_keys to BW_Flags and changed ecc to elip in signal parameters
added 1 commit
- ab59d416 - Moved setupdir.py output files to go in the .log directory
mentioned in commit d1dae4cf