activate periodic holds for all jobs, not just OSG
Jobs can be told to evict themselves, with robust checkpointing, every N seconds by passing --max-runtime N. The jobs resume after M seconds, set with --resume-time M. The default values of N and M correspond to 23.5 hours and 5 minutes, respectively.
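For reference, the defaults work out to 84600 seconds (23.5 hours) and 300 seconds (5 minutes). The sketch below is illustrative only and is not the code in this MR: the parser and the helper `hold_release_expressions` are hypothetical, and the mapping onto HTCondor's `periodic_hold` and `periodic_release` submit commands is an assumption about one common way to implement a timed hold and release.

```python
import argparse

# Defaults quoted in the description, converted to seconds.
DEFAULT_MAX_RUNTIME = int(23.5 * 3600)  # 23.5 hours
DEFAULT_RESUME_TIME = 5 * 60            # 5 minutes


def build_parser():
    """Hypothetical parser exposing the two flags named in the description."""
    parser = argparse.ArgumentParser(description="Periodic hold settings (sketch)")
    parser.add_argument(
        "--max-runtime", type=int, default=DEFAULT_MAX_RUNTIME,
        help="Seconds a job runs before it checkpoints and is held (N)",
    )
    parser.add_argument(
        "--resume-time", type=int, default=DEFAULT_RESUME_TIME,
        help="Seconds a held job waits before it is released (M)",
    )
    return parser


def hold_release_expressions(max_runtime, resume_time):
    """Assumed mapping onto HTCondor submit commands, not taken from this MR.

    JobStatus 2 means running and 5 means held; EnteredCurrentStatus is the
    time at which the job entered its current status.
    """
    return {
        "periodic_hold": (
            f"(JobStatus == 2) && (time() - EnteredCurrentStatus > {max_runtime})"
        ),
        "periodic_release": (
            f"(JobStatus == 5) && (time() - EnteredCurrentStatus > {resume_time})"
        ),
    }


if __name__ == "__main__":
    args = build_parser().parse_args()
    for key, value in hold_release_expressions(args.max_runtime, args.resume_time).items():
        print(f"{key} = {value}")
```

Run with no arguments, the script prints the two expressions using the default values; passing, for example, --max-runtime 14400 would hold jobs after 4 hours instead.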
Note that Caltech itself evicts jobs after a default of 4 hours.
This MR also tells post-processing jobs (including megaplot/sky) to transfer data only on exit, not on eviction. This is critical for saving disk space.