activate periodic holds for all jobs, not just OSG

Jobs can be told to evict themselves, with robust checkpointing, every N seconds by passing --max-runtime N. An evicted job resumes M seconds later, where M is set with --resume-time M. The defaults are N = 84600 (23.5 hours) and M = 300 (5 minutes).
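
As a rough illustration of how these two values could map onto periodic holds, the sketch below converts the defaults to seconds and builds HTCondor-style `periodic_hold` and `periodic_release` expressions. The helper name `periodic_hold_options` and the exact expressions are assumptions made for illustration; the code in this MR may construct them differently.

```python
# Defaults from the description above, expressed in seconds.
DEFAULT_MAX_RUNTIME = int(23.5 * 3600)  # 23.5 hours
DEFAULT_RESUME_TIME = 5 * 60            # 5 minutes


def periodic_hold_options(max_runtime=DEFAULT_MAX_RUNTIME,
                          resume_time=DEFAULT_RESUME_TIME):
    """Hypothetical helper: build hold/release submit options.

    Assumes the scheduler is HTCondor, where JobStatus 2 means
    "running" and JobStatus 5 means "held".
    """
    return {
        # Put a running job on hold once it has run for max_runtime seconds.
        "periodic_hold": (
            f"(JobStatus == 2) && "
            f"(time() - EnteredCurrentStatus > {max_runtime})"
        ),
        # Release a held job once it has been held for resume_time seconds.
        "periodic_release": (
            f"(JobStatus == 5) && "
            f"(time() - EnteredCurrentStatus > {resume_time})"
        ),
    }


if __name__ == "__main__":
    for key, value in periodic_hold_options().items():
        print(f"{key} = {value}")
```

Note that a release expression this simple would also release jobs held for unrelated reasons; a real implementation would probably want to restrict it to jobs held by the periodic hold itself.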

Note that Caltech itself evicts jobs after a default of 4 hours.

This MR also tells post-processing jobs (including megaplot/sky) to transfer data only on exit, not on eviction. This is critical for saving disk space.
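
A minimal sketch of that distinction, assuming the jobs are submitted through HTCondor and that the behaviour is controlled by the `when_to_transfer_output` submit command. The helper name `transfer_options` and the choice of `ON_EXIT_OR_EVICT` for the other jobs are assumptions, not taken from this MR.

```python
def transfer_options(is_postprocessing):
    """Hypothetical helper: choose when output files are transferred.

    ON_EXIT transfers output only when the job exits on its own.
    ON_EXIT_OR_EVICT also transfers intermediate files when the job is
    evicted, which checkpointing jobs need in order to resume.
    """
    if is_postprocessing:
        # Post-processing jobs do not checkpoint, so copying their
        # partial output on every eviction only costs disk space.
        mode = "ON_EXIT"
    else:
        mode = "ON_EXIT_OR_EVICT"
    return {"when_to_transfer_output": mode}
```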
