Allow "refresh" of jobs
Some running analyses with a complex DAG can become stuck in a way which condor does not mark as stuck, and the dagman process appears to hang.
The tell-tale sign of this in a conventional workflow is that the dagman job has no running or idle jobs, and has an id of 0 when running condor_q.
This behaviour has been noticed several times with RIFT, which is the most complex DAG we've worked with in asimov jobs.
One option is to try to identify and correct this behaviour in the asimov monitor loop, but a lighter alternative is to have a straightforward way of condor_rm-ing the dagman job and resubmitting the rescue DAG, which is the standard fix.
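
As a rough illustration of what such a "refresh" could do, here is a minimal Python sketch of the two manual steps. The function names `build_refresh_commands` and `refresh_dag`, and the idea of passing in the dagman cluster id and the DAG file path, are hypothetical and not part of asimov. The sketch assumes that re-running `condor_submit_dag` on the original DAG file picks up the most recent rescue DAG, which is how DAGMan normally behaves when rescue files are present.

```python
import subprocess
from typing import Callable, List


def build_refresh_commands(cluster_id: int, dag_file: str) -> List[List[str]]:
    """Return the two commands used to refresh a stuck DAG (hypothetical helper).

    cluster_id is the condor cluster id of the dagman job, and dag_file is
    the path to the original DAG file that was submitted.
    """
    return [
        # Remove the hung dagman job from the queue.
        ["condor_rm", str(cluster_id)],
        # Resubmit the original DAG file. This relies on the assumption that
        # DAGMan runs the latest rescue DAG when one exists next to it.
        ["condor_submit_dag", dag_file],
    ]


def refresh_dag(
    cluster_id: int,
    dag_file: str,
    run: Callable[..., object] = subprocess.run,
) -> None:
    """Run the refresh commands in order, stopping at the first failure.

    The run argument exists only so the commands can be exercised without
    a condor installation; by default it is subprocess.run.
    """
    for command in build_refresh_commands(cluster_id, dag_file):
        run(command, check=True)
```

A call might look like `refresh_dag(1234, "rift.dag")`, where both values are placeholders. In practice a wait between the two steps is probably needed, since dagman takes some time to leave the queue and write the rescue DAG after condor_rm, and this sketch does not handle that.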