Resume from checkpoint not working
When a run either fails or is removed with condor_rm, resubmitting the DAG makes the job try to read the checkpoint pickle file, but it reports that the file is corrupted. Checking the file's last-modified time makes it clear that Condor did not transfer the file back when the original run died. I think there is an issue with transferring the pickle files back when a run dies (this is on CIT).