Don't launch new subprocesses while old ones are in progress

With this change, when we go to launch calcfap or combine_stats, we should no longer block while waiting for the last one to finish.

Once an interval passes for calcfap or combine_stats, we check if the old processes are running. If they are, we simply don't run it this buffer. We'll keep checking each buffer, polling the old processes, until they're complete. Then we'll launch the processes and update the last run time.
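
As a rough illustration of the polling described above (not the code in this MR), here is a minimal sketch. The `PeriodicJob` class, its `maybe_launch` method, and the `now` argument are hypothetical names made up for this example; only `subprocess.Popen` and its `poll()` method are real standard-library calls. It also assumes the first call launches immediately, which may differ from the actual behaviour.

```python
import subprocess
import sys


class PeriodicJob:
    """Hypothetical helper: launch a command at most once per interval,
    without ever blocking on the previous run."""

    def __init__(self, cmd, interval):
        self.cmd = cmd            # argument list passed to subprocess.Popen
        self.interval = interval  # minimum seconds between launches
        self.last_run = None      # time of the most recent launch
        self.proc = None          # handle of the most recent launch

    def maybe_launch(self, now):
        """Call once per buffer. Returns True if a new process was started."""
        if self.last_run is not None and now - self.last_run < self.interval:
            return False  # interval has not passed yet
        if self.proc is not None and self.proc.poll() is None:
            # Old process is still running: skip this buffer and check
            # again on the next one, instead of waiting for it.
            return False
        self.proc = subprocess.Popen(self.cmd)
        self.last_run = now  # only updated when we actually launch
        return True
```

Because `last_run` is only updated on a real launch, an overdue job starts on the first buffer after the old process exits, rather than waiting for another full interval.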

This means we'll sometimes finish running calcfap at different times compared to before. It's an edge case for a live run, but will be common in an offline run. Since bg (unseeded) runs rely on calcfap, we might need to see how it impacts our 1-2w long tests. If it's significant, we could block just on offline tests?

Note that there are a few cases to consider here:

  1. Our intervals are too short (for the hardware we're running on), and we simply can't run calcfap in the given time.
    1. This used to cause calcfap to block, effectively forcing us to be slower than realtime.
    2. This should now just mean that calcfap is constantly running, but won't keep up with the specified intervals.
  2. combine_stats launches just before calcfap
    1. This used to cause calcfap to block.
    2. We now carry on until combine_stats finishes, then launch calcfap. With calcfap run every 30 minutes, combine_stats run every 24 hours, and combine_stats completing within minutes, the delay shouldn't be too significant.
  3. "What if combine_stats launches just after calcfap", I hear you cry. The old code would just launch it.
    1. I'm not actually sure what the conflict is between the two. combine_stats doesn't delete the old files, it just writes new ones. The old files are deleted when we next try to get bankstats files (see rm_fnames). I guess if we want to be sure that calcfap is being launched with the right files, it's easiest to wait for combine_stats to finish, delete old files, and just use the latest ones.
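
The ordering between the two jobs in cases 2 and 3 could be expressed as a simple non-blocking check, sketched below. The helper names `is_running` and `can_launch_calcfap` are hypothetical and only illustrate the idea; the real launch logic may be structured differently.

```python
import subprocess
import sys


def is_running(proc):
    """True if proc is a Popen handle whose process has not exited yet."""
    return proc is not None and proc.poll() is None


def can_launch_calcfap(calcfap_proc, combine_stats_proc):
    """calcfap may start only when neither the previous calcfap nor
    combine_stats is still running, so it sees a settled set of files.
    This never blocks: the caller just retries on the next buffer."""
    return not is_running(calcfap_proc) and not is_running(combine_stats_proc)
```

Passing `None` for a job that has never been launched is treated as "not running", so the first calcfap launch is not held up.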

Note that there are other changes planned to overhaul our snapshots and gracedb uploads, making things async where possible using threads/processes. This change should work independently of those.

Edited by Timothy Davies
