Skip to content

Fix os.mkdir race condition in main pipeline script

Daniel Tang requested to merge fix/os_mkdir_race_condition into spiir-O4-EW-development

This change addresses a rare race condition where multiple nodes try to make the same skymap folder at the same time, which can crash the nodes in rare circumstances due to race conditions. This MR closes #20 (closed) .

Code Logic

The logic is simple:

  • First we still check if the skymap_path directory exists (which is technically not necessary, but we keep the check anyway).
  • Then, we attempt to make the skymap path (where previously we simply made it without a try/except, causing a pipeline failure as the exception was not handled).
  • If an OSError is raised (specifically OSError 17 corresponding to a File already existing, i.e. e.errno == errno.EEXIST), then we know the skymap_path directory already exists and there is no need to create it.
  • If an OSError raised that doesn't have the correct error code, then we raise the error as it is not the specific exception we are trying to catch and handle.
  • Else if there is no OSError to handle, then the os.mkdir(skymap_path) command has executed as intended, so we can print to our stderr logs that the directory has successfully been created.

We've also changed the check on skymap_path to whether it "exists" to whether it is a "directory". Note that if for some reason the skymap_path is a file (for some reason), it will still try to create the directory and then error, which matches the previous logic, but it will raise a better OSError rather than skipping over it like before. I do not think we should get the pipeline to manually delete files if they are accidentally named as our skymap_path (which should not be happening anyway).

Example Python Code

Below is an example code snippet using this try/except/else logic for when a folder already exists:

image

Below is an example code snippet using this try/except/else logic for when a folder does not already exist:

image

Python 3 Change

When we port to Python3, this entire code block can simply just be replaced with os.mkdir(exist_ok=True), which we have noted in an inline comment.

If the skymap directory is a nested directory, we can alternatively use os.mkdir(exist_ok=True, parents=True) if required.

Edited by Daniel Tang

Merge request reports