Make probability distributions portable across systems and checkpointable
Per !56 (merged), there are two features of the C++ probability distributions (std::uniform_real_distribution, std::uniform_int_distribution, std::normal_distribution) that complicate checkpointing.
- The spec does not guarantee that samples from the standard library's probability distributions will be the same across platforms. As a result, running BayesWave using the same seed on different machines might produce different results. (Of course, with sufficient iterations, BayesWave should converge to the same posterior on all machines, but having the same chain files across machines is still desirable.) For more information, see https://old.reddit.com/r/cpp/comments/7i21sn/til_uniform_int_distribution_is_not_portable/.
- The spec does not guarantee that the standard library's probability distributions are stateless (i.e., that the next sample from a distribution is a function only of the random number generator passed to it). As a result, to completely restore a run from a checkpoint, it is not sufficient to save and reload the random number generators' states; we must also save and reload the probability distributions' states. In practice, the std::uniform_*_distributions seem to be stateless (although the spec doesn't guarantee that they are), while std::normal_distribution has state.
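To illustrate the second point, here is a minimal sketch of why restoring only the generator is not enough once a distribution caches values. `CachedNormal` and `next_sample_reproduced` are hypothetical names made up for this example; they are not BayesWave or standard library code. The toy distribution uses the Marsaglia polar method, which produces two samples per round and keeps one as a spare. Whether a given standard library's std::normal_distribution works this way is an implementation detail.

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Hypothetical toy distribution, for illustration only.
// The polar method yields two normal samples per round; one is cached.
struct CachedNormal {
    bool has_spare = false;
    double spare = 0.0;

    double operator()(std::mt19937& eng) {
        if (has_spare) {
            has_spare = false;
            return spare;  // served from the cache; the engine is not advanced
        }
        double u, v, s;
        do {
            // Map the 32-bit engine output to [-1, 1).
            u = 2.0 * (eng() / 4294967296.0) - 1.0;
            v = 2.0 * (eng() / 4294967296.0) - 1.0;
            s = u * u + v * v;
        } while (s >= 1.0 || s == 0.0);
        const double scale = std::sqrt(-2.0 * std::log(s) / s);
        spare = v * scale;
        has_spare = true;
        return u * scale;
    }
};

// Draw one sample, take a "checkpoint", then compare the next sample of the
// uninterrupted run with the next sample of a run resumed from the checkpoint.
bool next_sample_reproduced(unsigned seed, bool restore_distribution) {
    std::mt19937 eng(seed);
    CachedNormal dist;
    dist(eng);  // leaves a spare value inside dist

    const std::mt19937 saved_eng = eng;   // checkpoint: engine state
    const CachedNormal saved_dist = dist; // checkpoint: distribution state

    const double expected = dist(eng);    // uninterrupted run

    std::mt19937 resumed_eng = saved_eng;
    CachedNormal resumed_dist;            // fresh: cached spare is lost
    if (restore_distribution) {
        resumed_dist = saved_dist;
    }
    return resumed_dist(resumed_eng) == expected;
}
```

With this toy distribution, the resumed run reproduces the next sample only when the distribution's state is restored along with the engine's.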
To resolve the first issue, we should switch from the standard library's distributions to some other library's. Boost's distributions seem promising, with the added benefit that they have the same interface as the standard library's, which should make the switch trivial (just replace std:: with boost::).
Unfortunately, the second issue is not so simple. For performance reasons, many libraries (even GSL) store state inside of their normal distributions. Once we switch to a new library's probability distributions, we should implement a way to save and load those probability distributions so we can checkpoint. Note that unlike the standard library, Boost does guarantee that its uniform_*_distributions are stateless; we would only need to save out state for its normal_distribution.
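One possible way to save and load a distribution's state is through the stream insertion and extraction operators that the C++ standard requires of every random number engine and distribution. The sketch below uses the standard library types so that it is self-contained; `save_state`, `load_state`, and `resumed_run_matches` are hypothetical helper names, not existing BayesWave functions. Boost.Random documents equivalent operators for its distributions, but that should be confirmed against the Boost version in use. The textual format is only expected to round-trip within the same library implementation, so this addresses checkpointing, not portability.

```cpp
#include <cassert>
#include <random>
#include <sstream>
#include <string>

// Hypothetical helper: write the engine and distribution state as text.
template <class Engine, class Dist>
std::string save_state(const Engine& eng, const Dist& dist) {
    std::ostringstream out;
    out << eng << ' ' << dist;
    return out.str();
}

// Hypothetical helper: read the state back in the same order it was written.
// Returns false if the checkpoint text could not be parsed.
template <class Engine, class Dist>
bool load_state(const std::string& checkpoint, Engine& eng, Dist& dist) {
    std::istringstream in(checkpoint);
    in >> eng >> dist;
    return !in.fail();
}

// Checks that a run resumed from a checkpoint continues with the same
// samples as the uninterrupted run.
bool resumed_run_matches(unsigned seed) {
    std::mt19937 eng(seed);
    std::normal_distribution<double> dist(0.0, 1.0);
    dist(eng);  // advance so the distribution may hold internal state

    const std::string checkpoint = save_state(eng, dist);

    // Uninterrupted run: the next few samples.
    double expected[4];
    for (double& x : expected) {
        x = dist(eng);
    }

    // Resumed run: default-constructed objects restored from the checkpoint.
    std::mt19937 resumed_eng;
    std::normal_distribution<double> resumed_dist;
    if (!load_state(checkpoint, resumed_eng, resumed_dist)) {
        return false;
    }
    for (double x : expected) {
        if (resumed_dist(resumed_eng) != x) {
            return false;
        }
    }
    return true;
}
```

In a real checkpoint the string returned by `save_state` would be written to the checkpoint file next to the rest of the run state, with one entry per generator and distribution pair.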
- Replace the standard library's probability distributions with another library's (perhaps Boost's)
- Implement saving and loading the state of a probability distribution