Dynesty checkpointing (!74) · Merge requests · lscsoft / bilby

Colm Talbot requested to merge dynesty_checkpointing into master Jun 18, 2018

This implements basic checkpointing for dynesty allowing runs to be resumed if they're interrupted by, e.g., running out of memory on a cluster.

I'm not sure this will completely resolve the high memory usage we've seen in runs with many live points, but at least those runs will be more robust to crashes.

Currently, the user specifies n_check_point and resume (defaults 1000000 and True): n_check_point sets how many likelihood evaluations are done between checkpoints, and resume sets whether to restart from a saved state.
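To make the intended behaviour of the two options concrete, here is a minimal, self-contained sketch of a checkpoint-and-resume loop. It is not the code in this merge request and does not call dynesty: the function name run_with_checkpoints, the checkpoint_file argument, the pickled state dictionary, and the toy "likelihood" are all assumptions made for illustration. Only the parameter names n_check_point and resume and their defaults come from the description above.

```python
import os
import pickle
import random


def run_with_checkpoints(n_total, n_check_point=1000000, resume=True,
                         checkpoint_file="sampler_state.pkl"):
    """Toy sampler loop (hypothetical) that saves its state periodically.

    n_check_point: number of likelihood evaluations between checkpoints.
    resume: if True and a checkpoint file exists, restart from it.
    """
    # Fresh state; replaced below if we resume from a saved checkpoint.
    state = {"n_calls": 0, "best": float("-inf"), "rng_state": None}
    if resume and os.path.isfile(checkpoint_file):
        with open(checkpoint_file, "rb") as f:
            state = pickle.load(f)

    rng = random.Random(0)
    if state["rng_state"] is not None:
        # Restore the generator so the resumed run continues the same sequence.
        rng.setstate(state["rng_state"])

    while state["n_calls"] < n_total:
        # Stand-in for one likelihood evaluation (assumption, not dynesty).
        value = -rng.random()
        state["best"] = max(state["best"], value)
        state["n_calls"] += 1

        if state["n_calls"] % n_check_point == 0:
            state["rng_state"] = rng.getstate()
            # Write to a temporary file first, then rename, so a crash
            # during the write cannot corrupt the previous checkpoint.
            tmp_file = checkpoint_file + ".tmp"
            with open(tmp_file, "wb") as f:
                pickle.dump(state, f)
            os.replace(tmp_file, checkpoint_file)

    return state
```

With resume=True, a run that is killed between checkpoints loses at most n_check_point evaluations when it is restarted. The write-then-rename step is a design choice of this sketch rather than something the description states.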

I've introduced two methods which I could be convinced to make private.

I haven't implemented this for the DynamicNestedSampler, since I haven't actually used that sampler. If someone wants to generalise what I've done, hopefully it'll be trivial.

@paul-lasky @moritz.huebner @gregory.ashton
