Pickle dump entire sampler in dynesty
We've noticed some pretty horrendous issues with restarting after a checkpoint recently.
I think that this is due to not saving all of the relevant state information.
This MR ensures the whole sampler will be saved.
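As a rough illustration of the idea (not the code in this MR), the whole sampler object can be pickled to a temporary file and then renamed over the resume file, so that an interruption mid-write cannot leave a truncated checkpoint. `ToySampler`, `write_checkpoint` and `read_checkpoint` are hypothetical names, and the standard-library `pickle` stands in for whatever serialiser the real code uses.

```python
import os
import pickle
import tempfile


class ToySampler:
    """Hypothetical stand-in for a sampler that carries internal state."""

    def __init__(self):
        self.iteration = 0
        self.scale = 1.0

    def step(self):
        # Every attribute changed here must survive a restart.
        self.iteration += 1
        self.scale *= 0.9


def write_checkpoint(sampler, resume_file):
    """Pickle the whole sampler, writing to a temporary file first."""
    directory = os.path.dirname(os.path.abspath(resume_file))
    fd, tmp_name = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "wb") as handle:
        pickle.dump(sampler, handle)
    # Replace the old checkpoint only once the new one is fully written.
    os.replace(tmp_name, resume_file)


def read_checkpoint(resume_file):
    """Load the sampler object back from the resume file."""
    with open(resume_file, "rb") as handle:
        return pickle.load(handle)
```

Saving the object itself, rather than a hand-picked subset of attributes, means nothing has to be remembered when new state is added to the sampler.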
I also took the liberty of adding two more plots.
- One is the run plot which dynesty produces, e.g.,
- The other is a little less pretty but shows the bound idx, number of likelihood calls and sampling scale as a function of the nested sampling iteration, e.g.,
The above plot shows the issue we had: the large spike followed by a higher nc steady state is where the run was interrupted and reloaded from the resume file (note that I stopped this run before it completely converged).
This is what that plot looks like with no interruption.
This is what the plot looks like with the new checkpointing.
Activity
changed milestone to %0.6.6
added Bug High priority Sampling labels
The failure of the test seems to be related to https://www.gitmemory.com/issue/uqfoundation/dill/329/515638620.
The test passed as
$ python test/sampler_test.py
but not as
$ pytest test/sampler_test.py
Edited by Colm Talbot
added 1 commit
- 99dddb52 - Check the sampler is picklable before saving to make test run.
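A check of this kind could look roughly like the sketch below. `is_picklable` is a hypothetical helper and uses the standard-library `pickle`; the commit itself may test picklability differently.

```python
import pickle


def is_picklable(obj):
    """Return True if obj can be serialised with pickle, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        # These are the exceptions pickle commonly raises for
        # lambdas, local objects and generators.
        return False
    return True
```

Calling this before writing lets the checkpoint step be skipped with a warning, rather than crashing the run, when the sampler holds something that cannot be serialised.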
added 8 commits
- 99dddb52...2c7dd519 - 6 commits from branch master
- 8c8a77bf - Merge remote-tracking branch 'origin' into improve-dynesty-checkpointing
- 823cd9e6 - Fix docstring
- Resolved by Colm Talbot
@colm.talbot the lower plot looks great; I guess it is checkpointing in there?
I've started a PP test using this branch, I'll let you know how it fares.
- Resolved by Colm Talbot
When the jobs first kick off I get this message
14:04 bilby INFO : Reading resume file outdir_pp_test_high_mass_dynesty_distance-phase-time/result/pp_test_high_mass_dynesty_distance-phase-time_data5_0_analysis_H1L1_dynesty_resume.pickle
14:04 bilby WARNING : Failed to read resume file outdir_pp_test_high_mass_dynesty_distance-phase-time/result/pp_test_high_mass_dynesty_distance-phase-time_data5_0_analysis_H1L1_dynesty_resume.pickle
I think it just needs a check for whether the file exists.
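One way such a check might look is sketched below. `read_resume_file` and the logger name are hypothetical, and the real reader may handle more failure modes.

```python
import logging
import os
import pickle

logger = logging.getLogger("resume_sketch")


def read_resume_file(resume_file):
    """Return the unpickled object, or None if nothing usable is on disk."""
    if not os.path.isfile(resume_file):
        # A fresh job has no checkpoint yet, so this is not a failure.
        logger.info("Resume file %s does not exist; starting fresh.", resume_file)
        return None
    logger.info("Reading resume file %s", resume_file)
    try:
        with open(resume_file, "rb") as handle:
            return pickle.load(handle)
    except (pickle.UnpicklingError, EOFError, AttributeError) as err:
        logger.warning("Failed to read resume file %s: %s", resume_file, err)
        return None
```

With this ordering the warning is reserved for files that exist but cannot be read, which is the case worth investigating.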
- Resolved by Colm Talbot
@colm.talbot I stopped the jobs that I had running and resubmitted the DAG; it seemed to fail to read in:
- Resolved by Gregory Ashton
@colm.talbot I ran it locally and hit a KeyError; adding this line resolved it:
if "external_sampler" in state: del state['external_sampler']
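The thread does not show where this line lives. Assuming it sits in a `__getstate__`-style method, the same guard can be written with `dict.pop` and a default, which never raises when the key is absent. `Wrapper` is a hypothetical class used only for illustration.

```python
import pickle


class Wrapper:
    """Hypothetical object that may or may not hold an external sampler."""

    def __init__(self, external_sampler=None):
        self.label = "run"
        if external_sampler is not None:
            self.external_sampler = external_sampler

    def __getstate__(self):
        # Copy so the live object keeps its attributes.
        state = self.__dict__.copy()
        # pop with a default is a no-op when the key is missing,
        # so no KeyError is raised.
        state.pop("external_sampler", None)
        return state
```

Both spellings behave the same; the `pop` form just folds the membership test and the deletion into one call.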
@colm.talbot do you also think it is worth wrapping the "iteration" plot in a try except?
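For reference, a guard of that kind might look like the sketch below. `try_plot` is a hypothetical helper and `plot_func` stands in for whichever function draws the "iteration" plot.

```python
import logging

logger = logging.getLogger("plot_sketch")


def try_plot(plot_func, *args, **kwargs):
    """Call plot_func; log and return False instead of raising on failure."""
    try:
        plot_func(*args, **kwargs)
    except Exception as err:
        # The plot is diagnostic only, so a failure here should
        # never take the sampling run down with it.
        logger.warning("Could not make diagnostic plot: %s", err)
        return False
    return True
```

The broad `except Exception` is deliberate here, since any plotting error is less important than keeping the run alive.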