Standardize some common sampler functionality
This MR attempts to standardize a few common features we implement for external samplers.
I had hoped to keep them independent, but they became pretty intertwined, so I wanted to at least share them in this way.
- saving results to a temporary directory.
Several samplers (
pymultinest
,dnest4
,ultranest
) had pretty much equivalent definitions of a file transfer mechanism. I'm not entirely sure that they all need it. Originallypymultinest
needed it due to a limitation of the underlyingfortran
code. This way any changes/fixes to one will be applied to the others. - opening and closing a pool.
We manually set up/tear down the pool used for
dynesty
,ptemcee
, andptemcee
. This typically involves defining some global objects that are then stored and called independently in different threads. I wrapped this in a little class to reduce the number of globals we have floating around. This should again cut down on maintenance. A side effect of this is thatemcee
,kombine
, andzeus
now support parallelization using multiprocessing. - signal catching.
This hit most of the samplers as we set a catch for signals from the system. Currently, we just set the signals when the object is initialized and forget about it. This has lead to some weird behaviour if there is a significant amount of stuff that happens after the sampler finishes, e.g., reconstructing marginalized parameters. The new method just has a decorator that is applied to the
run_sampler
method if we want it that sets the signal before entering the method and resets them to the previous values when the sampling is done.
Edit (220126): Since this MR already essentially rewrote the entire directory I decided it was time to add this bilby.core.sampler
directory to the pre-commits. This involved running, black
, isort
, and flynt
.
Edit (220726): Six months since the last update... I made some slides to explain the different changes.
Merge request reports
Activity
added Refactoring Sampling labels
added 1 commit
- 528abd6b - Only test parallelisation with samplers that support it
mentioned in merge request !1044 (merged)
added 11 commits
-
d6ff1ac3...b8b4a781 - 10 commits from branch
master
- 78426198 - Merge remote-tracking branch 'origin' into standardize-pool-handling
-
d6ff1ac3...b8b4a781 - 10 commits from branch
mentioned in merge request !1015 (closed)
added 19 commits
-
78426198...6ab61c25 - 18 commits from branch
master
- 54a8fce2 - Merge branch 'master' into 'standardize-pool-handling'
-
78426198...6ab61c25 - 18 commits from branch
changed milestone to %1.2.0
added 6 commits
-
94f4a87a...8585cab9 - 2 commits from branch
master
- 2479d4d0 - Merge remote-tracking branch 'origin/master' into standardize-pool-handling
- e2c4a018 - Make ptemcee pickling work
- b80b7c32 - Make nessai exit on correct code
- 61084e07 - Add sampler hard exit option
Toggle commit list-
94f4a87a...8585cab9 - 2 commits from branch
added 15 commits
-
61084e07...6a2cf961 - 14 commits from branch
master
- fad9500c - Merge branch 'master' into 'standardize-pool-handling'
-
61084e07...6a2cf961 - 14 commits from branch
added 1 commit
- 7ec859eb - Make sure ptemcee variables are defined in init
added 15 commits
-
7ec859eb...a2390302 - 14 commits from branch
master
- de47eb9c - Merge branch 'master' into 'standardize-pool-handling'
-
7ec859eb...a2390302 - 14 commits from branch
added 10 commits
-
de47eb9c...e04d1190 - 9 commits from branch
master
- dbaa5ee6 - Merge branch 'master' into 'standardize-pool-handling'
-
de47eb9c...e04d1190 - 9 commits from branch
added 2 commits
added 17 commits
-
05218902...98d07fac - 14 commits from branch
master
- 359812f9 - Standardise checkpoint time for ptemcee
- a764c199 - Merge branch 'standardize-pool-handling' of git.ligo.org:lscsoft/bilby into...
- 76f896e8 - Merge remote-tracking branch 'origin/master' into standardize-pool-handling
Toggle commit list-
05218902...98d07fac - 14 commits from branch
mentioned in issue #624 (closed)
added 1 commit
- 795665f6 - add tests that npool argument works as expected
added 1 commit
- dc8f524b - remove erroneous emcee call and make sampler test safer
- Resolved by Colm Talbot
@michael.williams for some reason the nessai test here isn't finishing. I can run it fine locally (on a mac). Are you able to see if there's an issue with the changed implementation?
- Resolved by Colm Talbot
@colm.talbot I'm dug into this a bit more and I can't seem to replicate what's happening in the CI, the tests pass on my local machine and I added a similar test to nessai here to see if it passes, and it does. However, on another machine, I have identified some issues with how nessai interacts with the test that can cause it to fail:
-
torch
can be very slow to import on some systems, if this happens then the signal handling innessai
isn't set up by the time theSIGINT
is sent, so it isn't handled correctly. If the signal happens to be sent whilsttorch
is importing I get a really uninformative error and the test fails. I had something similar happen withbilby_mcmc
as well. Could something similar be happening with the signal handling? Like only being partially configured? - Even with
save=False
nessai will still try to checkpoint and save the sampler (though in hindsight maybe it shouldn't), so it ends up writing to disk. In some cases, I would get anOSError: Resource temporarily available
with the successive tests. I couldn't quite pin done why, but I think it happens because all of the nessai tests try to write to the same file? I guess this could be easily fixed using something like pytest'stmp_path
so each test has its own directory that automatically gets deleted after the test. -
npool
isn't correctly handled in the nessai interface, but all of the equivalent values are, becausenpool
is now a kwarg ofSampler
sonpool
is longer in the kwargs passed to_translate_kwargs
. This means the tests weren't actually using multiprocessing. I have a fix for this so I'll submit it as a suggested change (you also need to specify a maximum number of threads when using multiprocessing in nessai.)
Unfortunately, none of these really explain why the test is taking so long.
This won't fix the test but one option we could consider in the future is adding an option to nessai to disable signal handling, so we can handle it in bilby instead. What do you think?
-
- Automatically resolved by Colm Talbot
- Resolved by Colm Talbot
added 1 commit
- ed63cb7d - make sure sampling time updates without checkpointing
added 11 commits
-
5cd9a1be...ff6de208 - 10 commits from branch
master
- 4b6e4853 - Merge remote-tracking branch 'origin' into standardize-pool-handling
-
5cd9a1be...ff6de208 - 10 commits from branch
added 9 commits
-
4b6e4853...aca9ee12 - 8 commits from branch
master
- 86e279e4 - Merge remote-tracking branch 'origin/master' into standardize-pool-handling
-
4b6e4853...aca9ee12 - 8 commits from branch
added 2 commits
added 1 commit
- 9602813d - make sure schwimmbad is installed for sampler testing