
Allow pickling with dill to enable use of emcee 'pool' option

Matthew Pitkin requested to merge matthew-pitkin/bilby:pickling into master

Currently, passing the thread keyword argument to the emcee sampler with any value other than 1 fails. To use multiple threads, emcee uses the multiprocessing module's Pool, which attempts to pickle the objects being passed between processes, and bilby's sampler objects are not picklable.

One could instead use the pool keyword argument to pass emcee a user-defined Pool of workers directly (for example, a pool of MPI workers). For standard multicore processing, you can use the Pool from the multiprocess module, a fork of multiprocessing that uses dill rather than pickle (dill is able to "pickle" more types of object).

However, even with the Pool object from multiprocess this fails, because dill tries to "pickle" the pool itself, which it can't do! This PR adds a __getstate__ method that removes the pool before pickling, which allows this to succeed.
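
The idea can be illustrated with a minimal sketch. This is not the code in this PR: `SamplerSketch` is a hypothetical stand-in for a sampler class, the standard-library `pickle` is used in place of dill (which, as far as I'm aware, honours `__getstate__` in the same way), and a `ThreadPool` stands in for the worker pool so that the snippet runs without spawning processes.

```python
import pickle
from multiprocessing.pool import ThreadPool


class SamplerSketch:
    """Hypothetical stand-in for a sampler object that holds a worker pool."""

    def __init__(self, label, pool=None):
        self.label = label
        self.pool = pool

    def __getstate__(self):
        # Copy the instance dictionary so the live object keeps its pool,
        # then blank out the pool in the copy that gets serialised.
        state = self.__dict__.copy()
        state["pool"] = None
        return state


pool = ThreadPool(2)
sampler = SamplerSketch("demo", pool=pool)

# The round trip succeeds because the pool is never handed to the pickler.
restored = pickle.loads(pickle.dumps(sampler))

pool.close()
pool.join()
```

After the round trip, `restored.pool` is `None` while `sampler.pool` still refers to the original pool, so the object that owns the pool can carry on using it.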

An example of using a multiprocess pool once this PR has been applied (assuming multiprocess has been installed, e.g. via pip) is:

from __future__ import division
import bilby
import numpy as np
import matplotlib.pyplot as plt

from multiprocess import Pool

# A few simple setup steps
label = 'linear_regression_emcee'
outdir = 'outdir'
bilby.utils.check_directory_exists_and_if_not_mkdir(outdir)

# First, we define our "signal model", in this case a simple linear function
def model(time, m, c):
    return time * m + c

# Now we define the injection parameters which we make simulated data with
injection_parameters = dict(m=0.5, c=0.2)

sampling_frequency = 10
time_duration = 10
time = np.arange(0, time_duration, 1 / sampling_frequency)
N = len(time)
sigma = np.random.normal(1, 0.01, N)
data = model(time, **injection_parameters) + np.random.normal(0, sigma, N)

# Now let's instantiate a version of our GaussianLikelihood, giving it
# the time, data and signal model
likelihood = bilby.likelihood.GaussianLikelihood(time, data, model, sigma)

# From here on, the syntax is exactly equivalent to other bilby examples
# We make a prior
priors = dict()
priors['m'] = bilby.core.prior.Uniform(0, 5, 'm')
priors['c'] = bilby.core.prior.Uniform(-2, 2, 'c')

# Create a pool of two worker processes using multiprocess (which uses dill)
pool = Pool(2)

# And run the sampler, passing it the pool
result = bilby.run_sampler(
    likelihood=likelihood, priors=priors, sampler='emcee', nburn=100, nsteps=200,
    pool=pool, injection_parameters=injection_parameters, outdir=outdir,
    label=label)

cc @jethro.linley this might be of interest to you.
