
Inconsistent saved `log_likelihood` values

@vivien pointed out that the log_likelihood saved in result.posterior does not agree with the values obtained by plugging the posterior samples back into the likelihood object.

Minimal working example

#!/usr/bin/env python
"""Minimal example reproducing the inconsistent saved log_likelihood values."""
from __future__ import division
import tupak
import numpy as np

data = np.random.normal(3, 4, 100)


class SimpleGaussianLikelihood(tupak.Likelihood):
    def __init__(self, data):
        """
        A very simple Gaussian likelihood

        Parameters
        ----------
        data: array_like
            The data to analyse
        """
        self.data = data
        self.N = len(data)
        self.parameters = {'mu': None}

    def log_likelihood(self):
        mu = self.parameters['mu']
        sigma = 1
        res = self.data - mu
        return -0.5 * (np.sum((res / sigma)**2) +
                       self.N * np.log(2 * np.pi * sigma**2))


likelihood = SimpleGaussianLikelihood(data)
priors = dict(mu=tupak.core.prior.Uniform(0, 5, 'mu'))
result = tupak.run_sampler(
    likelihood=likelihood, priors=priors, sampler='dynesty', npoints=100)

p = result.posterior.iloc[0]
print(p)
likelihood.parameters['mu'] = p['mu']
print('For mu = {}, likelihood={}'.format(likelihood.parameters['mu'],
                                          likelihood.log_likelihood()))
import IPython; IPython.embed() # Drops to the terminal for further debugging

Output

mu                   3.330169
log_likelihood   -1246.500093
Name: 0, dtype: float64
For mu = 3.33016914014, likelihood=-806.353091386

Probable cause

In the code here, https://git.ligo.org/Monash/tupak/blob/master/tupak/core/sampler.py#L595, we take the output of dynesty and resample it, returning an array of the same length as the input, but resampled of course. However, we then save the likelihoods that were computed for the original, not-resampled points alongside the resampled samples, so the two arrays are misaligned.
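A toy demonstration of the suspected misalignment (the arrays below are hypothetical stand-ins for dynesty's outputs, not tupak code):

```python
import numpy as np

# Three nested samples, their log-likelihoods, and a resampling index
# (in the real code the index would be drawn from the importance weights).
samples = np.array([[1.0], [2.0], [3.0]])
log_likelihoods = np.array([-10.0, -5.0, -1.0])
idx = np.array([2, 2, 0])  # equal-weight resampling picks these rows

resampled = samples[idx]         # rows reordered/duplicated
stored = log_likelihoods         # what the code currently saves
correct = log_likelihoods[idx]   # what should be saved

# Row 0 of `resampled` is the sample with mu=3, whose true
# log-likelihood is -1.0, but the stored array still says -10.0.
print(stored[0], correct[0])  # -10.0 -1.0
```

Any per-sample quantity saved without applying the same index suffers the same mismatch.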

@colm.talbot , does this sound like a good interpretation to you?

To solve this, I think we need to get the sort index used for the reweighted samples and apply it to the saved likelihoods (I've never looked into the internals of that function, so it may not actually work like that).

The same problem will exist for nestle. Currently, pymultinest and the other samplers do not save the likelihood per call.
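For samplers that don't save per-call likelihoods, one fallback would be to re-evaluate the likelihood at every posterior sample after sampling finishes, so the stored column is guaranteed to match. A sketch (the helper name is illustrative, not tupak API):

```python
import numpy as np
import pandas as pd


class SimpleGaussianLikelihood:
    # Stripped-down version of the likelihood from the example above.
    def __init__(self, data):
        self.data = np.asarray(data)
        self.N = len(data)
        self.parameters = {'mu': None}

    def log_likelihood(self):
        res = self.data - self.parameters['mu']
        return -0.5 * (np.sum(res**2) + self.N * np.log(2 * np.pi))


def recompute_log_likelihoods(likelihood, posterior):
    """Re-evaluate the likelihood at each posterior sample (a fallback
    sketch, not current tupak behaviour)."""
    values = []
    for _, row in posterior.iterrows():
        for key in likelihood.parameters:
            likelihood.parameters[key] = row[key]
        values.append(likelihood.log_likelihood())
    return np.array(values)


posterior = pd.DataFrame({'mu': [0.0, 1.0]})
ll = recompute_log_likelihoods(SimpleGaussianLikelihood(np.zeros(10)), posterior)
```

The cost is one extra likelihood call per posterior sample, which may be noticeable for expensive likelihoods.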

I wonder if it's worth writing our own resampling function?
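If we did write our own, a minimal sketch of an equal-weight resampler that also returns the selection index might look like this (systematic resampling; names are illustrative, not an existing API):

```python
import numpy as np


def resample_equal_with_index(samples, weights, seed=None):
    """Equal-weight resampling that also returns the chosen indices,
    so per-sample quantities such as saved log-likelihoods can be
    reordered to stay aligned with the resampled points."""
    rng = np.random.default_rng(seed)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    n = len(weights)
    # Systematic resampling: one uniform offset, then a stratified sweep.
    positions = (rng.random() + np.arange(n)) / n
    idx = np.searchsorted(np.cumsum(weights), positions)
    return np.asarray(samples)[idx], idx


samples = np.array([0.0, 1.0, 2.0, 3.0])
weights = np.array([0.0, 0.0, 1.0, 0.0])
log_ls = np.array([-9.0, -8.0, -7.0, -6.0])

new_samples, idx = resample_equal_with_index(samples, weights)
aligned_log_ls = log_ls[idx]  # stays consistent with new_samples
```

With all the weight on one sample, every resampled point and every aligned log-likelihood comes from that sample, as expected.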