The Prior Transform¶
The nestle.sample
function takes two user-defined functions as
inputs. The first function should give the natural logarithm of the
likelihood function of interest. The second function is a “prior
transform.” What’s up with that?
The prior transform function is used to specify the Bayesian prior for the problem, in a round-about way. It is a transformation from a space where variables are independently and uniformly distributed between 0 and 1 to the parameter space of interest. For independent parameters, this would be the product of the inverse cumulative distribution function (also known as the percent point function or quantile function) for each parameter.
Uniform priors¶
This is easier to illustrate with a concrete example. Suppose we wanted a uniform prior in the range [-5, 5.) in all variables:
The prior transform would be:
def prior_transform(x):
return 10. * x - 5.
because if x
is an array of numbers between 0 and 1, the result
will be an array of numbers between -5 and 5.
For a slightly more complex example, suppose we have problem with two variables where the prior on the first variable is a uniform distribution between -1 and 1 and the prior on the second variable is a uniform distribution between -10 and 10. The prior transform would be:
def prior_transform(x):
return np.array([2. * x[0] - 1., 20. * x[1] - 10.])
or equivalently:
def prior_transform(x):
return np.array([2., 20.]) * x + np.array([-1., 10.])
Incidentally, the simplest possible prior to define would be a uniform
distribution between 0 and 1 in all parameters, for which the
transform would be lambda x: x
.
Non-uniform priors¶
Suppose we wish to specify a normal distribution for our prior, in two
dimensions. We can use the inverse cumulative distribution function
for the normal distribution, scipy.special.ndtri
:
from scipy.special import ndtri
prior_transform = ndtri
This specifies a normal distribution with mean 0 and standard deviation 1 in all dimensions. To specify a different mean and standard deviation, simply transform the output:
def prior_transform(x):
return mu + sigma * ndtri(x)
where mu
is an array giving the mean in each dimension and
sigma
is an array giving the standard deviation in each dimension.
The distributions in scipy.stats
(particularly the ppf
method)
might be particuarly useful in constructing more complex priors.
Why?¶
Why not combine the prior and likelihood in a single function (giving the posterior) as in, e.g., emcee?
Unlike traditional MCMC, nested sampling starts by randomly sampling
from the entire parameter space. This is not possible without
specifying some sort of contraints on the parameters. Thus the user
would still have to specify bounds on all the parameters. You could
use nestle.sample
this way: pass a “loglikelihood” function that in
fact returns the log of the posterior, and then specify a prior
transform that is uniform in some parameter range (as illustrated in
the first example). This will give valid samples. However, the
evidence will be affected by the range of the prior. A wider uniform
prior decreases the weight of the high-likelihood regions in the
evidence integral, leading to a lower evidence.
There are a couple advantages to specifying the prior as a transform rather than simply as bounds. First, it naturally allows priors that extend to infinity but have finite integrals, such as the normal distribution illustrated above. Second, assuming the prior transform is computationally cheap, this lets us cheaply draw samples according to the prior. We will (proportionally) avoid evaluating the likelihood in regions we know to have small prior values. (This preferential sampling is accounted for in the results.)