Often, particularly when we have lots of parameters to estimate, it’s computationally prohibitive (if not actually impossible) to try to specify the posterior distribution. In these cases, we need to approximate the posterior via simulation.
There are many different approaches for doing this, including grid approximation, quadratic approximation, and various flavors of Markov Chain Monte Carlo (MCMC) algorithms (e.g. Metropolis, Gibbs, Hamiltonian).
Imagine we have the following beta-binomial model:
\(Y|\pi \sim Bin(10, \pi)\)
\(\pi \sim Beta(2, 2)\)
where Y is the number of successes in 10 independent trials. Each trial has probability \(\pi\) of succeeding. Imagine that we observe Y = 9 successes.
Although we can specify the posterior analytically, we can also approximate it with grid approximation.
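For reference, the conjugacy of the Beta prior with the binomial likelihood means the exact posterior is Beta(2 + 9, 2 + 10 - 9) = Beta(11, 3). A minimal sketch of that update, using the Distributions package loaded just below:
#conjugate Beta-Binomial update: Beta(α + y, β + n - y)
exact_posterior = Beta(2 + 9, 2 + 10 - 9)   #Beta(11, 3)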
using Distributions
using Plots
using Random
using StatsPlots
using StatsBase

Random.seed!(0408)

#vector of potential pi values
pi_grid = 0:0.1:1

#prior beta distribution
d_beta = Beta(2, 2)
Beta{Float64}(α=2.0, β=2.0)
#evaluate the pdf at each value of pi
priors = pdf.(d_beta, pi_grid)

#calculate the likelihood at each value of pi
likelihoodz = pdf.(Binomial.(10, pi_grid), 9)

#compute the un-normalized posterior
un_normalized = likelihoodz .* priors

#normalize the posterior
posteriorz = un_normalized ./ sum(un_normalized)
11-element Vector{Float64}:
0.0
6.99703248673802e-9
5.661203963590892e-6
0.00024994099745876834
0.0032608534830283545
0.02108962820317927
0.08357210821151959
0.2196098326613785
0.3710126629578925
0.30119930528454697
0.0
And then we can plot the resulting grid-approximated posterior:
plot(pi_grid, posteriorz)
scatter!(pi_grid, posteriorz)
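The grid can also be used for approximate posterior summaries. As a quick sketch with the objects above, we can take a probability-weighted mean over the grid, or draw samples from the grid and summarize those (the variable names below are just for illustration):
#approximate posterior mean as a weighted average over the grid
post_mean = sum(pi_grid .* posteriorz)

#or resample grid values in proportion to their posterior probability
samps = sample(pi_grid, Weights(posteriorz), 10_000)
mean(samps), std(samps)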
Of course, if our pi_grid contains more closely-spaced values, the approximated posterior becomes smoother:
#a more finely-spaced grid of potential pi values
pi2 = 0:0.01:1
prior2 = pdf.(d_beta, pi2)

#calculate the likelihood at each value of pi
likelihood2 = pdf.(Binomial.(10, pi2), 9)

#compute the un-normalized posterior
un_normalized2 = likelihood2 .* prior2

#normalize the posterior
posterior2 = un_normalized2 ./ sum(un_normalized2)
plot(pi2, posterior2)
scatter!(pi2, posterior2)
We can add the true resulting distribution – a Beta(11, 3) distribution – to this to show the approximation vs the true value:
#sample from the grid-approximated posterior
s = sample(pi2, Weights(posterior2), 100_000)

density(s, label="Approximate Posterior")

#overlay the exact Beta(11, 3) density
true_beta = Beta(11, 3)
ys = pdf.(true_beta, pi2)

plot!(pi2, ys, label="True Beta(11,3)")
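As a rough numerical check (a sketch using the s and true_beta objects above), the mean of the grid-based samples should land near the analytical posterior mean of 11/14 ≈ 0.786:
#compare the approximate and exact posterior means
mean(s), mean(true_beta)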
One issue with grid approximation is that it quickly becomes intractable with many parameters – it is difficult to get a dense enough grid to get a good approximation as you start adding parameters (e.g. a grid with 100 points per parameter needs \(100^2\) = 10,000 evaluations for two parameters, but \(100^{10} = 10^{20}\) for ten).
So here’s MCMC to the rescue!
Like grid approximation, MCMC produces samples that are only approximations of the posterior.
Unlike grid approximation samples, though, MCMC samples are not independent. For \(\theta^{(1)}, \theta^{(2)}, ..., \theta^{(N)}\), each \(\theta^{(i+1)}\) is drawn conditional on \(\theta^{(i)}\). Or:
\(f(\theta^{(i+1)} | \theta^{(i)}, y)\)
The Metropolis-Hastings algorithm is a 2-step process for approximating a posterior distribution: (1) propose a new location, and (2) decide whether or not to go there.
To unpack this a little bit:
Step 1: Propose a new location
We don’t know the posterior we’re sampling from, but we can use a “proposal model” to draw new candidate locations. We can use different distributions, but the Uniform distribution is the most straightforward. Assume we have a current location, \(x\), and a bandwidth, \(w\); then our proposed new location, \(x'\), is:
\(x'|x \sim Unif(x - w, x + w)\)
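A minimal sketch of this proposal step (propose is a hypothetical helper, not part of any package):
#draw a proposed location from a Uniform window of width 2w centered on x
propose(x, w) = rand(Uniform(x - w, x + w))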
Step 2: Decide whether or not to go there
We have to calculate the acceptance probability (the probability of accepting \(x'\)).
If the un-normalized posterior plausibility of the proposed location \(x'\) is greater than that of the current location \(x\) (i.e. \(f(x')L(x'|y) > f(x)L(x|y)\)), definitely go to \(x'\).
Otherwise, maybe go to \(x'\). This feature (maybe moving even if the proposed location is less likely than the current location) prevents the chain from getting stuck.
We can define the acceptance probability as:
\[\alpha = min\{1, \frac{f(x')L(x'|y)}{f(x)L(x|y)} \frac{q(x|x')}{q(x'|x)}\}\]
If we use a symmetric proposal model (like the Uniform distribution), then this simplifies to:
\[\alpha = min\{1, \frac{f(x')L(x'|y)}{f(x)L(x|y)}\}\]
This can further simplify, because the normalizing constant cancels in the ratio of posteriors, to:
\[\alpha = min\{1, \frac{f(x'|y)}{f(x|y)}\}\]
Given this, if \(f(x'|y) \ge f(x|y)\), then \(\alpha = 1\), and we will definitely move to \(x'\).
Otherwise, if \(\alpha = \frac{f(x'|y)}{f(x|y)} < 1\), we might move there, with probability \(\alpha\).
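A minimal sketch of that decision rule (accept_or_stay, plaus_prop, and plaus_curr are hypothetical names; the plausibilities are the un-normalized quantities \(f(x')L(x'|y)\) and \(f(x)L(x|y)\)):
#move to the proposal with probability α, otherwise stay at the current location
function accept_or_stay(proposal, current, plaus_prop, plaus_curr)
    α = min(1, plaus_prop / plaus_curr)
    return rand() < α ? proposal : current
end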
Imagine we have a Normal-Normal model such that:
\(Y|\mu \sim N(\mu, 0.75^2)\)
\(\mu \sim N(0, 1^2)\)
i.e. Y is a numerical outcome that varies normally around \(\mu\) with standard deviation 0.75, and our prior for \(\mu\) is a standard normal distribution.
Then imagine we observe an outcome Y = 6.25. Given this, and the above, we can calculate the posterior model as:
\(\mu | (Y = 6.25) \sim N(4, 0.6^2)\)
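We can sanity-check those posterior parameters with the standard Normal-Normal conjugate formulas for a single observation (a quick sketch; the variable names are just for illustration):
#prior μ ~ N(μ0, τ²); data model Y|μ ~ N(μ, σ²); observed y
μ0, τ = 0.0, 1.0
σ, y = 0.75, 6.25

post_var = 1 / (1 / τ^2 + 1 / σ^2)           #0.36, so sd = 0.6
post_mean = post_var * (μ0 / τ^2 + y / σ^2)  #4.0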
We can run through one iteration of the Metropolis-Hastings algorithm with the following, where w is the bandwidth of our Uniform proposal model and current is the current value on the chain:
function one_mh_iter(w, current)
    c = Float64(current)
    u = Uniform(current - w, current + w)
    proposal = rand(u, 1)[1]

    #proposal_plaus = prior * likelihood
    proposal_plaus = pdf(Normal(0, 1), proposal) * pdf(Normal(proposal, 0.75), 6.25)

    #current_plaus = prior * likelihood
    current_plaus = pdf(Normal(0, 1), current) * pdf(Normal(current, 0.75), 6.25)

    α = minimum([1, proposal_plaus / current_plaus])

    next = sample([proposal, c], Weights([α, 1 - α]), 1)[1]

    return next
end
one_mh_iter (generic function with 1 method)
res = one_mh_iter(1, 4)
4.0
And then from there, we can use the logic in the one_mh_iter function to take a “tour” and create an entire chain, which will be our simulated distribution.
function mh_tour(w, current, N::Int)
    μ = Vector{Float64}(undef, N)

    for i ∈ eachindex(μ)
        r = one_mh_iter(w, current)
        μ[i] = r
        current = r
    end

    return μ
end
mh_tour (generic function with 1 method)
res = mh_tour(1, 4, 10_000)
10000-element Vector{Float64}:
4.0
4.106609027850608
4.235886932047645
3.310894252707528
4.034179768273493
4.512960695766677
4.512960695766677
4.349575562580946
4.101991567803039
3.460601626178285
3.633816936072791
4.30401065365775
3.401923900064334
⋮
3.934350716176937
3.934350716176937
3.66719175153303
3.9773404547954994
3.7463512503491367
4.521828790801184
4.521828790801184
4.492908196169627
3.8668258667074396
3.8668258667074396
3.8668258667074396
3.8668258667074396
And we can look at the trace plot and the histogram to see the distribution we’ve simulated:
p1 = plot(res, label="");
p2 = histogram(res);

plot(p1, p2, layout=@layout([a b]))
And now let’s check our work – we should get approximately \(\mu = 4\) and \(\sigma = 0.6\):
[mean(res), std(res)]
2-element Vector{Float64}:
3.974150267286955
0.6134780589942376
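We could also overlay the true \(N(4, 0.6^2)\) density on the simulated distribution, much like we did with the Beta example earlier (a sketch using the res chain above):
#compare the MCMC approximation to the known posterior
density(res, label="MCMC Approximation")

xs = 0:0.01:8
plot!(xs, pdf.(Normal(4, 0.6), xs), label="True N(4, 0.6²)")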
Metropolis-Hastings is the fundamental MCMC algorithm, but it’s often not the one used in applications today. Other algorithms, such as the Gibbs sampler or Hamiltonian Monte Carlo (HMC), are more common, with HMC being the algorithm used by the {rstan} package.