```
using Distributions
using Plots
using Random
using StatsPlots
using StatsBase
Random.seed!(0408)
#vector of potential pi values
= 0:0.1:1
pi_grid
#prior beta distribution
= Beta(2, 2) d_beta
```

`Beta{Float64}(α=2.0, β=2.0)`

Often, particularly when we have lots of parameters to estimate, it’s computationally prohibitive (if not actually impossible) to try to specify the posterior distribution. In these cases, we need to approximate the posterior via simulation.

There are many different approaches for, including grid approximation, quadtratic approximation, and various flavors of Markov Chain Monte Carlo (MCMC) algorithms (e.g. Metropolis, Gibbs, Hamiltonian).

Imagine we have the following beta-binomial model:

\(Y|\pi \sim Bin(10, \pi)\) \(\pi \sim Beta(2, 2)\)

where Y is the number of successes in 10 independent trials. Each trial has probability \(\pi\) of succeeding. Imagine that we observe Y = 9 successes.

Although we can specify the posterior analytically, we can also approximate it with grid approximation.

```
using Distributions
using Plots
using Random
using StatsPlots
using StatsBase
Random.seed!(0408)
#vector of potential pi values
= 0:0.1:1
pi_grid
#prior beta distribution
= Beta(2, 2) d_beta
```

`Beta{Float64}(α=2.0, β=2.0)`

```
#evaluate the pdf at each value of pi
= pdf.(d_beta, pi_grid)
priors
#calculate the likelihood at each value of pi
= pdf.(Binomial.(10, pi_grid), 9)
likelihoodz
#compute the un-normalized posterior
= likelihoodz .* priors
un_normalized
#normalize the posterior
= un_normalized ./ sum(un_normalized) posteriorz
```

```
11-element Vector{Float64}:
0.0
6.99703248673802e-9
5.661203963590892e-6
0.00024994099745876834
0.0032608534830283545
0.02108962820317927
0.08357210821151959
0.2196098326613785
0.3710126629578925
0.30119930528454697
0.0
```

And then we can plot the resulting grid-approximated posterior

```
plot(pi_grid, posteriorz)
scatter!(pi_grid, posteriorz)
```

Of course, if our pi_grid contains more closely-spaced values, the approximated posterior becomes smoother:

```
= 0:0.01:1
pi2 = pdf.(d_beta, pi2)
prior2
#calculate the likelihood at each value of pi
= pdf.(Binomial.(10, pi2), 9)
likelihood2
#compute the un-normalized posterior
= likelihood2 .* prior2
un_normalized2
#normalize the posterior
= un_normalized2 ./ sum(un_normalized2)
posterior2
plot(pi2, posterior2)
scatter!(pi2, posterior2)
```

We can add the true resulting distribution – a Beta(11, 3) distribution – to this to show the approximation vs the true value:

```
= sample(pi2, Weights(posterior2), 100_000)
s
density(s, label="Approximate Posterior")
= Beta(11, 3)
true_beta = pdf.(true_beta, pi2)
ys plot!(pi2, ys, label="True Beta(11,3)")
```

One issue with grid approximation is that it quickly becomes intractable with many parameters – it is difficult to get a dense enough grid to get a good approximation as you start adding parameters.

So here’s MCMC to the rescue!

Like grid approximation models, MCMC samples are approximations.

Unlike grid approximation samples, though, MCMC samples are not independent. For \({\theta^1, \theta^2, ..., \theta^N}\), each \(\theta^{(i+1)}\) is conditional on \(\theta^i\). Or:

\(f(\theta^{(i+1)} | \theta^i, y)\)

The Metropolis-Hastings algo is a 2-step process for approximating a posterior distribution:

- Propose a new “location” in the theoretical posterior;
- Decide whether or not to go there.

To unpack this a little bit:

**Step 1: Propose a new location**

We don’t know the posterior we’re drawing from, but we can use a “proposal model” to draw new locations from. We can use different distributions, but the Uniform distribution is the most straightforward. Assume we have a current location, \(x\), and bandwidth, \(w\), then our proposed new location, \(x'\), is:

\(x'|x \sim Unif(x - w, x + w)\)

**Step 2: Decide whether or not to go there**

We have to calculate the acceptance probability (the probability of accepting \(x'\)).

If the unnormalized posterior plausibility of the proposed location \(x'\) is greather than that of the current location \(x\) (i.e. \(f(x')L(x'|y) > f(x)L(x|y)\)), *definitely* go to \(x'\).

Otherwise, *maybe* go to \(x'\). This feature (maybe moving even if the proposed location is less likely than the current location) prevents the chain from getting stuck.

We can define the acceptance probability as:

\[\alpha = min\{1, \frac{f(x')L(x'|y)}{f(x)L(x|y)} \frac{q(x|x')}{q(x'|x)}\}\]

If we use a symmetric proposal model (like the Uniform distribution), then this simplifies to:

\[\alpha = min\{1, \frac{f(x')L(x'|y)}{f(x)L(x|y)}\}\]

This can further simplify to:

\[\alpha = min\{1, \frac{f(x'|y)}{f(x|y)}\}\]

Given this, in scenario 1, if \(f(x'|y) \ge f(x|y)\), then $alpha = 1 $, and we will definitely move to \(x'\)

Otherwise, if \(\alpha = \frac{f(x'|y)}{f(x|y)} \lt 1\), then we might move there with probability \(\alpha\)

Imagine we have a Normal-Normal model such that:

\(Y|\mu \sim N(\mu, 0.75^2)\)

\(\mu \sim N(0, 1^2)\)

i.e. Y is a numerical outcome that varies normally around \(\mu\) with standard deviation 0.75

Then imagine we observe an outcome Y = 6.25. Given this, and the above, we can calculate the posterior model as:

\(\mu | (Y = 6.25) \sim N(4, 0.6^2)\)

We can run through one iteration of the Metropolis-Hastings algorithm with the following, where `w`

is the bandwidth of our Uniform proposal model and `current`

is the current value on the chain:

```
function one_mh_iter(w, current)
= Float64(current)
c = Uniform(current - w, current + w)
u = rand(u, 1)[1]
proposal
#proposal_plaus = prior * likelihood
= pdf(Normal(0, 1), proposal) * pdf(Normal(proposal, 0.75), 6.25)
proposal_plaus
#current_plaus = prior * likelihood
= pdf(Normal(0, 1), current) * pdf(Normal(current, 0.75), 6.25)
current_plaus
= minimum([1, proposal_plaus / current_plaus])
α
= sample([proposal, c], Weights([α, 1 - α]), 1)[1]
next
return next
end
```

`one_mh_iter (generic function with 1 method)`

`= one_mh_iter(1, 4) res `

`4.0`

And then from there, we can use the logic in the `one_mh_iter`

function to take a “tour” and create an entire chain, which will be our simulated distribution.

```
function mh_tour(w, current, N::Int)
= Vector{Float64}(undef, N)
μ
for i ∈ eachindex(μ)
= one_mh_iter(w, current)
r = r
μ[i] = r
current end
return μ
end
```

`mh_tour (generic function with 1 method)`

`= mh_tour(1, 4, 10_000) res `

```
10000-element Vector{Float64}:
4.0
4.106609027850608
4.235886932047645
3.310894252707528
4.034179768273493
4.512960695766677
4.512960695766677
4.349575562580946
4.101991567803039
3.460601626178285
3.633816936072791
4.30401065365775
3.401923900064334
⋮
3.934350716176937
3.934350716176937
3.66719175153303
3.9773404547954994
3.7463512503491367
4.521828790801184
4.521828790801184
4.492908196169627
3.8668258667074396
3.8668258667074396
3.8668258667074396
3.8668258667074396
```

and we can look at the trace plot and the histogram to see the distribution we’ve simulated:

```
= plot(res, label="");
p1 = histogram(res);
p2
plot(p1, p2, layout=@layout([a b]))
```

And now let’s check out work – we should get approximately \(\mu = 4\) and \(\sigma = 0.6\)

`mean(res), std(res)] [`

```
2-element Vector{Float64}:
3.974150267286955
0.6134780589942376
```

Metropolis-Hastings is the fundamental MCMC algorithm, but it’s often not the one used in applications today. Other algorithms, such as the Gibbs algorithm or the Hamiltonian Monte Carlo (HMC) algorithm are more common, with HMC being the algorithm used by the `{rstan}`

package.