In this chapter, I’m going to walk through fitting a Bayesian logistic regression. This is analogous to the previous chapter, where I fit a linear regression, except that here the outcome is binary.

First, I’ll load some libraries and read in some data.

```julia
using Distributions
using Plots
using Turing
using Random
using DataFrames
using LazyArrays
using CSV

Random.seed!(0408)

# read in data
url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/wells.csv"
wells = CSV.read(download(url), DataFrame)
describe(wells)
```

```
5×7 DataFrame
 Row │ variable  mean      min    median   max      nmissing  eltype
     │ Symbol    Float64   Real   Float64  Real     Int64     DataType
─────┼───────────────────────────────────────────────────────────────
   1 │ switch    0.575166  0      1.0      1        0         Int64
   2 │ arsenic   1.65693   0.51   1.3      9.65     0         Float64
   3 │ dist      48.3319   0.387  36.7615  339.531  0         Float64
   4 │ assoc     0.422848  0      0.0      1        0         Int64
   5 │ educ      4.82848   0      5.0      17       0         Int64
```

The outcome variable in this dataset is switch – a binary indicator of whether a household switched wells after being informed that the wells they’d been using previously were contaminated.

So a first step is to separate the outcome from the predictors.

Note – it’s best practice to explore the data and do some feature engineering (e.g. z-scoring numeric predictors). But I’m not going to do that here because I don’t really care about the quality of the model in this contrived example.
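For the record, z-scoring in Julia is only a couple of lines. This is a sketch of the step I'm skipping, on a small made-up data frame (the `zscore` helper and column names here are my own, not from the analysis above):

```julia
using DataFrames, Statistics

# center and scale a numeric column to mean 0, sd 1
zscore(v) = (v .- mean(v)) ./ std(v)

# toy data frame standing in for the numeric predictors
df = DataFrame(arsenic = [0.51, 1.3, 9.65], dist = [0.387, 36.76, 339.53])
transform!(df, [:arsenic, :dist] .=> zscore .=> [:arsenic_z, :dist_z])
```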

```julia
X = Matrix(select(wells, Not(:switch)))
y = wells.switch
```

The model says that the probability of switching wells is equal to the logistic transformation of the linear predictor \(\alpha + \beta^T \mathbf{X}_i\), which is just the linear combination from the linear regression model.

And recall that the logistic function is:

\(\text{Logistic}(x) = \frac{1}{1 + e^{-x}}\)
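To make that concrete, the transform is a one-liner in Julia (the helper name `logistic` is my own; `StatsFuns.logistic` provides the same thing). It maps any real number into the interval (0, 1):

```julia
# logistic (inverse-logit) transform: maps ℝ into (0, 1)
logistic(x) = 1 / (1 + exp(-x))

logistic(0.0)    # 0.5: a linear predictor of zero means a coin flip
logistic(2.0)    # ≈ 0.88
logistic(-Inf)   # 0.0
```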

6.1 Define the model

Just as we did for the linear regression model, we define a logistic regression using Turing’s @model macro. Once again, this uses pretty weak priors, and every coefficient gets the same prior.

```julia
@model function log_reg(X, y)
    n_feat = size(X, 2)

    # priors
    α ~ Normal(0, 3)
    β ~ filldist(TDist(3), n_feat)

    # likelihood
    return y ~ arraydist(LazyArray(@~ BernoulliLogit.(α .+ X * β)))
end
```

```
log_reg (generic function with 2 methods)
```

In notation:

\(y_i \sim \text{Bernoulli}(p_i)\)

\(p_i = \text{Logistic}(\alpha + \mathbf{X}_i \cdot \beta)\)

\(\alpha \sim \text{Normal}(0, 3)\)

\(\beta_j \sim t(0, 1, 3)\) for \(j \in \{1, ..., J\}\), where \(J\) is the number of predictors

Also – the model specification above comes from Jose Storopoli’s code, and there are a few little wrinkles I didn’t quite understand at first:

filldist() is basically a loop that repeats the same distribution. Here, for example, it creates n_feat identical distributions (t-distributions with 3 degrees of freedom);

arraydist() is similar to filldist(), but it is a wrapper for an array of distributions that aren’t necessarily the same.

As far as I can tell, LazyArray() and @~ are ways to specify lazy computations, which avoid materializing intermediate arrays and make evaluating the likelihood more efficient.

BernoulliLogit() combines the Logistic() transform and the Bernoulli() distribution into a single (numerically more stable) step: BernoulliLogit(x) is equivalent to Bernoulli(Logistic(x)).
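A quick check of my understanding of those helpers, outside of any model (illustrative only; the variable names are mine):

```julia
using Turing  # exports filldist, arraydist, and BernoulliLogit

# filldist: one joint distribution built from identical marginals
d_fill = filldist(TDist(3), 4)   # 4 iid t-distributions with 3 df

# arraydist: same idea, but the component distributions can differ
d_arr = arraydist([Normal(0, 1), Normal(0, 3)])

# BernoulliLogit(x) is equivalent to Bernoulli(logistic(x))
p = 1 / (1 + exp(-0.7))
pdf(BernoulliLogit(0.7), 1) ≈ pdf(Bernoulli(p), 1)   # true
```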

6.2 Simulate the Posterior

Then we can simulate the posterior just like we did before:

```julia
m = log_reg(X, y)
chn = sample(m, NUTS(), MCMCThreads(), 1_000, 3);
summarize(chn)
```

```
┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC C:\Users\eric_ekholm\.julia\packages\AbstractMCMC\Es490\src\sample.jl:307
┌ Info: Found initial step size
└ ϵ = 0.003125
┌ Info: Found initial step size
└ ϵ = 0.003125
┌ Info: Found initial step size
└ ϵ = 0.0125
Sampling (1 threads): 100%|█████████████████████████████| Time: 0:00:09
```

Ok, so just to prove that the arraydist(...) term is doing what I think it’s doing in the model above, I’m going to define a second model that’s hopefully equivalent to the first!
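As a starting point, here’s a sketch of what that second model might look like, replacing the arraydist/LazyArray likelihood with an explicit loop over observations (the name log_reg_loop is my own, and this is my guess at an equivalent specification):

```julia
using Turing
using LinearAlgebra: dot

@model function log_reg_loop(X, y)
    n_feat = size(X, 2)

    # same priors as above
    α ~ Normal(0, 3)
    β ~ filldist(TDist(3), n_feat)

    # likelihood, written out one observation at a time
    for i in eachindex(y)
        y[i] ~ BernoulliLogit(α + dot(X[i, :], β))
    end
end
```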

The process for extracting information from the chains, predicting Y, etc. is going to be the same as it was for linear regression (since it uses the same API), so I’m not going to go through all of that here and instead just refer myself back to the linear_regression page.