In this chapter, I’m going to walk through fitting a Bayesian logistic regression. This is analogous to the previous chapter, where I fit a linear regression, except this time the model is a logistic regression.
First, I’ll load some libraries and read in some data.
```julia
using Distributions
using Plots
using Turing
using Random
using DataFrames
using LazyArrays
using CSV

Random.seed!(0408)

# read in data
url = "https://raw.githubusercontent.com/storopoli/Bayesian-Julia/master/datasets/wells.csv"
wells = CSV.read(download(url), DataFrame)

describe(wells)
```
```
5×7 DataFrame
 Row │ variable  mean       min    median   max      nmissing  eltype
     │ Symbol    Float64    Real   Float64  Real     Int64     DataType
─────┼─────────────────────────────────────────────────────────────────
   1 │ switch     0.575166  0       1.0       1         0      Int64
   2 │ arsenic    1.65693   0.51    1.3       9.65      0      Float64
   3 │ dist      48.3319    0.387  36.7615  339.531     0      Float64
   4 │ assoc      0.422848  0       0.0       1         0      Int64
   5 │ educ       4.82848   0       5.0      17         0      Int64
```
The outcome variable in this dataset is switch – a binary indicator of whether a household switched wells after being informed that the wells they’d been using previously were contaminated.
So a first step is to separate the outcome from the predictors.
Note – it’s best practice to explore the data and do some feature engineering (e.g. z-scoring numeric predictors). But I’m not going to do that here because I don’t really care about the quality of the model in this contrived example.
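For reference, z-scoring a numeric predictor only takes a line or two. Here’s a quick sketch of what that could look like (my own, not actually run on the data below):

```julia
using Statistics

# sketch: z-score a numeric column (not applied in this example)
zscore_col(x) = (x .- mean(x)) ./ std(x)

# e.g. wells.arsenic = zscore_col(wells.arsenic)
# e.g. wells.dist    = zscore_col(wells.dist)
```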
```julia
X = Matrix(select(wells, Not(:switch)))
y = wells.switch
```
In words: the probability of switching wells is equal to the logistic transformation of the linear predictor \(\alpha + \beta^T \mathbf{X}_i\), which is just the linear regression model passed through the logistic function.
And recall that the logistic function is:
\(\text{Logistic}(x) = \frac{1}{1 + e^{-x}}\)
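Just to make that concrete, it’s a one-liner in Julia (logistic_fn is a throwaway name I’m using here; StatsFuns.jl also exports a ready-made logistic):

```julia
# throwaway implementation of the logistic (inverse-logit) function
logistic_fn(x) = 1 / (1 + exp(-x))

logistic_fn(0.0)    # 0.5
logistic_fn(2.0)    # ≈ 0.88
logistic_fn(-2.0)   # ≈ 0.12
```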
6.1 Define the model
Just as we did in the linear regression model, we define the logistic regression using Turing’s @model macro. Once again, this uses pretty weak priors, and every coefficient gets the same prior.
```julia
@model function log_reg(X, y)
    n_feat = size(X, 2)

    # priors
    α ~ Normal(0, 3)
    β ~ filldist(TDist(3), n_feat)

    # likelihood
    return y ~ arraydist(LazyArray(@~ BernoulliLogit.(α .+ X * β)))
end
```
log_reg (generic function with 2 methods)
In notation:
\(y \sim \text{Bernoulli}(p)\)
\(p = \text{Logistic}(\alpha + \mathbf{X} \cdot \beta)\)
\(\alpha \sim N(0, 3)\)
\(\beta_j \sim t(0, 1, 3)\) for \(j \in \{1, \ldots, n_{\text{feat}}\}\)
Also – the model specification above comes from Jose Storopoli’s code, and there are a few little wrinkles I didn’t quite understand at first:
filldist() is basically a loop that repeats the same distribution multiple times. So, for example, it will create n_feat identical distributions (t-distributions with 3 degrees of freedom);
arraydist() is similar to filldist(), but it is a wrapper for an array of distributions that aren’t necessarily the same.
As far as I can tell, LazyArray() and @~ provide ways to specify lazy computations, which make the operations faster and more efficient.
BernoulliLogit() is a cool way to combine the Logistic() and Bernoulli() functions (there’s a quick check of this right after the list).
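To convince myself of those last couple of points, here’s a small check (my own sketch, not from the original post): BernoulliLogit(x) should behave like Bernoulli(Logistic(x)), and filldist() should bundle repeated distributions into a single multivariate distribution.

```julia
using Distributions, Turing

# same throwaway logistic helper as above
logistic_fn(x) = 1 / (1 + exp(-x))

x = 0.7
d1 = BernoulliLogit(x)           # parameterized on the logit scale
d2 = Bernoulli(logistic_fn(x))   # parameterized on the probability scale
logpdf(d1, 1) ≈ logpdf(d2, 1)    # true

# filldist() stacks n copies of one distribution into a single
# multivariate distribution, so one ~ statement covers all of β
β_prior = filldist(TDist(3), 4)
length(β_prior)                  # 4
```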
6.2 Simulate the Posterior
Then we can simulate the posterior just like we did before:
```julia
m = log_reg(X, y)

chn = sample(m, NUTS(), MCMCThreads(), 1_000, 3);

summarize(chn)
```
```
┌ Warning: Only a single thread available: MCMC chains are not sampled in parallel
└ @ AbstractMCMC C:\Users\eric_ekholm\.julia\packages\AbstractMCMC\Es490\src\sample.jl:307
┌ Info: Found initial step size
└ ϵ = 0.003125
┌ Info: Found initial step size
└ ϵ = 0.003125
┌ Info: Found initial step size
└ ϵ = 0.0125
Sampling (1 threads): 100%|█████████████████████████████| Time: 0:00:09
```
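As a quick reminder to myself, a few standard MCMCChains calls for poking at the fitted chain object (not part of the original code above):

```julia
# assumes `chn` from the sampling call above
mean(chn)       # posterior mean of every parameter
quantile(chn)   # posterior quantiles
chn[:α]         # draws for the intercept, one column per chain
```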
Ok, so just to prove that the arraydist(...) term is doing what I think it’s doing in the model above, I’m going to define a second model that’s hopefully equivalent to the first!
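Something like the sketch below is what I have in mind: drop arraydist() and LazyArray() and just loop over the observations, giving each one its own BernoulliLogit likelihood. (The name log_reg_loop is mine, and this is an assumed equivalent, not code from the original post.)

```julia
# sketch of an equivalent model without arraydist()/LazyArray():
# slower, but (to me) easier to read
@model function log_reg_loop(X, y)
    n_feat = size(X, 2)

    # same priors as before
    α ~ Normal(0, 3)
    β ~ filldist(TDist(3), n_feat)

    # likelihood, one observation at a time
    for i in eachindex(y)
        y[i] ~ BernoulliLogit(α + X[i, :]' * β)
    end
end
```

Sampling from this should recover essentially the same posterior as the arraydist() version, just more slowly.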
The process for extracting information from the chains, predicting y, etc. is going to be the same as it was for linear regression (since it uses the same API), so I’m not going to go through all of that here; instead, I’ll just refer myself back to the linear_regression page.