Bayesian Linear Regression
The following provides a simple working example of a standard regression model using Stan via rstan. It will hopefully to allow some to more easily jump in to using Stan if they are comfortable with R. You would normally just use rstanarm or brms for such a model however.
Data Setup
Create a correlation matrix of one’s choosing assuming response as last column/row. This approach allows for some collinearity in the predictors.
library(tidyverse)
= matrix(
cormat c(
1, .2, -.1, .3,
2, 1, .1, .2,
.-.1, .1, 1, .1,
3, .2, .1, 1
.
),ncol = 4,
byrow = TRUE
)
cormat
[,1] [,2] [,3] [,4]
[1,] 1.0 0.2 -0.1 0.3
[2,] 0.2 1.0 0.1 0.2
[3,] -0.1 0.1 1.0 0.1
[4,] 0.3 0.2 0.1 1.0
= Matrix::nearPD(cormat, corr = TRUE)$mat
cormat
= 1000
n = rep(0, ncol(cormat))
means = MASS::mvrnorm(n, means, cormat, empirical = TRUE)
d colnames(d) = c('X1', 'X2', 'X3', 'y')
'y'] = d[,'y'] - .1 # unnecessary, just to model a non-zero intercept
d[,
str(d)
num [1:1000, 1:4] -0.114 -0.409 4.198 1.098 0.331 ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:4] "X1" "X2" "X3" "y"
cor(d)
X1 X2 X3 y
X1 1.0 0.2 -0.1 0.3
X2 0.2 1.0 0.1 0.2
X3 -0.1 0.1 1.0 0.1
y 0.3 0.2 0.1 1.0
# Prepare for Stan
# create X (add intercept column) and y for vectorized version later
= cbind(1, d[,1:3]); colnames(X) = c('Intercept', 'X1', 'X2', 'X3')
X = d[,4] y
Model Code
Initial preparation, create the data list object.
= list(
dat N = n,
k = 4,
y = y,
X = X
)
Create the Stan model code.
data { // Data block; declarations only
int<lower = 0> N; // Sample size
int<lower = 0> k; // Dimension of model matrix
matrix [N, k] X; // Model Matrix
vector[N] y; // Target
}
/* transformed data { // Transformed data block; declarations and statements. None needed here.
}
*/
parameters { // Parameters block; declarations only
vector[k] beta; // Coefficient vector
real<lower = 0> sigma; // Error scale
}
transformed parameters { // Transformed parameters block; declarations and statements.
}
model { // Model block; declarations and statements.
vector[N] mu;
mu = X * beta; // Linear predictor
// priors
beta ~ normal(0, 1);
sigma ~ cauchy(0, 1); // With sigma bounded at 0, this is half-cauchy
// likelihood
y ~ normal(mu, sigma);
}
generated quantities { // Generated quantities block; declarations and statements.
real rss;
real totalss;
real R2; // Calculate Rsq as a demonstration
vector[N] y_hat;
y_hat = X * beta;
rss = dot_self(y - y_hat);
totalss = dot_self(y - mean(y));
R2 = 1 - rss/totalss;
}
Estimation
Run the model and examine results. The following assumes a character string or file (bayes_linreg
) of the previous model code.
library(rstan)
= sampling(
fit
bayes_linreg,data = dat,
thin = 4,
verbose = FALSE
)
Note the pars
argument in the following. You must specify desired parameters or it will print out everything, including the y_hat
, i.e. expected values. Also note that by taking into account the additional uncertainty estimating sigma, you get a shrunken Rsq (see Gelman & Pardoe 2006 sec. 3).
print(
fit,digits_summary = 3,
pars = c('beta', 'sigma', 'R2'),
probs = c(.025, .5, .975)
)
Inference for Stan model: 17507cf73e3a44aeee4c4249d3521a85.
4 chains, each with iter=2000; warmup=1000; thin=4;
post-warmup draws per chain=250, total post-warmup draws=1000.
mean se_mean sd 2.5% 50% 97.5% n_eff Rhat
beta[1] -0.100 0.001 0.030 -0.158 -0.099 -0.041 920 0.997
beta[2] 0.284 0.001 0.030 0.225 0.284 0.340 1067 1.003
beta[3] 0.133 0.001 0.032 0.072 0.132 0.196 1010 1.001
beta[4] 0.116 0.001 0.029 0.061 0.115 0.175 932 1.001
sigma 0.940 0.001 0.021 0.899 0.940 0.981 1147 1.000
R2 0.120 0.000 0.003 0.114 0.120 0.123 918 0.998
Samples were drawn using NUTS(diag_e) at Wed Nov 25 17:33:03 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
Comparison
Compare to basic lm result.
= lm(y ~ ., data.frame(d))
modlm # Compare
summary(modlm)
Call:
lm(formula = y ~ ., data = data.frame(d))
Residuals:
Min 1Q Median 3Q Max
-2.70296 -0.68588 0.01811 0.66373 3.10820
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.10000 0.02965 -3.372 0.000774 ***
X1 0.28526 0.03051 9.349 < 2e-16 ***
X2 0.13141 0.03051 4.307 1.82e-05 ***
X3 0.11538 0.03004 3.840 0.000131 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9377 on 996 degrees of freedom
Multiple R-squared: 0.1234, Adjusted R-squared: 0.1208
F-statistic: 46.73 on 3 and 996 DF, p-value: < 2.2e-16
Visualize
Visualize the posterior predictive distribution.
# shinystan::launch_shinystan(fit) # diagnostic plots
library(bayesplot)
pp_check(
$y,
dat::extract(fit, par = 'y_hat')$y_hat[1:10, ],
rstanfun = 'dens_overlay'
)
Source
Original code available at: https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/Bayesian/rstan_linregwithprior.R