# Bayesian Linear Regression

The following provides a simple working example of a standard regression model using Stan via rstan. It will hopefully to allow some to more easily jump in to using Stan if they are comfortable with R. You would normally just use rstanarm or brms for such a model however.

## Data Setup

Create a correlation matrix of one’s choosing assuming response as last column/row. This approach allows for some collinearity in the predictors.

library(tidyverse)

cormat = matrix(
c(
1, .2, -.1, .3,
.2, 1, .1, .2,
-.1, .1, 1, .1,
.3, .2, .1, 1
),
ncol = 4,
byrow = TRUE
)

cormat
[,1] [,2] [,3] [,4]
[1,]  1.0  0.2 -0.1  0.3
[2,]  0.2  1.0  0.1  0.2
[3,] -0.1  0.1  1.0  0.1
[4,]  0.3  0.2  0.1  1.0
cormat = Matrix::nearPD(cormat, corr = TRUE)$mat n = 1000 means = rep(0, ncol(cormat)) d = MASS::mvrnorm(n, means, cormat, empirical = TRUE) colnames(d) = c('X1', 'X2', 'X3', 'y') d[,'y'] = d[,'y'] - .1 # unnecessary, just to model a non-zero intercept str(d) num [1:1000, 1:4] -0.114 -0.409 4.198 1.098 0.331 ... - attr(*, "dimnames")=List of 2 ..$ : NULL
..$: chr [1:4] "X1" "X2" "X3" "y" cor(d) X1 X2 X3 y X1 1.0 0.2 -0.1 0.3 X2 0.2 1.0 0.1 0.2 X3 -0.1 0.1 1.0 0.1 y 0.3 0.2 0.1 1.0 # Prepare for Stan # create X (add intercept column) and y for vectorized version later X = cbind(1, d[,1:3]); colnames(X) = c('Intercept', 'X1', 'X2', 'X3') y = d[,4] ## Model Code Initial preparation, create the data list object. dat = list( N = n, k = 4, y = y, X = X ) Create the Stan model code. data { // Data block; declarations only int<lower = 0> N; // Sample size int<lower = 0> k; // Dimension of model matrix matrix [N, k] X; // Model Matrix vector[N] y; // Target } /* transformed data { // Transformed data block; declarations and statements. None needed here. } */ parameters { // Parameters block; declarations only vector[k] beta; // Coefficient vector real<lower = 0> sigma; // Error scale } transformed parameters { // Transformed parameters block; declarations and statements. } model { // Model block; declarations and statements. vector[N] mu; mu = X * beta; // Linear predictor // priors beta ~ normal(0, 1); sigma ~ cauchy(0, 1); // With sigma bounded at 0, this is half-cauchy // likelihood y ~ normal(mu, sigma); } generated quantities { // Generated quantities block; declarations and statements. real rss; real totalss; real R2; // Calculate Rsq as a demonstration vector[N] y_hat; y_hat = X * beta; rss = dot_self(y - y_hat); totalss = dot_self(y - mean(y)); R2 = 1 - rss/totalss; } ## Estimation Run the model and examine results. The following assumes a character string or file (bayes_linreg) of the previous model code. library(rstan) fit = sampling( bayes_linreg, data = dat, thin = 4, verbose = FALSE ) Note the pars argument in the following. You must specify desired parameters or it will print out everything, including the y_hat, i.e. expected values. Also note that by taking into account the additional uncertainty estimating sigma, you get a shrunken Rsq (see Gelman & Pardoe 2006 sec. 3). print( fit, digits_summary = 3, pars = c('beta', 'sigma', 'R2'), probs = c(.025, .5, .975) ) Inference for Stan model: 17507cf73e3a44aeee4c4249d3521a85. 4 chains, each with iter=2000; warmup=1000; thin=4; post-warmup draws per chain=250, total post-warmup draws=1000. mean se_mean sd 2.5% 50% 97.5% n_eff Rhat beta[1] -0.100 0.001 0.030 -0.158 -0.099 -0.041 920 0.997 beta[2] 0.284 0.001 0.030 0.225 0.284 0.340 1067 1.003 beta[3] 0.133 0.001 0.032 0.072 0.132 0.196 1010 1.001 beta[4] 0.116 0.001 0.029 0.061 0.115 0.175 932 1.001 sigma 0.940 0.001 0.021 0.899 0.940 0.981 1147 1.000 R2 0.120 0.000 0.003 0.114 0.120 0.123 918 0.998 Samples were drawn using NUTS(diag_e) at Wed Nov 25 17:33:03 2020. For each parameter, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat=1). ## Comparison Compare to basic lm result. modlm = lm(y ~ ., data.frame(d)) # Compare summary(modlm) Call: lm(formula = y ~ ., data = data.frame(d)) Residuals: Min 1Q Median 3Q Max -2.70296 -0.68588 0.01811 0.66373 3.10820 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) -0.10000 0.02965 -3.372 0.000774 *** X1 0.28526 0.03051 9.349 < 2e-16 *** X2 0.13141 0.03051 4.307 1.82e-05 *** X3 0.11538 0.03004 3.840 0.000131 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.9377 on 996 degrees of freedom Multiple R-squared: 0.1234, Adjusted R-squared: 0.1208 F-statistic: 46.73 on 3 and 996 DF, p-value: < 2.2e-16 ## Visualize Visualize the posterior predictive distribution. # shinystan::launch_shinystan(fit) # diagnostic plots library(bayesplot) pp_check( dat$y,
rstan::extract(fit, par = 'y_hat')\$y_hat[1:10, ],
fun = 'dens_overlay'
)