Bayesian Linear Regression

The following provides a simple working example of a standard regression model using Stan via rstan. It will hopefully to allow some to more easily jump in to using Stan if they are comfortable with R. You would normally just use rstanarm or brms for such a model however.

Data Setup

Create a correlation matrix of one’s choosing assuming response as last column/row. This approach allows for some collinearity in the predictors.

library(tidyverse)

cormat = matrix(
  c(
    1, .2, -.1, .3,
    .2, 1, .1, .2,
    -.1, .1, 1, .1,
    .3, .2, .1, 1
  ),
  ncol = 4, 
  byrow = TRUE
)

cormat
     [,1] [,2] [,3] [,4]
[1,]  1.0  0.2 -0.1  0.3
[2,]  0.2  1.0  0.1  0.2
[3,] -0.1  0.1  1.0  0.1
[4,]  0.3  0.2  0.1  1.0
cormat = Matrix::nearPD(cormat, corr = TRUE)$mat

n = 1000
means = rep(0, ncol(cormat))
d = MASS::mvrnorm(n, means, cormat, empirical = TRUE)
colnames(d) = c('X1', 'X2', 'X3', 'y')

d[,'y'] = d[,'y'] - .1 # unnecessary, just to model a non-zero intercept

str(d)
 num [1:1000, 1:4] -0.114 -0.409 4.198 1.098 0.331 ...
 - attr(*, "dimnames")=List of 2
  ..$ : NULL
  ..$ : chr [1:4] "X1" "X2" "X3" "y"
cor(d)
     X1  X2   X3   y
X1  1.0 0.2 -0.1 0.3
X2  0.2 1.0  0.1 0.2
X3 -0.1 0.1  1.0 0.1
y   0.3 0.2  0.1 1.0
# Prepare for Stan

# create X (add intercept column) and y for vectorized version later
X = cbind(1, d[,1:3]); colnames(X) = c('Intercept', 'X1', 'X2', 'X3')
y = d[,4]

Model Code

Initial preparation, create the data list object.

dat = list(
  N = n,
  k = 4,
  y = y,
  X = X
)

Create the Stan model code.

data {                      // Data block; declarations only
  int<lower = 0> N;         // Sample size                         
  int<lower = 0> k;         // Dimension of model matrix
  matrix [N, k] X;          // Model Matrix
  vector[N] y;              // Target
}

/* transformed data {       // Transformed data block; declarations and statements. None needed here.
 }
*/

parameters {                // Parameters block; declarations only
  vector[k] beta;           // Coefficient vector
  real<lower = 0> sigma;    // Error scale
}

transformed parameters {    // Transformed parameters block; declarations and statements.

}

model {                     // Model block; declarations and statements.
  vector[N] mu;
  mu = X * beta;            // Linear predictor

  // priors
  beta  ~ normal(0, 1);
  sigma ~ cauchy(0, 1);     // With sigma bounded at 0, this is half-cauchy 

  // likelihood
  y ~ normal(mu, sigma);
}

generated quantities {      // Generated quantities block; declarations and statements.
  real rss;                
  real totalss;
  real R2;                  // Calculate Rsq as a demonstration
  vector[N] y_hat;
  
  y_hat = X * beta;
  rss = dot_self(y - y_hat);
  totalss = dot_self(y - mean(y));
  R2 = 1 - rss/totalss;
}

Estimation

Run the model and examine results. The following assumes a character string or file (bayes_linreg) of the previous model code.

library(rstan)

fit = sampling(
  bayes_linreg,
  data  = dat,
  thin  = 4,
  verbose = FALSE
)

Note the pars argument in the following. You must specify desired parameters or it will print out everything, including the y_hat, i.e. expected values. Also note that by taking into account the additional uncertainty estimating sigma, you get a shrunken Rsq (see Gelman & Pardoe 2006 sec. 3).

print(
  fit,
  digits_summary = 3,
  pars  = c('beta', 'sigma', 'R2'),
  probs = c(.025, .5, .975)
)
Inference for Stan model: 17507cf73e3a44aeee4c4249d3521a85.
4 chains, each with iter=2000; warmup=1000; thin=4; 
post-warmup draws per chain=250, total post-warmup draws=1000.

          mean se_mean    sd   2.5%    50%  97.5% n_eff  Rhat
beta[1] -0.100   0.001 0.030 -0.158 -0.099 -0.041   920 0.997
beta[2]  0.284   0.001 0.030  0.225  0.284  0.340  1067 1.003
beta[3]  0.133   0.001 0.032  0.072  0.132  0.196  1010 1.001
beta[4]  0.116   0.001 0.029  0.061  0.115  0.175   932 1.001
sigma    0.940   0.001 0.021  0.899  0.940  0.981  1147 1.000
R2       0.120   0.000 0.003  0.114  0.120  0.123   918 0.998

Samples were drawn using NUTS(diag_e) at Wed Nov 25 17:33:03 2020.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Comparison

Compare to basic lm result.

modlm = lm(y ~ ., data.frame(d))
# Compare
summary(modlm)

Call:
lm(formula = y ~ ., data = data.frame(d))

Residuals:
     Min       1Q   Median       3Q      Max 
-2.70296 -0.68588  0.01811  0.66373  3.10820 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.10000    0.02965  -3.372 0.000774 ***
X1           0.28526    0.03051   9.349  < 2e-16 ***
X2           0.13141    0.03051   4.307 1.82e-05 ***
X3           0.11538    0.03004   3.840 0.000131 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9377 on 996 degrees of freedom
Multiple R-squared:  0.1234,    Adjusted R-squared:  0.1208 
F-statistic: 46.73 on 3 and 996 DF,  p-value: < 2.2e-16

Visualize

Visualize the posterior predictive distribution.

# shinystan::launch_shinystan(fit)   # diagnostic plots
library(bayesplot)

pp_check(
  dat$y, 
  rstan::extract(fit, par = 'y_hat')$y_hat[1:10, ], 
  fun = 'dens_overlay'
)