Model Enhancements

Enhancing and making adjustments to a model can often be straightforward in the Bayesian context, depending on what one wants to accomplish. In other cases, some things may be possible that aren’t readily available with standard approaches in the traditional setting. The following shows a few brief examples to give an idea of the possibilities.

Generating New Variables of Interest

We have already seen one way to get at new statistics of interest in the predictive model checking section. I next show how to do so as part of the modeling process itself. In Stan, we can accomplish this via the generated quantities section.

A typical part of linear regression output is \(R^2\), the amount of variance accounted for by the model. To get this in Stan we just have to create the code necessary for the calculations, and place it within the generated quantities section. I only show this part of the model code; everything we had before would remain the same. I show the corresponding lm results from R for comparison. There are a couple of ways to go about this, and I use some of Stan’s matrix operations as one approach.


generated quantities {
  real rss;                
  real totalss;
  real<lower=0, upper=1> R2;                 
  vector[N] mu;
  mu <- X * beta;
  rss <- dot_self(y-mu);
  totalss <- dot_self(y-mean(y));
  R2 <- 1 - rss/totalss;

Using the results from the model using lm, we do the same calculations for rss and totalss, and note the result is identical to what you’d see in the summary of the model.

rss = crossprod(resid(modlm))
totalss = crossprod(y - mean(y))
rss = rss[1]
totalss = totalss[1] # for alignment, remove matrix
1 - rss / totalss
[1] 0.4524289
[1] 0.4524289

Now we can run the model with added \(R^2\) 40. Note that as before we do not just get a point estimate, but a whole distribution of simulated values for \(R^2\). First the results.

  digits = 3,
  par    = c('beta', 'sigma', 'R2'),
  prob   = c(.025, .5, .975)
Inference for Stan model: stanmodelcodeRsq.
3 chains, each with iter=12000; warmup=2000; thin=10; 
post-warmup draws per chain=1000, total post-warmup draws=3000.

          mean se_mean    sd   2.5%    50%  97.5% n_eff  Rhat
beta[1]  4.895   0.002 0.129  4.639  4.897  5.144  2884 1.000
beta[2]  0.087   0.002 0.131 -0.169  0.086  0.342  2776 1.000
beta[3] -1.466   0.002 0.125 -1.712 -1.469 -1.219  2826 0.999
beta[4]  0.821   0.002 0.123  0.584  0.820  1.063  2987 0.999
sigma    2.028   0.002 0.091  1.858  2.025  2.212  2945 1.000
R2       0.443   0.000 0.006  0.427  0.445  0.451  2932 1.000

Samples were drawn using NUTS(diag_e) at Sat May 24 13:10:08 2014.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

The nice thing here is that our \(R^2\) incorporates the additional uncertainty in estimating the model parameters, and thus acts like an adjusted \(R^2\) 41. The following is the classical regression adjusted \(R^2\).

[1] 0.4457512

Furthermore, in the Bayesian context we get an interval estimate and everything else we typically get as with other quantities of interest, and the same would be true for anything else we want to calculate (e.g. predictions). In addition, it would be trivial to calculate something like the actual adjusted \(R^2\), the probability that the \(R^2\) value is greater than .5, and other things of that nature.

Robust Regression

If we were concerned that extreme observations exist that our current model is not able to capture well, we could change the sampling distribution to one that had a little more probability in the tails. This is very easy to do in this situation, as we just change likelihood portion of our code to employ say, a t-distribution42. In Stan, the t-distribution has parameters mean and sigma as with the normal distribution, but we also have the added parameter for degrees of freedom. So our code might look like the following:

stanmodelcodeT = "

model {                     
  vector[N] mu;
  mu <- X * beta;           
  // priors
  beta  ~ normal(0, 10);
  sigma ~ cauchy(0, 5);     
  // likelihood
  // y ~ normal(mu, sigma);            // previously used normal 
  y ~ student_t(10, mu, sigma)         // t with df=10

In this case we set the degrees of freedom at 1043, but how would you know in advance what to set it as? It might be better to place a prior (with lower bound of one) for that value, and then estimate it as part of the modeling process. One should note that there are many distributions available in Stan (e.g. others might be useful for skewed data, truncated etc.), and more will be added in the future.

Generalized Linear Model

Expanding from standard linear model, we can move very easily to generalized linear models, of which the standard regression is a special case. The key components are use of a link function that links the linear predictor to the target variable, and an appropriate sampling distribution for the likelihood.

Let’s consider a count model using the Poisson distribution. We can specify the model as follows:

\[y \sim Pois(\lambda)\]

\[g(\lambda) = X\beta\]

where \(g(.)\) is the link function, the canonical link function for Poisson being the natural logarithm. In Stan, this can be expressed via the inverse link function, where we exponentiate the linear predictor44. Aside from that we simply specify \(y\) as distributed Poisson in the same way we used the normal and t-distribution in earlier efforts.

stanmodelcodePoisson = "

model {                     
  vector[N] lambda;
  vector[N] eta;

  eta <- X * beta;
  lambda <- exp(eta)
  // priors
  beta ~ normal(0, 10);

  // likelihood
  y ~ poisson(lambda)

And that’s all there is to that45. We just saw that we are not limited to the exponential family distributions of GLM(s), though that covers a lot of ground, and so at this point you have a lot of the tools covered in standard applied statistics course, and a few beyond.

  1. With the rstanarm package, R2 is automatically calculated and provided with the stan_lm function. It is also available in rstanarm and brms via the bayes_R2 function.↩︎

  2. See Gelman & Pardoe (2006), Bayesian Measures of Explained Variance.↩︎

  3. Note that with the brms package all one would have to do is change the family argument in the model function.↩︎

  4. Alternatively, we could add a value ‘df’ to the data list and data block.↩︎

  5. You can also forgo the exponentiation and instead use the poisson_log function in your sampling statement (slightly faster too).↩︎

  6. Note that in various modeling scenarios, you might have to loop over the values of \(y\),
    for(n in 1:N) ... to incorporate additional complexity. In general, a vectorized approach as we have done is preferable if it’s possible.↩︎