R Packages

Here I will go into a bit of detail regarding rstanarm and brms. For standard models, these should be your first choice, rather than using Stan directly. Why? For one, the underlying code that is used will be more optimized and efficient than what you come up with, and also, that code has had multiple individuals developing it and hundreds actually using it. Furthermore, you can still, and probably should, set your priors as you wish.

The nice thing about both is that you use the same syntax that you do for R modeling in general. Here is a a basic GLM in both.

stan_glm(y ~ x + z, data = d)

brm(y ~ x + z, data = d)

And here are a couple complexities thrown in to show some minor differences. For example, the priors are specified a bit differently, and you may have options for one that you won’t have in the other, but both will allow passing standard arguments, like cores, chains, etc. to rstan.

stan_glm(
  y ~ x + z + (1|g), 
  data   = d, 
  family = binomial(link = "logit"), 
  prior  = normal(0, 1),
  prior_covariance = decov(regularization = 1, concentration = .5),
  QR     = TRUE,
  chains = 2,
  cores  = 2,
  iter   = 2000
)

brm(
  y ~ x + z + (1|g), 
  data   = d, 
  family = bernoulli(link = 'logit'), 
  prior  = prior(normal(0, 1), class = b, cauchy(0, 5), class = sd),
  chains = 2,
  cores  = 2, 
  iter   = 2000
)

So the syntax is easy to use for both of them, and to a point identical to standard R modeling syntax, and both have the same rstan arguments. However, you’ll need to know what’s available to tweak and how to do so specifically for each package.

Standard Regression and GLM

A good starting point for getting more comfortable with Bayesian analysis is to use it on what you’re already more comfortable with, e.g. the standard linear or generalized linear model, and rstanarm and brms both will do this for you. In general, for these models I would suggest rstanarm, as it will run much faster and is optimized for them.

It’s not a good thing that for the most common linear models R has multiple functions and even an additional packages. So we have the following for standard linear, glm, and categorical models:

aov: ANOVA
lm: standard regression (linear model)
glm: generalized linear model
MASS::glm.nb: negative binomial for count data
MASS::polr: ordinal regression model
nnet::nnet: multinomial regression model
biglm::biglm: big data lm

rstanarm keeps this nomenclature unfortunately, and currently doesn’t offer anything for multinomial models. Thus we have:

stan_aov: ANOVA
stan_lm: standard regression (linear model)
stan_glm: generalized linear model
stan_glm.nb: negative binomial for count data or neg_binomial_2 family for stan_glm
stan_polr: ordinal regression model
stan_biglm: big data lm

Contrast this with brms, which only requires the brm function and appropriate family, e.g. ‘poisson’ or ‘categorical,’ and which can do multinomial models also.

However, if you want to do a standard linear regression, I would probably not recommend using stan_lm, as it requires a prior for the \(R^2\), which is unfamiliar and only explained in technical ways that are likely going to be lost on those less comfortable with, or new to, statistical or Bayesian analysis⁵⁴. The good news is that you can simply run stan_glm instead, and work with the prior on the regression coefficients as we have discussed, and you can use bayes_R2 to get the \(R^2\).

You can certainly use brms for GLM, but it would have to compile the code first, and so will always be relatively, but not prohibitively, slower. For linear models with interactions or GLM generally, you may prefer it for the marginal effects plots and other additional functionality.

Categorical Models

If you’re just doing a standard logistic regression, I’d suggest stan_glm, again, for the speed. In addition, it has a specific model function for conditional logistic regression (stan_clogit). Beyond that, I’d recommend brms. For ordinal regression, stan_polr goes back to requiring a prior for \(R^2\), which is now the \(R^2\) for the underlying latent variable of the ordinal outcome⁵⁵. Furthermore, brms has some ordinal-specific plots, as well as other types of ordinal regression (e.g. adjacent category) that allow the proportional odds assumption to be relaxed. It also can do multi-category models.

brm(y ~ x, family = 'categorical')  # nominal outcome with > 2 levels

brm(y ~ cs(x), family = 'acat')     # ordinal model with category-specific effects for x

brms families for categorical:

bernoulli: binary target
categorical: nominal target
cumulative, sratio, cratio, and acat: ordinal outcome (cumulative, stopping ratio, continuation-ratio, adjacent category)

Extended Count Models

For going beyond binomial, poisson, and negative binomial distributions for count data, brms has a lot more for common extensions to those models, and beyond. It also has zero-altered counterparts to continuous outcomes (e.g. hurdle_gamma).

hurdle_poisson
hurdle_negbinomial
hurdle_gamma
hurdle_lognormal
zero_inflated_poisson
zero_inflated_negbinomial
zero_inflated_binomial
zero_inflated_beta
zero_one_inflated_beta

As mentioned previously, there is currently no direct way to do multinomial count models[^nomulti] except via the poisson

Mixed Models

The Bayesian approach really shines for mixed models in my opinion, where the random effects are estimated like other parameters, and so complicated structures are notably easier to deal with, and extending such models to other distribution families is straightforward. For the usual speed boost you can use rstanarm:

stan_lmer: standard lme4 style mixed model
stan_glmer: glmm
stan_glmer.nb: for negative binomial
stan_nlmer: nlme (but see stan_gamm4)
stan_mvmer: multivariate outcome
stan_gamm4: generalized additive mixed model in lme4 style

I would probably just recommend rstanarm for stan_lmer and stan_glmer, as brms has more flexibility, and even would be recommended for the standard models if you want to estimate residual (co-)variance structure, e.g. autocorrelation. It also will do multivariate models, and one can use mgcv::s for smooth terms in any brms model.

Other Models and Related

Along with all those rstanarm has specific functions for beta regression, joint mixed/survival models, and regularized linear regression. brms has many more distributional families, can do hypothesis testing[^], has marginal effects plots, and more. Both have plenty of tools for diagnostics, posterior predictive checks, and more of what has been discussed previously.

In general, rstanarm is a great tool for translating your standard models into Bayesian ones in an efficient fashion. On the other hand, brms uses a simplified syntax and is notably more flexible. Here is a brief summary of my recommend use for common regression models.

Analysis	rstanarm	brms
lm	√
glm	√
multinomial		√
ordinal		√
mixed	√	√
additive	√	√
regularized	√	√
beyond		√

Besides that, if you still need to model complexity not found within those, you can still use them to generate some highly optimized starter code, as they have functions for solely generating the underlying Stan code.

Even More Packages

I’ve focused on the two widely-used general-purpose packages, but nothing can stop Stan at this point. As of the latest update to this document, there are now 144 other packages depending on rstan in some way. There are also interfaces for Python, Julia, and more. Odds are good you’ll find one to suit your needs.

The developers note in their vignette for ANOVA:

‘but it is reasonable to expect a researcher to have a plausible guess for R2 before conducting an ANOVA.’

Actually, I’m not sure how reasonable this is. In consulting I’ve seen many, many researchers of varying levels of expertise, and I don’t think even a large minority of them would be able to hazard much of a guess about \(R^2\) before running a model, unless they’re essentially duplicating previous work. I also haven’t come across an explanation in the documentation (which is otherwise great) of how to specify it that would be very helpful to people just starting out with Bayesian analysis or even statistics in general. If the result is that one then has to try a bunch of different priors, then that becomes the focus of the analytical effort, which likely won’t appeal to people just wanting to run a standard regression model.↩︎
If someone tells me they know what the prior should be for that, I probably would not believe them.↩︎