R Packages

Here I will go into a bit of detail regarding rstanarm and brms. For standard models, these should be your first choice, rather than using Stan directly. Why? For one, the underlying code that is used will be more optimized and efficient than what you come up with, and has had multiple individuals developing that code and hundreds actually using it. Furthermore, you can still, and probably should, set your priors as you wish.

The nice thing about both is that you use the same syntax that you do for R modeling in general. Here is a basic GLM in both.
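As a minimal sketch of what that looks like, the following fits the same Poisson regression with each package. The simulated data and the object names are made up for illustration, and `refresh = 0` is only there to silence the sampler's progress output.

```r
library(rstanarm)
library(brms)

# Simulated count data, purely for illustration
set.seed(123)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.5 + 0.3 * d$x))

# rstanarm: same formula/family interface as glm(), using a precompiled model
fit_rstanarm <- stan_glm(y ~ x, data = d, family = poisson(), refresh = 0)

# brms: same formula and family, but the Stan model is compiled before sampling
fit_brms <- brm(y ~ x, data = d, family = poisson(), refresh = 0)

summary(fit_rstanarm)
summary(fit_brms)
```

Apart from the function name, the two calls are interchangeable here, which is the point: the formula and family arguments are the ones you would give to glm.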

And here are a couple complexities thrown in to show some minor differences. For example, the priors are specified a bit differently, and you may have options for one that you won’t have in the other, but both will allow passing standard arguments, like cores, chains, etc. to rstan.
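The difference in prior specification can be sketched as follows. This is an illustration on simulated data rather than a recommended set of priors; the specific prior scales, and the choice of two chains and 1000 iterations, are arbitrary assumptions.

```r
library(rstanarm)
library(brms)

# Simulated count data, purely for illustration
set.seed(123)
d <- data.frame(x = rnorm(200))
d$y <- rpois(200, lambda = exp(0.5 + 0.3 * d$x))

# rstanarm: priors go in dedicated arguments, built with its own prior functions
fit_rstanarm <- stan_glm(
  y ~ x, data = d, family = poisson(),
  prior = rstanarm::normal(0, 1),            # regression coefficients
  prior_intercept = rstanarm::normal(0, 5),  # intercept
  chains = 2, cores = 2, iter = 1000, refresh = 0
)

# brms: priors are collected in one object and targeted by parameter class
fit_brms <- brm(
  y ~ x, data = d, family = poisson(),
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(0, 5), class = "Intercept")
  ),
  chains = 2, cores = 2, iter = 1000, refresh = 0
)

# Check which priors were actually used
prior_summary(fit_rstanarm)
prior_summary(fit_brms)
```

Note that `chains`, `cores`, `iter`, and `refresh` are written identically in both calls, since both packages hand them on to rstan.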

So the syntax is easy to use for both of them, and to a point identical to standard R modeling syntax, and both have the same rstan arguments. However, you’ll need to know what’s available to tweak and how to do so specifically for each package.

Standard Regression and GLM

A good starting point for getting more comfortable with Bayesian analysis is to use it on what you’re already more comfortable with, e.g. the standard linear or generalized linear model, and rstanarm and brms both will do this for you. In general, for these models I would suggest rstanarm, as it will run much faster and is optimized for them.

It’s not a good thing that for the most common linear models R has multiple functions and even additional packages. So we have the following for standard linear, generalized linear, and categorical models:

  • aov: ANOVA
  • lm: standard regression (linear model)
  • glm: generalized linear model
  • MASS::glm.nb: negative binomial for count data
  • MASS::polr: ordinal regression model
  • nnet::multinom: multinomial regression model
  • biglm::biglm: big data lm

rstanarm keeps this nomenclature unfortunately, and currently doesn’t offer anything for multinomial models. Thus we have:

  • stan_aov: ANOVA
  • stan_lm: standard regression (linear model)
  • stan_glm: generalized linear model
  • stan_glm.nb: negative binomial for count data (or use the neg_binomial_2 family with stan_glm)
  • stan_polr: ordinal regression model
  • stan_biglm: big data lm

Contrast this with brms, which only requires the brm function and appropriate family, e.g. ‘poisson’ or ‘categorical’, and which can do multinomial models also.

However, if you want to do a standard linear regression, I would not recommend using stan_lm, as it requires a prior for the \(R^2\), which is unfamiliar and only explained in technical ways that are likely to be lost on those less comfortable with, or new to, statistical or Bayesian analysis[1]. The good news is that you can simply run stan_glm instead, and work with the prior on the regression coefficients as we have discussed, and you can use bayes_R2 to get the \(R^2\).
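A minimal sketch of that workflow, using the built-in mtcars data as a stand-in example. The model and the prior scale are assumptions chosen only for illustration.

```r
library(rstanarm)

# Standard linear regression via stan_glm rather than stan_lm,
# with a prior placed directly on the regression coefficients
fit <- stan_glm(
  mpg ~ wt + hp, data = mtcars, family = gaussian(),
  prior = normal(0, 2.5, autoscale = TRUE),
  refresh = 0
)

# bayes_R2 returns one R^2 value per posterior draw
r2_draws <- bayes_R2(fit)
median(r2_draws)
quantile(r2_draws, c(0.05, 0.95))
```

Because the result is a set of draws rather than a single number, you get an interval for \(R^2\) for free.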

You can certainly use brms for GLM, but it has to compile the Stan code for each model, and so will always be notably slower than rstanarm's precompiled models. For LM with interactions, or GLM generally, you may still prefer it for the marginal effects plots.

Categorical Models

If you’re just doing a standard logistic regression, I’d suggest stan_glm, again, for the speed. In addition, rstanarm has a specific model function for conditional logistic regression (stan_clogit). Beyond that, I’d probably recommend brms. For ordinal regression, stan_polr goes back to requiring a prior for \(R^2\), which is now the \(R^2\) for the underlying latent variable of the ordinal outcome[2]. Furthermore, brms has some ordinal-specific plots, as well as other types of ordinal regression (e.g. adjacent category) that allow the proportional odds assumption to be relaxed. It can also do multi-category models[3].

brms families for categorical:

  • bernoulli: binary target
  • categorical: nominal target
  • cumulative, sratio, cratio, and acat: ordinal outcome (cumulative, stopping ratio, continuation-ratio, adjacent category)
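To make the ordinal families concrete, here is a hedged sketch on simulated data. The data-generating process, cut points, and variable names are invented for illustration. The second model uses `cs()`, which is how brms requests category-specific effects, to relax the proportional odds assumption for `x`.

```r
library(brms)

# Simulated 4-category ordinal outcome from a latent logistic variable
set.seed(123)
n <- 300
x <- rnorm(n)
latent <- 1 * x + rlogis(n)
d <- data.frame(
  x = x,
  rating = factor(
    cut(latent, breaks = c(-Inf, -1, 0, 1, Inf), labels = FALSE),
    ordered = TRUE
  )
)

# Proportional odds (cumulative logit) model
fit_cumulative <- brm(
  rating ~ x, data = d, family = cumulative("logit"),
  chains = 2, refresh = 0
)

# Adjacent category model with a category-specific effect of x
fit_acat <- brm(
  rating ~ cs(x), data = d, family = acat("logit"),
  chains = 2, refresh = 0
)

summary(fit_cumulative)
summary(fit_acat)
```

Only the family (and the optional `cs()` wrapper) changes between the two; the rest of the call is the usual brm syntax.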

Extended Count Models

For count data that go beyond the binomial, Poisson, and negative binomial distributions, brms offers many of the common extensions to those models, and more. It also has zero-altered counterparts for continuous outcomes (e.g. hurdle_gamma).

  • hurdle_poisson
  • hurdle_negbinomial
  • hurdle_gamma
  • hurdle_lognormal
  • zero_inflated_poisson
  • zero_inflated_negbinomial
  • zero_inflated_binomial
  • zero_inflated_beta
  • zero_one_inflated_beta
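As a rough sketch of how one of these families is used, the following fits a zero-inflated Poisson model in which the zero-inflation probability also depends on a covariate. The simulated data are hypothetical; `bf()` and the `zi` distributional parameter are part of brms.

```r
library(brms)

# Simulated counts with roughly 30% structural zeros, purely for illustration
set.seed(123)
n <- 300
x <- rnorm(n)
is_structural_zero <- rbinom(n, size = 1, prob = 0.3)
y <- ifelse(is_structural_zero == 1, 0L, rpois(n, lambda = exp(1 + 0.5 * x)))
d <- data.frame(x = x, y = y)

# Count part: y ~ x; zero-inflation part: zi ~ x
fit_zip <- brm(
  bf(y ~ x, zi ~ x),
  data = d, family = zero_inflated_poisson(),
  chains = 2, refresh = 0
)

summary(fit_zip)
```

The hurdle families follow the same pattern, with `hu` in place of `zi` as the name of the extra parameter.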

As mentioned previously, there is currently no direct way to do multinomial count models[4]; they can only be fit indirectly, via the multinomial-Poisson transformation.
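The idea behind the Poisson route can be illustrated without any Bayesian machinery. In this sketch (hypothetical counts, base R only), a Poisson log-linear model with a nuisance term for each multinomial observation recovers the multinomial category probabilities once the fitted means are normalized within observation.

```r
# Hypothetical counts: 3 groups (rows), each a multinomial observation with size > 1
counts <- matrix(
  c(20, 30, 50,
    40, 35, 25,
    10, 60, 30),
  nrow = 3, byrow = TRUE,
  dimnames = list(group = c("g1", "g2", "g3"), category = c("a", "b", "c"))
)

# Long format: one row per group-by-category cell
long <- as.data.frame(as.table(counts), responseName = "count")

# Poisson log-linear model; the group terms are nuisance parameters that
# reproduce each group's total, the category terms carry the multinomial part
fit <- glm(count ~ group + category, data = long, family = poisson())

# Normalize fitted cell means within group to recover category probabilities
long$mu <- fitted(fit)
p_hat <- prop.table(xtabs(mu ~ group + category, data = long), margin = 1)
round(p_hat, 3)
```

With no predictors of category membership, every group gets the pooled category proportions. The same long-format formula could in principle be passed to stan_glm or brm with a Poisson family to get the Bayesian version, with category-by-covariate interactions added to model how the probabilities change.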

Mixed Models

The Bayesian approach really shines for mixed models in my opinion, where the random effects are estimated like other parameters, and so complicated structures are notably easier to deal with, and extending such models to other distribution families is straightforward. For the usual speed boost you can use rstanarm:

  • stan_lmer: standard lme4 style mixed model
  • stan_glmer: glmm
  • stan_glmer.nb: for negative binomial
  • stan_nlmer: nonlinear mixed model in the style of lme4::nlmer (but see stan_gamm4)
  • stan_mvmer: multivariate outcome
  • stan_gamm4: generalized additive mixed model in lme4 style

Within rstanarm, I would probably only recommend stan_lmer and stan_glmer, as brms has more flexibility for anything beyond those. I would even recommend brms for the standard models if you want to estimate residual (co-)variance structure, e.g. autocorrelation. It will also do multivariate models, and one can use mgcv::s for smooth terms in any brms model.
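As a sketch of that division of labor, the following fits a standard random-slopes model with rstanarm and a random-intercept model with AR(1) residual structure with brms, both on the sleepstudy data that ships with lme4. The `ar()` formula term assumes a reasonably recent brms release; the particular model choices are for illustration only.

```r
library(rstanarm)
library(brms)

data("sleepstudy", package = "lme4")

# rstanarm: lme4-style syntax, precompiled model
fit_rstanarm <- stan_lmer(
  Reaction ~ Days + (Days | Subject),
  data = sleepstudy, chains = 2, refresh = 0
)

# brms: same random effects syntax, plus AR(1) residuals within Subject
fit_brms <- brm(
  Reaction ~ Days + (1 | Subject) + ar(time = Days, gr = Subject, p = 1),
  data = sleepstudy, chains = 2, refresh = 0
)

summary(fit_rstanarm)
summary(fit_brms)
```

A smooth term would be added to the brms formula in the same way, e.g. by replacing `Days` with `s(Days)`.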

Even More Packages

I’ve focused on the two widely-used general-purpose packages, but nothing can stop Stan at this point. Here is a visualization of the current rstan ecosystem.

At this point there are already a couple dozen packages working with Stan under the hood. Odds are good you’ll find one to suit your needs.


  1. The developers note in their vignette for stan_aov:

    ‘but it is reasonable to expect a researcher to have a plausible guess for R2 before conducting an ANOVA.’

    Actually, I’m not sure how reasonable this is. I see many, many researchers of varying levels of expertise, and I don’t think any of them would be able to hazard much of a guess about \(R^2\) before running a model, unless they’re essentially duplicating previous work. I also haven’t come across an explanation in the documentation (which is otherwise great) of how to specify it that would be very helpful to people just starting out with Bayesian analysis or even statistics in general. If the result is that one then has to try a bunch of different priors, then that becomes the focus of the analytical effort, which likely won’t appeal to people just wanting to run a standard regression model.

  2. If someone tells me they know what the prior should be for that, I probably would not believe them.

  3. The corresponding distribution is the categorical distribution, which is a multinomial distribution with size = 1. Multinomial count models, i.e. with size > 1, on the other hand, are not currently supported except indirectly. However, the multinomial-poisson transformation can be used instead.

  4. The corresponding distribution is the categorical distribution, which is a multinomial distribution with size = 1. Multinomial count models, i.e. with size > 1, on the other hand, are not currently supported except indirectly. However, the multinomial-poisson transformation can be used instead.