Preface

The following is a practical, applied introduction to Bayesian estimation methods for the uninitiated. The goal is to provide just enough information, in a brief format, for readers to feel comfortable exploring Bayesian data analysis for themselves, assuming they have the requisite context to begin with. The amount of material covered is similar to what one would see in part of a standard statistics sequence in an applied discipline where statistics is being introduced more generally.

After a conceptual introduction, a fully visible by-hand example is provided using the binomial distribution. After that, the document proceeds to introduce fully Bayesian analysis with the standard linear regression model, as that is the basis for most applied statistics courses and is assumed to be most familiar to the reader. Model diagnostics, model enhancements, and additional modeling issues are then explored. Supplemental content in the appendix provides more technical detail if desired, and includes a maximum likelihood refresher, an overview of programming options in Bayesian analysis, the same regression model using BUGS and JAGS, and ‘by-hand’ code for the model using the Metropolis-Hastings and Hamiltonian Monte Carlo algorithms.

Prerequisites

Prerequisites include basic statistical exposure, such as what would be covered in a typical (probably graduate) applied science statistics course. At least some familiarity with R is needed to follow the code, though following the code is not strictly required to grasp the concepts, and one may go through any number of introductions on the web to acquire enough knowledge in that respect. Note, however, that at least part of the code in the examples here will employ a Bayesian-specific programming language (primarily Stan, with BUGS and JAGS in the appendix). No attempt is made to teach those languages, as it would be difficult to do so efficiently in this more conceptually oriented setting. As such, it is suggested that one follow the code as best they can, and investigate the respective manuals, relevant texts, etc. further on their own. Between the text and the comments within the code, it is hoped that what the code is accomplishing will be fairly clear.

This document relies heavily on Gelman et al. (2013), which I highly recommend, if one is ready for it. Other sources used, or particularly pertinent to the material in this document, can be found in the references section at the end. Some are more introductory and might be more suitable, depending on the context you bring.

Color coding:

  • emphasis
  • package
  • function
  • object/class
  • link

Going Further

More recently, I offered a two-part practical guide to Bayesian analysis, which I suggest as your next step for quickly getting started with models and easy-to-use tools. I also have a set of old workshop notes that can serve as an overview of Stan and Stan-related packages. They may contain additional information and/or exercises that could still be of use. Beyond my own material, there are plenty of opportunities to learn Bayesian analysis in the R world, so don’t be afraid to just use something you find that you like.

Note

This document focuses more on concepts and teaching the Bayesian approach to modeling, while using Stan more as a practical vehicle. However, I do want to say something about the development of Stan and the document itself. Since this document was first put together in the summer of 2014, Stan and its associated packages have undergone vast and continued development, as have the tools for publishing documents with R. Packages such as rstanarm and brms didn’t exist then, but now make even some fairly complicated models easy to pull off with just a line or two of standard R code, without ever having to use the Stan language directly. Further, newer visualization tools such as bayesplot, tidybayes, and shinystan make model exploration even easier than before. While the document has been updated periodically and so reflects some of these developments, it still leans toward conducting the analysis more explicitly, so that things are not so much of a black box. However, it’s worth noting that the example regression model within, along with its associated diagnostics and visuals, would now take only a few lines of R code rather than requiring Stan code. In short, the combination of R and Stan makes Bayesian analysis easier than ever before. Once armed with the basic concepts, you should feel able to dive in as easily as you would with any other R modeling tool.

About the Author

I’ve done statistical consulting, conducted workshops, and engaged in scientific research at three universities: the University of North Texas, Notre Dame, and the University of Michigan. I am currently a data scientist at Strong Analytics, where I continue to explore and analyze data in all sorts of ways. Writing documents like this helped me learn the techniques I was interested in, and I hope they help you as well! If you have any questions, feel free to reach out.


References

Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. http://www.stat.columbia.edu/~gelman/book/.