rstanarm: GLM
rstanarm uses the same nomenclature and general approach as base R's glm(): a model formula, a data frame, and a family.
library(rstanarm)
attendance_bglm <- stan_glm(daysabs ~ math + gender + prog,
                            data = attendance,
                            family = poisson)
summary(attendance_bglm, digits = 2, prob = c(.025, .5, .975))
Model Info:
function: stan_glm
family: poisson [log]
formula: daysabs ~ math + gender + prog
algorithm: sampling
priors: see help('prior_summary')
sample: 4000 (posterior sample size)
observations: 314
predictors: 5
Estimates:
                   mean       sd      50%     2.5%    97.5%
(Intercept)        1.49     0.08     1.49     1.33     1.65
math              -0.01     0.00    -0.01    -0.01    -0.01
genderMale        -0.24     0.05    -0.24    -0.33    -0.15
progGeneral        1.27     0.08     1.27     1.12     1.42
progAcademic       0.84     0.07     0.84     0.71     0.98
mean_PPD           5.95     0.20     5.96     5.57     6.33
log-posterior  -1324.70     1.57 -1324.39 -1328.55 -1322.59
Diagnostics:
                mcse  Rhat n_eff
(Intercept)     0.00  1.00  1862
math            0.00  1.00  3255
genderMale      0.00  1.00  3474
progGeneral     0.00  1.00  1845
progAcademic    0.00  1.00  1758
mean_PPD        0.00  1.00  3914
log-posterior   0.04  1.00  1994
For each parameter, mcse is Monte Carlo standard error, n_eff is a crude measure of effective sample size, and Rhat is the potential scale reduction factor on split chains (at convergence Rhat=1).
Summary Info:
These are the same quantities reported for any other regression model:
- mean: the point estimate for the parameter (the posterior mean)
- sd: the posterior standard deviation, which plays the role of the standard error of the point estimate
- quantiles: whichever you request; here they give the median and a 95% uncertainty interval
Additional:
- mean_PPD: mean of the posterior predictive distribution (hopefully on par with the mean of the target variable, daysabs)
- log-posterior: similar to the log-likelihood from maximum likelihood, but for the Bayesian case
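To make the summary columns concrete, here is a minimal sketch that computes the mean, sd, and quantiles by hand from a vector of posterior draws. The draws are simulated with rnorm() as a stand-in (an assumption for illustration, not output from the model above); with a fitted model you would instead take one column of as.matrix(attendance_bglm).

```r
# Stand-in for 4000 posterior draws of a single coefficient.
# These are simulated values, NOT draws from the attendance model.
set.seed(123)
draws <- rnorm(4000, mean = -0.24, sd = 0.05)

# The summary columns are plain summaries of the draws
post_summary <- c(
  mean = mean(draws),                              # point estimate
  sd   = sd(draws),                                # posterior standard deviation
  quantile(draws, probs = c(0.025, 0.5, 0.975))    # 95% interval and median
)

round(post_summary, 2)
```

Nothing here is specific to rstanarm: every column in the Estimates table is a summary of the posterior draws for that row.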
Diagnostics for quick eyeball inspection:
- Monte Carlo Standard Error (mcse): the standard error of the mean of the posterior draws. You want mcse to be less than 10% of the posterior standard deviation.
- \(n_{eff}\): an estimate of the effective number of independent draws from the posterior distribution of the estimand of interest. Because the draws within a chain are not independent if there is autocorrelation, the effective sample size will typically be smaller than the total number of iterations. It should be greater than 10% of the total posterior sample size (4000 here).
- \(\hat{R}\): measures the ratio of the average variance of samples within each chain to the variance of the pooled samples across chains; if all chains are at equilibrium, these will be the same and \(\hat{R}\) will be one. You want it to be less than 1.1.
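The rules of thumb above can be applied mechanically. The sketch below copies the rounded values from the printed summary into a data frame and flags each parameter. The data frame name `checks` and its column names are made up for illustration. Because the printed values are rounded to two digits, several mcse and sd entries show as 0.00, so the MCSE check uses `<=` and is only approximate.

```r
# Values copied by hand from the Estimates and Diagnostics output above
checks <- data.frame(
  parameter = c("(Intercept)", "math", "genderMale", "progGeneral",
                "progAcademic", "mean_PPD", "log-posterior"),
  sd    = c(0.08, 0.00, 0.05, 0.08, 0.07, 0.20, 1.57),
  mcse  = c(0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.04),
  Rhat  = c(1.00, 1.00, 1.00, 1.00, 1.00, 1.00, 1.00),
  n_eff = c(1862, 3255, 3474, 1845, 1758, 3914, 1994)
)
total_draws <- 4000  # posterior sample size reported under Model Info

# Rule 1: MCSE no more than 10% of the posterior standard deviation
checks$mcse_ok <- checks$mcse <= 0.10 * checks$sd
# Rule 2: effective sample size above 10% of the total draws
checks$n_eff_ok <- checks$n_eff / total_draws > 0.10
# Rule 3: Rhat below 1.1
checks$rhat_ok <- checks$Rhat < 1.1

print(checks[, c("parameter", "mcse_ok", "n_eff_ok", "rhat_ok")])
```

With a fitted model, the same checks would be better run on the unrounded values rather than on numbers read off the printout.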
Adding more options
Typical configuration would involve setting priors, as well as MCMC options such as iterations, warm-up, etc.
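As a rough sketch of what such a configuration might look like, the call below refits the same model with explicit priors and sampler settings. It assumes the `attendance` data frame used above is already loaded. The object name `attendance_bglm2` is hypothetical, and the specific prior scales, iteration counts, and seed are illustrative placeholders rather than recommendations.

```r
library(rstanarm)

# Assumes `attendance` is the data frame used for the model above.
attendance_bglm2 <- stan_glm(
  daysabs ~ math + gender + prog,
  data   = attendance,
  family = poisson,
  prior           = normal(0, 2.5),  # prior on the regression coefficients (illustrative scale)
  prior_intercept = normal(0, 5),    # prior on the intercept (illustrative scale)
  chains = 4,      # number of Markov chains
  iter   = 2000,   # iterations per chain, including warm-up
  warmup = 1000,   # warm-up iterations discarded from each chain
  cores  = 4,      # chains run in parallel if the machine allows
  seed   = 123     # for reproducibility
)

# Shows the priors that were actually used
prior_summary(attendance_bglm2)
```

With 4 chains of 2000 iterations and 1000 warm-up draws each, the posterior sample size is again 4000, matching the summary shown earlier.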