Traditional GLM
library(tidyverse)
# Read the Stata attendance data, then recode the categorical variables as factors
attendance <- haven::read_dta("https://stats.idre.ucla.edu/stat/stata/dae/nb_data.dta")
attendance <- attendance %>%
  mutate(
    prog = factor(prog, levels = 1:3, labels = c("General", "Academic", "Vocational")),
    # Put Vocational first so that it is the reference level
    prog = fct_relevel(prog, c('Vocational', 'General', 'Academic')),
    gender = factor(gender, labels = c('Female', 'Male')),
    id = factor(id)
  )
We’ll use Poisson regression to model the number of days absent.
attendance_glm <- glm(daysabs ~ math + gender + prog,
                      data = attendance,
                      family = poisson)
## summary(attendance_glm)
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | 1.489 | 0.081 | 18.302 | 0 |
math | -0.007 | 0.001 | -7.437 | 0 |
genderMale | -0.242 | 0.047 | -5.184 | 0 |
progGeneral | 1.271 | 0.078 | 16.309 | 0 |
progAcademic | 0.845 | 0.068 | 12.450 | 0 |
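Written out, the fitted model behind this table can be sketched as follows. The indicator names (Male, General, Academic) are notation introduced here for illustration; Female and Vocational are the reference levels implied by the factor coding above, and the coefficients are the rounded values from the table.

```latex
\begin{aligned}
\text{daysabs}_i &\sim \text{Poisson}(\mu_i) \\
\log \mu_i &= 1.489 - 0.007\,\text{math}_i - 0.242\,\text{Male}_i
              + 1.271\,\text{General}_i + 0.845\,\text{Academic}_i
\end{aligned}
```

Because the linear predictor is on the log scale, each coefficient is an additive change in the log of the expected count, which becomes a multiplicative change in the expected count itself after exponentiating.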
If you are not familiar with Poisson regression, we are modeling the log of the expected count as a linear function of the covariates. Often the exponentiated coefficients are reported, since they can be read as incidence rate ratios. For example,
exp(coef(attendance_glm)['genderMale'])
is 0.785. Subtracting 1 gives -0.215, i.e. a 21.5% decrease in the incidence rate of days absent for males relative to females, holding the other covariates constant.
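The same arithmetic can be applied to every coefficient at once. The sketch below is only illustrative: it re-enters the rounded estimates and standard errors from the table by hand rather than refitting the model, and the object and column names (`coefs`, `irr`, `pct_change`, `irr_lower`, `irr_upper`) are made up for this example. The intervals are approximate Wald intervals, computed on the log scale and then exponentiated.

```r
# Rounded estimates and standard errors copied from the coefficient table
coefs <- data.frame(
  term      = c("(Intercept)", "math", "genderMale", "progGeneral", "progAcademic"),
  estimate  = c(1.489, -0.007, -0.242, 1.271, 0.845),
  std.error = c(0.081, 0.001, 0.047, 0.078, 0.068)
)

# Exponentiate to move from the log scale to the rate-ratio scale
coefs$irr <- exp(coefs$estimate)

# Percent change in the expected count per unit increase (or versus the reference level)
coefs$pct_change <- 100 * (coefs$irr - 1)

# Approximate 95% Wald interval: build it on the log scale, then exponentiate
z <- qnorm(0.975)
coefs$irr_lower <- exp(coefs$estimate - z * coefs$std.error)
coefs$irr_upper <- exp(coefs$estimate + z * coefs$std.error)

print(coefs)
```

For the intercept, the exponentiated value is the expected count when all covariates are at zero or their reference level, not a ratio. With the fitted model object available, `exp(coef(attendance_glm))` gives the rate ratios from the unrounded estimates.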