class: center, middle, inverse, title-slide #
Mixed Models
##
with
R
!
###
Michael Clark
###
m-clark.github.io
@statsdatasci
CSCAR, UM
###
2020-10-21
--- class: inverse background-image: url(https://github.com/m-clark/m-clark.github.io/raw/master/img/Rlogo.svg)
--- class: inverse middle center ### **Overview of Mixed Models** ### *More Random Effects* ### *Common Extensions* ### *Issues* ### *Bayesian Approaches* --- class: inverse middle center animated rollIn rollOut # https://animate.style/ # Getting Started <br> <br>  --- class: inverse smaller50 # Getting Started .pull-left[ Required for workshop: ```r install.packages('lme4') ``` Others you have already: - <span class="pack" style = "">nlme</span> - included with base R - <span class="pack" style = "">mgcv</span> - included with base R ] .pull-right[ Other recommended (in general) - <span class="pack" style = "">glmmTMB</span> - <span class="pack" style = "">rstanarm</span> - Bayesian, requires <span class="pack" style = "">rstan</span> - <span class="pack" style = "">brms</span> - Bayesian, requires <span class="pack" style = "">rstan</span> Misc - <span class="pack" style = "">mixedup</span> - cleaner results for basic models - [https::github.com/m-clark/mixedup/](https::github.com/m-clark/mixedup/) - <span class="pack" style = "">lmerTest</span> - <span class="pack" style = "">merTools</span> ] --- class: inverse slide-font-66 # Getting Started: A starting point Run the following models and summarize. - <span class="objclass" style = "">`gapminder_recent_g20`</span> - country level data across time (GDP, life expectancy, etc.) - limited to > 1950 and countries in the g20 - Run and use <span class="func" style = "">summary</span> on the models. ```r load('data/gapminder_recent_g20.RData') ``` ```r library(lme4) mod_lm = lm(lifeExp ~ year0, data = gapminder_recent_g20) mod_lmer = lme4::lmer(lifeExp ~ year0 + (1|country), data = gapminder_recent_g20) ``` What's the same? What is different? --- class: img-slide <!-- # Getting Started: A starting point --> <img src="index_files/figure-html/unnamed-chunk-5-1.svg" width="125%" style="display: block; margin: auto;" /> --- class: img-slide <!-- # Getting Started: A starting point --> <img src="index_files/figure-html/unnamed-chunk-6-1.svg" width="125%" style="display: block; margin: auto;" /> --- class: inverse slide-font-66 # Getting Started: Terminology The following describe what we'd generally call: *<div class="center">mixed models</div>* - Variance components - Random intercepts and slopes - Random effects - Random coefficients - Varying coefficients - Intercepts- and/or slopes-as-outcomes - Hierarchical linear models - Multilevel models - Growth curve models - Mixed effects models --- class: inverse # Getting Started: Terminology The 'mixed' part comes from a mixture of: - *fixed effects* - *random effects* Perhaps a better way to phrase: - *general* - Applies to all observations - *specific* - Applies to a particular group --- class: inverse # Getting Started: Clustering Specific effects come from natural clustering - Repeated measures - Spatial/geography - Social/teams - Items on a test --- class: inverse # Getting Started: GPA Data Data Description ><span class="" style = "font-size: 75%">We'll assess factors predicting college grade point average (GPA). 200 students are assessed for six occasions (each semester for the first three years), so we have observations clustered within students. We have other variables such as job status, sex, and high school GPA. Some will be in both labeled and numeric form.</span> ```r library(tidyverse) load('data/gpa.RData') head(gpa) ``` ``` student occas gpa job sex highgpa admitted year semester occasion 1 1 year 1 semester 1 2.3 2 hours female 2.8 yes 1 1 0 2 1 year 1 semester 2 2.1 2 hours female 2.8 yes 1 2 1 3 1 year 2 semester 1 3.0 2 hours female 2.8 yes 2 1 2 4 1 year 2 semester 2 3.0 2 hours female 2.8 yes 2 2 3 5 1 year 3 semester 1 3.0 2 hours female 2.8 yes 3 1 4 6 1 year 3 semester 2 3.3 2 hours female 2.8 yes 3 2 5 ``` --- class: inverse # Getting Started: Standard Regression <div style = "text-align: center;">Scary formulas! <br><br> <span style = "font-size: 200%; ">👻</span></div> `$$\mathscr{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathscr{occasion} + \epsilon$$` We have coefficients ( `\(b\)` ) for the intercept and the effect of time. The error ( `\(\epsilon\)` ) is assumed to be normally distributed with mean 0 and some standard deviation `\(\sigma\)`. `$$\epsilon \sim \mathcal{N}(0, \sigma)$$` --- class: inverse # Getting Started: Standard Regression Alternatively: `$$\mathscr{gpa} \sim \mathcal{N}(\mu, \sigma)$$` `$$\mu = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathscr{occasion}$$` --- class: inverse # Getting Started: Standard Regression Run it! ```r gpa_lm = lm(gpa ~ occasion, data = gpa) summary(gpa_lm) ``` ``` Call: lm(formula = gpa ~ occasion, data = gpa) Residuals: Min 1Q Median 3Q Max -0.90553 -0.22447 -0.01184 0.26921 1.19447 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 2.599214 0.017846 145.65 <2e-16 *** occasion 0.106314 0.005894 18.04 <2e-16 *** --- Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 Residual standard error: 0.3487 on 1198 degrees of freedom Multiple R-squared: 0.2136, Adjusted R-squared: 0.2129 F-statistic: 325.3 on 1 and 1198 DF, p-value: < 2.2e-16 ``` --- class: inverse # Getting Started: Single Random Effect But the observations are clustered - Not independent - Correlated within a cluster -- Regression by cluster? - Issues: - Not easily summarized with many groups - Often little data within each cluster - Overfit - Ignores what clusters have in common --- class: img-slide Think that standard model is adequate? <img src="index_files/figure-html/unnamed-chunk-10-1.svg" width="150%" style="display: block; margin: auto;" /> --- class: inverse # Getting Started: Mixed Model Add a (random) student effect: `$$\mathscr{gpa} = b_{\mathrm{intercept}} + b_{\mathrm{occ}}\cdot \mathscr{occasion} + (\mathrm{effect}_{\mathscr{student}} + \epsilon)$$` Assume the following distribution for it: `$$\mathrm{effect}_{\mathrm{student}} \sim \mathcal{N}(0, \tau)$$` --- class: inverse # Getting Started: Mixed Model Rearrange it! `$$\mathscr{gpa} = (b_{\mathrm{intercept}} + \mathrm{effect}_{\mathscr{student}}) + b_{\mathrm{occ}}\cdot \mathscr{occasion} + \epsilon$$` `$$\mathscr{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathscr{occasion} + \epsilon$$` `$$b_{\mathrm{int\_student}} \sim \mathcal{N}(b_{\mathrm{intercept}}, \tau)$$` <br> Can now focus on model coefficients - Rather than as another source of 'error' --- class: inverse slide-font-66 # Getting Started: Mixed Model Another way of showing the mixed model: *<div style = "text-align:center">multilevel modeling</div>* Two part regression model - one at the observation level - one at the student level `$$\mathrm{gpa} = b_{\mathrm{int\_student}} + b_{\mathrm{occ}}\cdot \mathrm{occasion} + \epsilon$$` `$$b_{\mathrm{int\_student}} = b_{\mathrm{intercept}} + \mathrm{effect}_{\mathrm{student}}$$` After 'plugging in' the second level part to the first, the model is identical to the previous. --- class: img-slide # Getting Started: Application <img src="index_files/figure-html/spaghetti-1.svg" width="75%" style="display: block; margin: auto;" /> --- class: inverse code-only Add a student effect! ```r gpa_mixed = lmer(gpa ~ occasion + (1 | student), data = gpa) # summary(gpa_mixed, cor = FALSE) # dont' want FE correlations mixedup::summarize_model(gpa_mixed, digits = 3) ``` ``` Computing profile confidence intervals ... ``` ``` Variance Components: ``` ``` Group Effect Variance SD SD_2.5 SD_97.5 Var_prop student Intercept 0.064 0.252 0.225 0.282 0.523 Residual 0.058 0.241 0.231 0.252 0.477 ``` ``` Fixed Effects: ``` ``` Term Value SE t P_value Lower_2.5 Upper_97.5 Intercept 2.599 0.022 119.800 0.000 2.557 2.642 occasion 0.106 0.004 26.096 0.000 0.098 0.114 ``` --- class: inverse # Getting Started: Application Fixed effects: ```r # lme4::fixef(gpa_mixed) mixedup::extract_fixed_effects(gpa_mixed) ``` ``` # A tibble: 2 x 7 term value se t p_value lower_2.5 upper_97.5 <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> 1 Intercept 2.60 0.022 120. 0 2.56 2.64 2 occasion 0.106 0.004 26.1 0 0.098 0.114 ``` Just our standard regression coefficients! - Interpreted the same --- class: inverse # Getting Started: Application Variance components: ```r # lme4::VarCorr(gpa_mixed) mixedup::extract_vc(gpa_mixed) ``` ``` group effect variance sd sd_2.5 sd_97.5 var_prop 1 student Intercept 0.064 0.252 0.225 0.282 0.523 2 Residual 0.058 0.241 0.231 0.252 0.477 ``` These are the standard deviation/variance of the random effects. The residual SD is akin to that in the `lm` result --- class: inverse slide-font-75 # Getting Started: Application ``` group effect variance sd sd_2.5 sd_97.5 var_prop 1 student Intercept 0.064 0.252 0.225 0.282 0.523 2 Residual 0.058 0.241 0.231 0.252 0.477 ``` The proportion of variance attributable to students: ~52.3% Can also be interpreted as a correlation. - Correlation of observations w/in a student - Called the *intraclass correlation* - Assumed constant across observations We'll come back to this... <!-- <img src="img/back.jpg" style="display:inline; margin: 0 auto; width:10%"> --> --- class: inverse # Getting Started: Application These are the random effects for students: ```r # lme4::ranef(gpa_mixed) mixedup::extract_random_effects(gpa_mixed) ``` ``` # A tibble: 200 x 7 group_var effect group value se lower_2.5 upper_97.5 <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> 1 student Intercept 1 -0.071 0.092 -0.251 0.109 2 student Intercept 2 -0.216 0.092 -0.395 -0.036 3 student Intercept 3 0.088 0.092 -0.091 0.268 4 student Intercept 4 -0.187 0.092 -0.366 -0.007 5 student Intercept 5 0.03 0.092 -0.149 0.21 6 student Intercept 6 -0.302 0.092 -0.482 -0.123 7 student Intercept 7 -0.143 0.092 -0.323 0.036 8 student Intercept 8 0.218 0.092 0.039 0.398 9 student Intercept 9 0.103 0.092 -0.077 0.282 10 student Intercept 10 0.016 0.092 -0.164 0.196 # … with 190 more rows ``` --- class: inverse # Getting Started: Application These are the random coefficients: - In this case, intercepts - coef = intercept + effect ```r # coef(gpa_mixed) mixedup::extract_random_coefs(gpa_mixed) ``` ``` # A tibble: 200 x 7 group_var effect group value se lower_2.5 upper_97.5 <chr> <chr> <fct> <dbl> <dbl> <dbl> <dbl> 1 student Intercept 1 2.53 0.095 2.34 2.71 2 student Intercept 2 2.38 0.095 2.20 2.57 3 student Intercept 3 2.69 0.095 2.50 2.87 4 student Intercept 4 2.41 0.095 2.23 2.60 5 student Intercept 5 2.63 0.095 2.44 2.81 6 student Intercept 6 2.30 0.095 2.11 2.48 7 student Intercept 7 2.46 0.095 2.27 2.64 8 student Intercept 8 2.82 0.095 2.63 3.00 9 student Intercept 9 2.70 0.095 2.52 2.89 10 student Intercept 10 2.62 0.095 2.43 2.8 # … with 190 more rows ``` --- class: inverse # Getting Started: Application Prediction can include random effects or not. If they do not, essentially the same as `lm`. But the random effects will help produce better predictions. 😃 ```r predict_lm = predict(gpa_lm) predict_no_re = predict(gpa_mixed, re.form=NA) predict_with_re = predict(gpa_mixed) ``` --- class: img-slide <table class="table" style="width: auto !important; margin-left: auto; margin-right: auto;"> <thead> <tr> <th style="text-align:left;"> student </th> <th style="text-align:right;"> lm </th> <th style="text-align:right;"> lmer no re </th> <th style="text-align:right;"> lmer with re </th> </tr> </thead> <tbody> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.53 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.63 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.74 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.85 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 2.95 </td> </tr> <tr> <td style="text-align:left;"> 1 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.06 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.38 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.49 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.60 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.70 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 2.81 </td> </tr> <tr> <td style="text-align:left;"> 2 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 2.92 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.69 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.79 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.90 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 3.01 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.11 </td> </tr> <tr> <td style="text-align:left;"> 3 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.22 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.41 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.52 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.63 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.73 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 2.84 </td> </tr> <tr> <td style="text-align:left;"> 4 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 2.94 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.60 </td> <td style="text-align:right;"> 2.63 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.71 </td> <td style="text-align:right;"> 2.74 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.81 </td> <td style="text-align:right;"> 2.84 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.92 </td> <td style="text-align:right;"> 2.95 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.02 </td> <td style="text-align:right;"> 3.05 </td> </tr> <tr> <td style="text-align:left;"> 5 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.13 </td> <td style="text-align:right;"> 3.16 </td> </tr> </tbody> </table> --- class: img-slide slide-font-75 # Getting Started: Application Which do you think has the better fit? - One line for all? Separate for each student? <img src="index_files/figure-html/unnamed-chunk-19-1.svg" width="80%" style="display: block; margin: auto;" /> --- class: inverse # Getting Started: Cluster level covariates Thus far we've had only one covariate: - Time indicator - Varies with outcome, at observation level We can have cluster-level covariates also - Just add as you would any other ```r lmer(gpa ~ occasion + sex + (1|student), gpa) ``` --- class: inverse # Getting Started: Exercises ### Sleep For this exercise, we'll use the sleep study data from the <span class="pack">lme4</span> package. The following describes it. > The average reaction time per day for subjects in a sleep deprivation study. On day 0 the subjects had their normal amount of sleep. Starting that night they were restricted to 3 hours of sleep per night. The observations represent the average reaction time (in milliseconds) on a series of tests given each day to each subject. --- class: inverse # Getting Started: Exercises After loading the package, the data can be loaded as follows. ```r library(lme4) data("sleepstudy") ``` 1. Run a regression with Reaction as the target variable and Days as the predictor. 2. Run a mixed model with a random intercept for Subject. 3. Interpret the variance components and fixed effects. --- class: inverse # Getting Started: Exercises Rerun the mixed model with the GPA data adding the cluster level covariate of `sex`, or high school GPA (`highgpa`), or both. Interpret all aspects of the results. What happened to the student variance after adding cluster level covariates to the model? --- class: inverse middle center # Moving on... # [Part 2](https://m-clark.github.io/mixed-models-with-R-workshop/part-2.html) <span class="" style = "width: 50px; display: inline-block"></span>[Part 3](https://m-clark.github.io/mixed-models-with-R-workshop/part-3.html) <span class="" style = "width: 50px; display: inline-block"></span>[Part 4](https://m-clark.github.io/mixed-models-with-R-workshop/part-4.html) <span class="" style = "width: 50px; display: inline-block"></span>[Part 5](https://m-clark.github.io/mixed-models-with-R-workshop/part-5.html)