# Common Extensions

## Additional Grouping Structure

### Cross-classified models

Oftentimes there will be additional sources of variance beyond one grouping factor. Consider as an example, a visual perception experiment where there are multiple trials for each individual, along with specific images displayed. Such data might look like this.

In these situations, we have observations clustered within both person and image, but person and image are not nested within one another- all participants see all 10 images. Such a situation is typically referred to as one in which there are crossed random effects, which just means non-nested. In the situations we’ll look at next, we will have multiple sources variances to consider.

#### Example: Student achievement

For our own demonstration we’ll look at achievement scores for students. The sources of dependency are due to students having gone to the same primary or secondary schools. However, in this example, going to a primary school doesn’t necessarily mean you’ll go to a specific secondary school. Note also that there are no repeated measures, we see each student only once. Here’s a quick look a the data, and for more detail, check the appendix.

`load('data/pupils.RData')`

For our mixed model we’ll look at the effects for `sex`

and socioeconomic status, `ses`

, a six level variable from low to high, on scholastic achievement. The range of achievement scores is roughly 4 to 10, with mean of 6.3 and standard deviation 0.9. We’ll take into account the clustering at primary school and secondary school. To incorporate the additional structure in lme4 syntax is very easy, we just do as we did before, though now for both grouping factors^{17}.

```
= lmer(
pupils_crossed ~ sex + ses
achievement + (1|primary_school_id) + (1|secondary_school_id),
data = pupils
)##
## summary(pupils_crossed, correlation = FALSE)
```

term | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|

Intercept | 5.924 | 0.123 | 5.684 | 6.164 |

sexfemale | 0.261 | 0.046 | 0.171 | 0.350 |

ses2 | 0.132 | 0.118 | -0.098 | 0.362 |

ses3 | 0.098 | 0.110 | -0.118 | 0.314 |

ses4 | 0.298 | 0.105 | 0.093 | 0.503 |

ses5 | 0.354 | 0.101 | 0.156 | 0.551 |

seshighest | 0.616 | 0.110 | 0.401 | 0.832 |

The fixed effects tell us there is a positive effect of being female on achievement, and in general, relative to lowest SES category, being in the upper categories of SES also has a positive effect.

group | effect | variance | sd | var_prop |
---|---|---|---|---|

primary_school_id | Intercept | 0.17 | 0.42 | 0.24 |

secondary_school_id | Intercept | 0.07 | 0.26 | 0.09 |

Residual | 0.47 | 0.69 | 0.66 |

When we look at the variance components we see that primary and secondary school contributes about 34% of the total variance. Most of the variance attributable to school comes from the primary school.

If we inspect the random effects, we can see that we now have two sets of effects- 50 for the primary schools, and 30 for the secondary. Both would be incorporated into any pupil-specific prediction.

`glimpse(ranef(pupils_crossed))`

```
List of 2
$ primary_school_id :'data.frame': 50 obs. of 1 variable:
..$ (Intercept): num [1:50] -0.327 0.183 0.52 0.474 0.253 ...
..- attr(*, "postVar")= num [1, 1, 1:50] 0.0247 0.0225 0.0244 0.0215 0.0285 ...
$ secondary_school_id:'data.frame': 30 obs. of 1 variable:
..$ (Intercept): num [1:30] -0.41069 0.08247 -0.00589 -0.06162 0.08481 ...
..- attr(*, "postVar")= num [1, 1, 1:30] 0.0155 0.014 0.0159 0.0131 0.0142 ...
- attr(*, "class")= chr "ranef.mer"
```

Let’s look at them visually using merTools.

Note that we have the usual extensions here if desired. As an example, we could also do random slopes for student level characteristics.

### Hierarchical structure

Now that we have looked at cross-classified models, we can proceed to examine hierarchical cluster structuring. In this situation we have clusters nested within other clusters, which may be nested within still other clusters. A typical example might be cities within counties, and counties within states.

#### Example: Nurses and stress

For our demonstration we’ll use the nurses data set. Here we are interested in the effect of a training program (`treatment`

) on stress levels (on a scale of 1-7) of nurses. In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. For more information see the appendix.

`load('data/nurses.RData')`

For the model we examine effects of the treatment as well as several other covariates, at least one at each of the nurse, ward, and hospital levels. Again, when it comes to the fixed effects portion, you can simply think about that part as you would any standard regression, we just add covariates as theory/exploration would suggest. To incorporate this type of random effects structure is not too different from the cross-classified approach, but does have a slight change to the syntax.

```
= lmer(
nurses_hierarchical ~ age + sex + experience + treatment + wardtype + hospsize
stress + (1 | hospital) + (1 | hospital:ward),
data = nurses
)##
## # same thing!
## nurses_hierarchical = lmer(
## stress ~ age + sex + experience + treatment + wardtype + hospsize
## + (1|hospital/ward),
## data = nurses
## )
##
## summary(nurses_hierarchical, correlation = F)
```

term | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|

Intercept | 5.380 | 0.185 | 5.018 | 5.742 |

age | 0.022 | 0.002 | 0.018 | 0.026 |

sexFemale | -0.453 | 0.035 | -0.522 | -0.385 |

experience | -0.062 | 0.004 | -0.070 | -0.053 |

treatmentTraining | -0.700 | 0.120 | -0.935 | -0.465 |

wardtypespecial care | 0.051 | 0.120 | -0.184 | 0.286 |

hospsizemedium | 0.489 | 0.202 | 0.094 | 0.884 |

hospsizelarge | 0.902 | 0.275 | 0.363 | 1.440 |

As far as the fixed effects go, about the only thing that doesn’t have a statistical effect is ward type^{18}.

group | effect | variance | sd | var_prop |
---|---|---|---|---|

hospital:ward | Intercept | 0.337 | 0.580 | 0.500 |

hospital | Intercept | 0.119 | 0.345 | 0.177 |

Residual | 0.217 | 0.466 | 0.323 |

Concerning the random effects, there appears to be quite a bit of variability from ward to ward especially, but also hospital. Recall that stress is a 7 point scale, so from ward to ward we can expect scores to bounce around about half a point on average, which is quite dramatic in my opinion. Again we inspect it visually.

### Crossed vs. nested

The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different?

```
= lmer(
nurses_hierarchical ~ age + sex + experience + treatment + wardtype + hospsize
stress + (1|hospital) + (1|hospital:wardid),
data = nurses
)
= lmer(
nurses_crossed ~ age + sex + experience + treatment + wardtype + hospsize
stress + (1|hospital) + (1|wardid),
data = nurses
)
```

group | effect | variance | sd | var_prop |
---|---|---|---|---|

hospital:ward | Intercept | 0.337 | 0.580 | 0.500 |

hospital | Intercept | 0.119 | 0.345 | 0.177 |

Residual | 0.217 | 0.466 | 0.323 |

group | effect | variance | sd | var_prop |
---|---|---|---|---|

wardid | Intercept | 0.337 | 0.580 | 0.500 |

hospital | Intercept | 0.119 | 0.345 | 0.177 |

Residual | 0.217 | 0.466 | 0.323 |

Nothing? Good, you’re not crazy. Here’s a quote from the lme4 text, section 2.2.1.1, which is definitely worth your time.

The blurring of mixed-effects models with the concept of multiple, hierarchical levels of variation results in an unwarranted emphasis on ‘levels’ when defining a model and leads to considerable confusion. It is perfectly legitimate to define models having random effects associated with non-nested factors. The reasons for the emphasis on defining random effects with respect to nested factors only are that such cases do occur frequently in practice, and that some of the computational methods for estimating the parameters in the models can only be easily applied to nested factors.

This is not the case for the methods used in the lme4 package.

Indeed there is nothing special done for models with random effects for nested factors. When random effects are associated with multiple factors, exactly the same computational methods are used whether the factors form a nested sequence or are partially crossed or are completely crossed.

You might have noticed that we were using `wardid`

rather than the `ward`

grouping variable as in our first example. Even though every ward is unique, the `ward`

column labels them with an arbitrary sequence starting with 1. While this might seem natural, ward 1 in hospital 1 is not the same as ward 1 in hospital 2, so it’s probably not a good idea to give them the same label. The `wardid`

column properly distinguishes the wards with unique values (e.g. 11, 12).

What would have happened had we used that variable as a crossed random effect?

```
= lmer(
nurses_crossed_bad_data ~ age + sex + experience + treatment + wardtype + hospsize
stress + (1|hospital) + (1|ward),
data = nurses
)
```

group | effect | variance | sd | var_prop |
---|---|---|---|---|

hospital | Intercept | 0.196 | 0.442 | 0.297 |

ward | Intercept | 0.000 | 0.000 | 0.000 |

Residual | 0.463 | 0.681 | 0.703 |

This is certainly not the result we want. The variance in `ward`

is already captured by treatment and type. However, as demonstrated, this can be avoided with the proper syntax, or proper labeling in the data to allow unique clusters to have unique identifiers.

See this discussion also, as well as this from the FAQ from one of the lme4 developers. Josh Errickson at CSCAR also has a nice write-up with visual depiction of the underlying matrices of interest, which served as inspiration for some of the visualization in the next section.

So there you have it. When it comes to lme4, or mixed models more generally, crossed vs. nested is simply a state of mind (data)^{19}.

## Residual Structure

Sometimes we will want to obtain more specific estimates regarding the residual covariance/correlation structure. This is especially the case in the longitudinal setting, where we think that observations closer in time would be more strongly correlated than those further apart, or that the variance changes over time. What does this model look like?

Let’s begin by thinking about the covariance/correlation matrix for the entire set of observations for our target variable, and how we want to represent the dependency in those observations. I’ll show a visualization of the first 5 people from our GPA data and modeling situation. Recall that each person has 6 observations. This display regards the results from our random intercepts (only) model for GPA.

Each block represents the covariance matrix pertaining to observations within an individual. Within the person there are variances on the diagonal and covariances on the off-diagonal. When considering the whole data, we can see that observations from one person have no covariance with another person (gray). Furthermore, the covariance within a person is a constant value, the variance is also a constant value. Where did those values come from?

group | effect | variance | sd | var_prop |
---|---|---|---|---|

student | Intercept | 0.064 | 0.252 | 0.523 |

Residual | 0.058 | 0.241 | 0.477 |

Remember that there are two sources of variance in this model, the residual observation level variance, and that pertaining to person. Combined they provide the total residual variance that we aren’t already capturing with our covariates. In this case, it’s about 0.12, the value displayed on our diagonal. The off-diagonal is the variance attributable to student, which we alternately interpreted as an intraclass correlation (dividing by the total variance converts it to the correlation metric).

More generically, and referring to previous notation for our estimated variances, we can see the covariance matrix (for a cluster) as follows.

\[\Sigma =
\left[
\begin{array}{ccc}
\color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 \\
\tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2\\
\tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 \\
\tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} \\
\end{array}\right]\]

This represents a covariance structure of compound symmetry. It is the default in most mixed model settings, and the same as what is shown visually above. Now let’s start to think about other types of covariance structures.

Consider the following model for an individual and just three time points to keep things simpler to show.

\[\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\]

So we have three observations of \(y\) that are multivariate normally distributed. The mean \(\mu\) is a function of covariates just like in standard regression.

\[\mu = b_0 + b_1\cdot \mathrm{time} + b_2\cdot x_1 ...\]

However, instead of just plopping an \(\epsilon\) at the end, we want to go further in defining the entire residual variance/covariance structure for all three time points.

In the simplest setting of a standard linear regression model, we have constant variance and no covariance.

\[\Sigma = \left[ \begin{array}{ccc} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \\ \end{array}\right]\]

Next, we can relax the assumption of equal variances, and estimate each separately. In this case of heterogeneous variances, we might see more or less variance over time, for example.

\[\Sigma = \left[ \begin{array}{ccc} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \\ \end{array}\right]\]

Now let’s say we actually want to get at the underlying covariance/correlation. I’ll switch to the correlation representation, but you can still think of the variances as constant or separately estimated. So now we have something like this, where \(\rho\) represents the residual correlation among observations.

\[\Sigma = \sigma^2 \left[ \begin{array}{ccc} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_3 \\ \rho_2 & \rho_3 & 1 \\ \end{array}\right]\]

In this case we’d estimate a different correlation for all time point pairs (with constant variance). This is typically described as an unstructured, or simply ‘symmetric,’ correlation structure.

If you are familiar with repeated measures ANOVA, which is a special case of a mixed model, you may recall that the usual assumption is a sphericity, a relaxed form of compound symmetry, where all the correlations have the same value, i.e. \(\rho_1=\rho_2=\rho_3\), and all variances are equal.

Another very commonly used correlation structure (for time-based settings) is an autocorrelation structure, of lag order one, for the residuals. What this means is that we assume the residuals at one time point apart correlate with some value \(\rho\), observations at two time points apart correlate \(\rho^2\), and so on. As such we only need to estimate \(\rho\), while the rest are then automatically determined. Here’s what it’d look like for four time points.

\[\Sigma = \sigma^2 \left[ \begin{array}{cccc} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \\ \end{array}\right]\]

If \(\rho\) was estimated to be .5, it would look like the following.

\[\Sigma = \sigma^2 \left[ \begin{array}{cccc} 1 & .5 & .25 & .06 \\ .5 & 1 & .5 & .25 \\ .25 & .5 & 1 & .5 \\ .06 & .25 & .5 & 1 \\ \end{array}\right]\]

Again, the main point is that points further apart in time are assumed to have less correlation.

Know that there are many patterns and possibilities to potentially consider, and that they are not limited to the repeated measures scenario. For example, the correlation could represent spatial structure, where units closer together geographically would be more correlated. And as noted, we could also have variances that are different at each time point^{20}. We’ll start with that for the next example.

### Heterogeneous variance

Unfortunately, lme4 does not provide the ability to model the residual covariance structure, at least not in a straightforward fashion, though many other mixed model packages do^{21}. In fact, two packages that come with the basic R installation do so, mgcv and nlme. We’ll demonstrate with the latter.

The nlme package will have a different random effect specification, though not *too* different. In addition, to estimate heterogeneous variances, we’ll need to use an additional `weights`

argument. The following will allow each time point of occasion to have a unique estimate.

```
library(nlme)
= lme(
heterovar_res ~ occasion,
gpa data = gpa,
random = ~ 1 | student,
weights = varIdent(form = ~ 1 | occasion)
)
## summary(heterovar_res)
```

term | value | se | z | p_value | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|

Intercept | 2.599 | 0.026 | 99.002 | 0 | 2.547 | 2.650 |

occasion | 0.106 | 0.004 | 26.317 | 0 | 0.098 | 0.114 |

group | effect | variance | sd | var_prop |
---|---|---|---|---|

student | Intercept | 0.094 | 0.306 | 0.404 |

Residual | 0.138 | 0.372 | 0.596 |

At this point we’re getting the same stuff we’re used to. Now the not-so fun part. For the values we’re interested in for this example, i.e. the variances at each occasion, nlme does not make it easy on someone to understand initially, as the output regards the way things are for estimation, not for what one would usually have to report. The variances are scaled relative to the first variance estimate, which is actually the reported residual variance in the random effects part. Additionally the values are also on the standard deviation rather than variance scale. From the default output display, we can see that variance decreases over time in this case, but the actual values are not provided.

`summary(heterovar_res$modelStruct)`

```
Random effects:
Formula: ~1 | student
(Intercept) Residual
StdDev: 0.8232544 1
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | occasion
Parameter estimates:
0 1 2 3 4 5
1.0000000 0.8261186 0.6272415 0.4311126 0.3484013 0.4324628
```

Relative values are fine, I guess, but what we’d want are the actual estimates. Here’s how you can get them using the residual standard deviation to scale those values, then square them to get on the variance scale.

`c(1.0000000, coef(heterovar_res$modelStruct$varStruct, unconstrained=F))*heterovar_res$sigma)^2 (`

```
1 2 3 4 5
0.13815037 0.09428374 0.05435276 0.02567636 0.01676917 0.02583744
```

Yeah. You’ll have to look this up every time you want to do it, or just make your own function that takes the model input. Here is how my function would extract the information.

`::extract_het_var(heterovar_res, scale = 'var') mixedup`

```
X0 X1 X2 X3 X4 X5
1 0.138 0.094 0.054 0.026 0.017 0.026
```

A newer alternative to keep in mind is glmmTMB. It would allow one to stay more in the lme4 style and output.

```
library(glmmTMB)
= glmmTMB(
heterovar_res2 ~ occasion + (1|student) + diag(0 + occas |student),
gpa data = gpa
)
summary(heterovar_res2)
```

```
Family: gaussian ( identity )
Formula: gpa ~ occasion + (1 | student) + diag(0 + occas | student)
Data: gpa
AIC BIC logLik deviance df.resid
261.1 312.0 -120.5 241.1 1190
Random effects:
Conditional model:
Groups Name Variance Std.Dev. Corr
student (Intercept) 0.093123 0.30516
student.1 occasyear 1 semester 1 0.129833 0.36032
occasyear 1 semester 2 0.086087 0.29341 0.00
occasyear 2 semester 1 0.046240 0.21503 0.00 0.00
occasyear 2 semester 2 0.017615 0.13272 0.00 0.00 0.00
occasyear 3 semester 1 0.008709 0.09332 0.00 0.00 0.00 0.00
occasyear 3 semester 2 0.017730 0.13316 0.00 0.00 0.00 0.00 0.00
Residual 0.008065 0.08980
Number of obs: 1200, groups: student, 200
Dispersion estimate for gaussian family (sigma^2): 0.00806
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.598788 0.026201 99.19 <2e-16 ***
occasion 0.106141 0.004034 26.31 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Note that the variances displayed for each time point are not conflated with the residual variance. To compare with nlme, just add the residual variance to those estimates. As with every mixed model package (apparently), it still takes a bit to get the variances in a usable form from the VarCorr object. The following shows how though, and compares to the nlme result.

```
= VarCorr(heterovar_res2)
vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2 vc_glmmtmb
```

Here is the output from my function:

`::extract_het_var(heterovar_res2, scale = 'var') mixedup`

```
group occasyear.1.semester.1 occasyear.1.semester.2 occasyear.2.semester.1 occasyear.2.semester.2 occasyear.3.semester.1 occasyear.3.semester.2
1 student.1 0.138 0.094 0.054 0.026 0.017 0.026
```

In any case, putting these together shows we get the same result.

year 1 sem 1 | year 1 sem 2 | year 2 sem 1 | year 2 sem 2 | year 3 sem 1 | year 3 sem 2 | |
---|---|---|---|---|---|---|

glmmTMB | 0.138 | 0.094 | 0.054 | 0.026 | 0.017 | 0.026 |

nlme | 0.138 | 0.094 | 0.054 | 0.026 | 0.017 | 0.026 |

### Autocorrelation

The following example shows the same basic model, but with the autocorrelation structure we described previously. In nlme we use the built-in corAR1 function and `correlation`

argument similar to how we did with the weights argument.

```
library(nlme)
= lme(
corr_res ~ occasion,
gpa data = gpa,
random = ~ 1 | student,
correlation = corAR1(form = ~ occasion)
)
## summary(corr_res)
```

term | value | se | z | p_value | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|

Intercept | 2.597 | 0.023 | 113.146 | 0 | 2.552 | 2.642 |

occasion | 0.107 | 0.005 | 20.297 | 0 | 0.097 | 0.118 |

group | effect | variance | sd | var_prop |
---|---|---|---|---|

student | Intercept | 0.046 | 0.215 | 0.381 |

Residual | 0.075 | 0.273 | 0.619 |

Notice first that the fixed effect for occasion is the same as before. The variance estimates have changed slightly along with the variances of the fixed effects (i.e. the standard errors). The main thing is that we have a new parameter called `Phi`

in the nlme output that represents our autocorrelation, `Phi`

, with value of 0.418. This suggests at least some correlation exists among the residuals for observations next to each other in time, though it diminishes quickly as observations grow further apart.

Note that glmmTMB will analyze such structure as well. Note that we need the factor form for occasion for this specification, and also note how it is part of model formula, like another random effect. More on this in the supplemental section.

```
= glmmTMB(
corr_res_tmb ~ occasion + ar1(0 + occas | student) + (1 | student),
gpa data = gpa
)
```

## Generalized Linear Mixed Models

Just as generalized linear models extend the standard linear model, we can generalize (linear) mixed models to generalized linear mixed models. Furthermore, there is nothing restricting us to only the exponential family, as other packages would potentially allow for many other response distributions.

For this example we’ll do a logistic regression in the mixed model setting. In this case, we’ll use the speed dating data set. In the speed dating events, the experiment randomly assigned each participant to ten short dates (four minutes) with other participants. For each date, each person rated six attributes (attractive, sincere, intelligent, fun, ambitious, shared interests) of the other person on a 10-point scale and wrote down whether he or she would like to see the other person again.

Our target variable is whether the participant would be willing to date the person again (`decision`

). To keep things simple, the predictors will be limited to the sex of the participant (`sex`

), whether the partner was of the same race (`samerace`

), and three of the attribute ratings the participant gave of their partner- attractiveness (`attractive`

), sincerity (`sincere`

), and intelligence (`intelligent`

). The latter have been scaled to have zero mean and standard deviation of .5, which puts them on a more even footing with the binary covariates (`_sc`

)^{22}.

```
load('data/speed_dating.RData')
= glmer(
sd_model ~ sex + samerace + attractive_sc + sincere_sc + intelligent_sc
decision + (1 | iid),
data = speed_dating,
family = binomial
)
summary(sd_model, correlation = FALSE)
```

term | value | se | z | p_value | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|

Intercept | -0.743 | 0.121 | -6.130 | 0.00 | -0.981 | -0.506 |

sexMale | 0.156 | 0.164 | 0.954 | 0.34 | -0.165 | 0.478 |

sameraceYes | 0.314 | 0.075 | 4.192 | 0.00 | 0.167 | 0.460 |

attractive_sc | 1.957 | 0.058 | 33.560 | 0.00 | 1.842 | 2.071 |

sincere_sc | 0.311 | 0.054 | 5.747 | 0.00 | 0.205 | 0.417 |

intelligent_sc | 0.444 | 0.054 | 8.232 | 0.00 | 0.338 | 0.550 |

The fixed effects results are as expected for the attributes, with attractiveness being a very strong effect in particular. In addition, having a partner of the same race had a positive effect, while sex of the participant was statistically negligible. You are free to exponentiate the coefficients to get the odds ratios if desired, just as you would with standard logistic regression.

group | effect | variance | sd |
---|---|---|---|

iid | Intercept | 2.708 | 1.645 |

For the variance components, notice that there is no residual variance. This is because we are not modeling with the normal distribution for the response, thus there is no \(\sigma\) to estimate. However, the result suggests that there is quite a bit of variability from person to person.

## Exercises for Extensions

### Sociometric data

In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), what group they are in, as well as age and sex for both sender and receiver, as well as group size.

To run a mixed model, we will have three sources of structure to consider:

- senders (within group)
- receivers (within group)
- group

First, load the sociometric data.

`load('data/sociometric.RData')`

To run the model, we will proceed with the following modeling steps. For each, make sure you are creating a separate model object for each model run.

- Model 1: No covariates, only sender and receiver random effects. Note that even though we don’t add group yet, still use the nesting approach to specify the effects (e.g.
`1|group:receiver`

) - Model 2: No covariates, add group random effect
- Model 3: Add all covariates:
`agesend/rec`

,`sexsend/rec`

, and`grsize`

(group size) - Model 4: In order to examine sex matching effects, add an interaction of the sex variables to the model
`sexsend:sexrec`

. - Compare models with AIC (see the note about model comparison), e.g.
`AIC(model1)`

. A lower value would indicate the model is preferred.

### Patents

Do a Poisson mixed effect model using the patent data. Model the number of citations (`ncit`

) based on whether there was opposition (`opposition`

) and if it was for the biotechnology/pharmaceutical industry (`biopharm`

). Use year as a random effect to account for unspecified economic conditions.

`load('data/patents.RData')`

Interestingly, one can model overdispersion in a Poisson model by specifying an random intercept for each observation (`subject`

in the data). In other words, no specific clustering or grouped structure is necessary, but we can use the random effect approach to get at the extra variance.

I don’t show the formal model here as we did before, but this is why depicting mixed models solely as ‘multilevel’ becomes a bit problematic in my opinion. In the standard mixed model notation it’s straightforward though, you just add an additional random effect term, just as we do in the actual model syntax.↩︎

Setting aside our discussion to take a turn regarding regression modeling more generally, this is a good example of ‘surprising’ effects not being so surprising when you consider them more closely. Take a look at the effect of experience. More experience means less stress, this is probably not surprising. Now look at the age effect. It’s positive! But wouldn’t older nurses have more experience? What’s going on here? When interpreting experience, it is with age

*held constant*, thus more experience helps with lowering stress no matter what your age. With age, we’re holding experience constant. If experience doesn’t matter, being older is affiliated with more stress, which might be expected given the type of very busy and high pressure work often being done (the mean age is 43). A good way to better understand this specifically is to look at predicted values when age is young, middle, and older vs. experience levels at low, middle, and high experience, possibly explicitly including the interaction of the two in the model. Also note that if you take experience out of the model, the age effect is negative, which is expected, as it captures experience also.↩︎Just a reminder, it

*does*matter if you label your data in a less than optimal fashion. For example, if in the nesting situation you start your id variable at 1 for each nested group, then you have to use the nested notation in lme4, otherwise, e.g. it won’t know that id = 1 in group 1 is different from id 1 in group 2. In our hospital example, this would be akin to using`ward`

instead of`wardid`

as we did. Again though, this wouldn’t be an issue if one practices good data habits. Note also the`:`

syntax. In other modeling contexts in R this denotes an interaction, and that is no different here. In some contexts, typically due to experimental designs, one would want to explore random effects of the sort 1|A, 1|B and 1|A:B. However, this is relatively rare.↩︎One reason to do so would be that you expect variability to decrease over time, e.g. due to experience. You might also allow that variance to be different due to some other grouping factor entirely (e.g. due to treatment group membership).↩︎

This feature request has been made by its users for over a decade at this point- it’s not gonna happen. The issue is that the way lmer works by default employs a method that won’t allow it (this is why it is faster and better performing than other packages). Unfortunately the common response to this issue is ‘use nlme.’ However many other packages work with lme4 rather than nlme, and if you aren’t going to use lme4 for mixed models you might as well go Bayesian with rstanarm or brms instead of nlme. I would even prefer mgcv to nlme (though it can use nlme under the hood) because of the other capabilities it provides, and the objects created are easier to work with in my opinion.↩︎

Note that for a balanced binary variable, the mean

`p=.5`

and standard deviation is`sqrt(p*(1-p)) = .5`

↩︎