Common Extensions

Additional Grouping Structure

Cross-classified models

Oftentimes there will be additional sources of variance beyond one grouping factor. Consider as an example, a visual perception experiment where there are multiple trials for each individual, to go along with specific images displayed to all individuals. Such data might look like this.

In this scenario, we have observations clustered within both person and image, but person and image are not nested within one another- all participants see all 10 images. Such a situation is typically referred to as having crossed random effects, which pretty much just means non-nested, or even more to the point, ‘default.’ The larger issue for the situations we’ll look at next is that we will have multiple sources variances to consider.

Example: Student achievement

For this demonstration we’ll look at achievement scores for students. The sources of dependency are due to students having gone to the same primary or secondary schools. However, in this example, going to a primary school doesn’t necessarily mean you’ll go to a specific secondary school. Note also that there are no repeated measures, we see each student only once. Here’s a quick look a the data, and for more detail, check the appendix.


For our mixed model, we’ll look at the effects of sex and socioeconomic status, ses, a six level variable from low to high, on scholastic achievement. The range of achievement scores is roughly 4 to 10, with mean of 6.3 and standard deviation 0.9. We’ll take into account the clustering at primary school and secondary school. To incorporate the additional structure in lme4 syntax is very easy- we just do as we did before, though now for both grouping factors19.

pupils_crossed = lmer(
  achievement ~ 
  + ses 
  + (1|primary_school_id) 
  + (1|secondary_school_id),
  data = pupils
## summary(pupils_crossed, correlation = FALSE)
term value se lower_2.5 upper_97.5
Intercept 5.924 0.123 5.684 6.164
sexfemale 0.261 0.046 0.171 0.350
ses2 0.132 0.118 -0.098 0.362
ses3 0.098 0.110 -0.118 0.314
ses4 0.298 0.105 0.093 0.503
ses5 0.354 0.101 0.156 0.551
seshighest 0.616 0.110 0.401 0.832

The fixed effects tell us there is a positive effect of being female on achievement, and in general, relative to lowest SES category, being in the upper categories of SES also has a positive effect. Now for the random effects.

group effect variance sd var_prop
primary_school_id Intercept 0.17 0.42 0.24
secondary_school_id Intercept 0.07 0.26 0.09
Residual 0.47 0.69 0.66

When we look at the variance components, we see that primary and secondary school contributes about 34% of the total variance. Most of the variance attributable to school comes from the primary school.

If we inspect the random effects, we can see that we now have two sets of effects- 50 for the primary schools, and 30 for the secondary. Both would be incorporated into any pupil-specific prediction.

List of 2
 $ primary_school_id  :'data.frame':    50 obs. of  1 variable:
  ..$ (Intercept): num [1:50] -0.327 0.183 0.52 0.474 0.253 ...
  ..- attr(*, "postVar")= num [1, 1, 1:50] 0.0247 0.0225 0.0244 0.0215 0.0285 ...
 $ secondary_school_id:'data.frame':    30 obs. of  1 variable:
  ..$ (Intercept): num [1:30] -0.41069 0.08247 -0.00589 -0.06162 0.08481 ...
  ..- attr(*, "postVar")= num [1, 1, 1:30] 0.0155 0.014 0.0159 0.0131 0.0142 ...
 - attr(*, "class")= chr "ranef.mer"

Let’s look at them visually using merTools.

The above visualization allows us to actually see less variability among students for secondary schools. Note that we have the usual extensions here if desired. As an example, we could also do random slopes for student characteristics.

Hierarchical structure

Now that we have looked at cross-classified models, we can proceed to examine hierarchical cluster structuring. In this situation, we have clusters nested within other clusters, which may be nested within still other clusters. A typical example might be cities within counties, and counties within states.

Example: Nurses and stress

For our demonstration we’ll use the nurses data set. Here we are interested in the effect of a training program (treatment) on stress levels (on a scale of 1-7) of nurses. In this scenario, nurses are nested within wards, which themselves are nested within hospitals, so we will have random effects pertaining to ward (within hospital) and hospital. For more information see the appendix.


For the model, we examine effects of the treatment, as well as several other covariates, at least one each at the nurse, ward, and hospital levels. Again, when it comes to the fixed effects portion, you can simply think about that part as you would any standard regression, we just add covariates as theory/exploration would suggest. To incorporate this type of random effects structure is not too different from the cross-classified approach, but does have a slight change to the syntax.

nurses_hierarchical = lmer(
  stress ~ 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize
  + (1 | hospital) 
  + (1 | hospital:ward),
  data = nurses

## # same thing!
## nurses_hierarchical = lmer(
##   stress ~ 
##     age 
##   + sex 
##   + experience 
##   + treatment 
##   + wardtype 
##   + hospsize 
##   + (1 | hospital/ward), 
##   data = nurses
## )
## summary(nurses_hierarchical, correlation = F)
term value se lower_2.5 upper_97.5
Intercept 5.380 0.185 5.018 5.742
age 0.022 0.002 0.018 0.026
sexFemale -0.453 0.035 -0.522 -0.385
experience -0.062 0.004 -0.070 -0.053
treatmentTraining -0.700 0.120 -0.935 -0.465
wardtypespecial care 0.051 0.120 -0.184 0.286
hospsizemedium 0.489 0.202 0.094 0.884
hospsizelarge 0.902 0.275 0.363 1.440

As far as the fixed effects go, about the only thing that doesn’t have a statistical effect is ward type20.

group effect variance sd var_prop
hospital:ward Intercept 0.337 0.580 0.500
hospital Intercept 0.119 0.345 0.177
Residual 0.217 0.466 0.323

Concerning the random effects, there appears to be quite a bit of variability from ward to ward especially, but also hospital. Recall that stress is a 7 point scale, so from ward to ward we can expect scores to bounce around about half a point on average, which is quite dramatic in my opinion. Again we inspect it visually.

Crossed vs. nested

The following shows the difference in the results from treating ward as a nested (within hospital) vs. crossed random effect. What do you notice is different in the result of these two models?

nurses_hierarchical = lmer(
  stress ~ 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1 | hospital)
  + (1 | hospital:wardid),  #**#
  data = nurses

nurses_crossed = lmer(
  stress ~ 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1 | hospital) 
  + (1 | wardid),           #**#
  data = nurses
group effect variance sd var_prop
hospital:ward Intercept 0.337 0.580 0.500
hospital Intercept 0.119 0.345 0.177
Residual 0.217 0.466 0.323
group effect variance sd var_prop
wardid Intercept 0.337 0.580 0.500
hospital Intercept 0.119 0.345 0.177
Residual 0.217 0.466 0.323

Nothing? Good, you’re not crazy. Here’s a quote from the lme4 text, section, which is definitely worth your time.

The blurring of mixed-effects models with the concept of multiple, hierarchical levels of variation results in an unwarranted emphasis on ‘levels’ when defining a model and leads to considerable confusion. It is perfectly legitimate to define models having random effects associated with non-nested factors. The reasons for the emphasis on defining random effects with respect to nested factors only are that such cases do occur frequently in practice, and that some of the computational methods for estimating the parameters in the models can only be easily applied to nested factors.

This is not the case for the methods used in the lme4 package. Indeed there is nothing special done for models with random effects for nested factors. When random effects are associated with multiple factors, exactly the same computational methods are used whether the factors form a nested sequence or are partially crossed or are completely crossed.

You might have noticed that we were using wardid rather than the ward grouping variable as in our first example. Even though every ward is unique, the ward column labels them with an arbitrary sequence starting with 1 within each hospital. While this might seem natural, ward 1 in hospital 1 is not the same as ward 1 in hospital 2, so it’s probably not a good idea to give them the same label. The wardid column properly distinguishes the wards with unique values (e.g. 11, 12).

What would have happened had we used that variable as a crossed random effect?

nurses_crossed_bad_data = lmer(
  stress ~ 
  + sex 
  + experience 
  + treatment 
  + wardtype 
  + hospsize 
  + (1|hospital) 
  + (1|ward), 
  data = nurses
group effect variance sd var_prop
hospital Intercept 0.196 0.442 0.297
ward Intercept 0.000 0.000 0.000
Residual 0.463 0.681 0.703

This is certainly not the result we want. The variance in ward is already captured by treatment and type. However, as demonstrated, this can be avoided with the proper syntax, or proper labeling in the data to allow unique clusters to have unique identifiers.

For more, see this discussion on Stack Exchange, as well as this from the GLMM FAQ from one of the lme4 developers. Josh Errickson at the University of Michigan’s CSCAR group also has a nice write-up with visual depiction of the underlying matrices of interest, which served as inspiration for some of the visualization in the next section.

So there you have it. When it comes to lme4, or mixed models more generally, crossed vs. nested is simply a state of mind (data)21.

Residual Structure

Sometimes we will want to obtain more specific estimates regarding the residual covariance/correlation structure. This is especially the case in the longitudinal setting, where we think that observations closer in time would be more strongly correlated than those further apart, or that the variance changes over time. What does this sort of model look like?

Let’s begin by thinking about the covariance/correlation matrix for the entire set of observations for our target variable, and how we want to represent the dependency in those observations. I’ll show a visualization of the first 5 people from our GPA data and modeling situation. Recall that each person has 6 observations. This display regards the results from our random intercepts (only) model for GPA.

Each block represents the covariance matrix pertaining to observations within an individual. Within the person, there are variances on the diagonal and covariances of observations on the off-diagonal. When considering the whole data, we can see that observations from one person have no covariance with another person (gray). Furthermore, the covariance within a person is a constant value, the variance is also a constant value. Where did those values come from? Let’s look again at the result from our random intercept model.

group effect variance sd var_prop
student Intercept 0.064 0.252 0.523
Residual 0.058 0.241 0.477

Remember that there are two sources of variance in this model, the residual observation level variance, and that pertaining to person. Combined they provide the total residual variance that we aren’t already capturing with our model features (i.e. fixed effect variables). In this case, it’s about 0.12, the value displayed on our diagonal. The off-diagonal is the variance attributable to student, which we alternately interpreted as an intraclass correlation (dividing by the total variance converts it to the correlation metric).

More generically, and referring to previous notation for our estimated variances, we can see the covariance matrix (for a cluster) as follows.

\[\Sigma = \left[ \begin{array}{ccc} \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\ \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 & \tau^2 \\ \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2 & \tau^2 \\ \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 & \tau^2\\ \tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} & \tau^2 \\ \tau^2 & \tau^2 & \tau^2 & \tau^2 & \tau^2 & \color{orange}{\sigma^2 + \tau^2} \\ \end{array}\right]\]

This represents a covariance structure called compound symmetry, where correlations, the off-diagonal values, are constant. It is the default in most mixed model settings, and the same as what is shown visually above. Now let’s start to think about other types of covariance structures.

Consider the following model for an individual and just three time points to keep things simpler to show.

\[\boldsymbol{y} \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})\]

So we have three observations of \(y\) that are multivariate normally distributed. The mean \(\mu\) is a function of covariates just like in standard regression.

\[\mu = b_0 + b_1\cdot \mathrm{time} + b_2\cdot x_1 ...\]

However, instead of just plopping an \(\epsilon\) at the end, we want to go further in defining the entire residual variance/covariance structure for all three time points.

In the simplest setting of a standard linear regression model, we have constant variance and no covariance.

\[\Sigma = \left[ \begin{array}{ccc} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \\ \end{array}\right]\]

Next, we can relax the assumption of equal variances, and estimate each separately. In this case of heterogeneous variances, we might see more or less variance over time, for example.

\[\Sigma = \left[ \begin{array}{ccc} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \\ \end{array}\right]\]

Now let’s say we actually want to get at the underlying covariance/correlation. I’ll switch to the correlation representation, but you can still think of the variances as constant or separately estimated. So now we have something like this, where \(\rho\) represents the residual correlation among observations.

\[\Sigma = \sigma^2 \left[ \begin{array}{ccc} 1 & \rho_1 & \rho_2 \\ \rho_1 & 1 & \rho_3 \\ \rho_2 & \rho_3 & 1 \\ \end{array}\right]\]

In this case we’d estimate a different correlation for all time point pairs (with constant variance). This is typically described as an unstructured, or simply ‘symmetric,’ correlation structure.

If you are familiar with repeated measures ANOVA, which is a special case of a mixed model, you may recall that the usual assumption is a sphericity. Sphericity is a relaxed form of compound symmetry, where all the correlations have the same value, i.e. \(\rho_1=\rho_2=\rho_3\), and all variances are equal.

Another very commonly used correlation structure (for time-based settings) is an autocorrelation structure, of lag order one, for the residuals. What this means is that we assume the residuals at one time point apart correlate with some value \(\rho\), observations at two time points apart correlate \(\rho^2\), and so on. As such we only need to estimate \(\rho\), while the rest are then automatically determined. Here’s what it’d look like for four time points.

\[\Sigma = \sigma^2 \left[ \begin{array}{cccc} 1 & \rho & \rho^2 & \rho^3 \\ \rho & 1 & \rho & \rho^2 \\ \rho^2 & \rho & 1 & \rho \\ \rho^3 & \rho^2 & \rho & 1 \\ \end{array}\right]\]

If our model estimated \(\rho\) to be .5, the resulting correlation matrix would look like the following within each cluster.

\[\Sigma = \sigma^2 \left[ \begin{array}{cccc} 1 & .5 & .25 & .06 \\ .5 & 1 & .5 & .25 \\ .25 & .5 & 1 & .5 \\ .06 & .25 & .5 & 1 \\ \end{array}\right]\]

Again, the main point is that points further apart in time are assumed to have less correlation.

Know that there are many patterns and possibilities to potentially consider, and that they are not limited to the repeated measures scenario. For example, the correlation could represent spatial structure, where units closer together geographically would be more correlated. And as noted, we could also have variances that are different at each time point22. We’ll start with that for the next example.

Heterogeneous variance

Unfortunately, lme4 does not provide the ability to model the residual covariance structure, at least not in a straightforward fashion, though many other mixed model packages do23. In fact, two packages that come with the basic R installation do so, mgcv and nlme. We’ll demonstrate with the latter.

The nlme package will have a different random effect specification, though not too different. In addition, to estimate heterogeneous variances, we’ll need to use an additional weights argument. The following will allow each time point of occasion to have a unique estimate.


heterovar_res = lme(
  gpa ~ occasion,
  data = gpa,
  random = ~ 1 | student,
  weights = varIdent(form = ~ 1 | occasion)

## summary(heterovar_res)
term value se z p_value lower_2.5 upper_97.5
Intercept 2.599 0.026 99.002 0 2.547 2.650
occasion 0.106 0.004 26.317 0 0.098 0.114
group effect variance sd var_prop
student Intercept 0.094 0.306 0.404
Residual 0.138 0.372 0.596

At this point we’re getting the same stuff we’re used to. Now the not-so fun part. For the values we’re interested in for this example, i.e. the variances at each occasion, nlme does not make it easy on someone to understand initially, as the output regards the way things are for estimation, not for what one would usually have to report. The variances are scaled relative to the first variance estimate, which is actually the reported residual variance in the random effects part. Additionally the values are also on the standard deviation rather than variance scale. From the default output display, we can see that variance decreases over time in this case, but the actual values are not provided.

Random effects:
 Formula: ~1 | student
        (Intercept) Residual
StdDev:   0.8232544        1

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | occasion 
 Parameter estimates:
        0         1         2         3         4         5 
1.0000000 0.8261186 0.6272415 0.4311126 0.3484013 0.4324628 

Relative values are fine, I guess, but what we’d want are the actual estimates. Here’s how you can get them using the residual standard deviation to scale those values, then square them to get on the variance scale.

(c(1.0000000, coef(heterovar_res$modelStruct$varStruct, unconstrained = FALSE))*heterovar_res$sigma)^2
                    1          2          3          4          5 
0.13815037 0.09428374 0.05435276 0.02567636 0.01676917 0.02583744 

Yeah. You’ll have to look this up every time you want to do it, or just make your own function that takes the model input. Here is how my function would extract the information.

mixedup::extract_het_var(heterovar_res, scale = 'var')
     X0    X1    X2    X3    X4    X5
1 0.138 0.094 0.054 0.026 0.017 0.026

Another alternative to keep in mind is glmmTMB, which is provided by the primary developer of lme4. It would allow one to stay more in the lme4 syntax style for similar models, and provide similar looking output. In this case we have a special syntax for the case of heterogeneous variances, specifically using tthe diag function within the model formula.


heterovar_res2 = glmmTMB(
  gpa ~ occasion + (1|student) + diag(0 + occas |student), 
  data = gpa

 Family: gaussian  ( identity )
Formula:          gpa ~ occasion + (1 | student) + diag(0 + occas | student)
Data: gpa

     AIC      BIC   logLik deviance df.resid 
   261.1    312.0   -120.5    241.1     1190 

Random effects:

Conditional model:
 Groups    Name                   Variance Std.Dev. Corr                     
 student   (Intercept)            0.093123 0.30516                           
 student.1 occasyear 1 semester 1 0.129833 0.36032                           
           occasyear 1 semester 2 0.086087 0.29341  0.00                     
           occasyear 2 semester 1 0.046240 0.21503  0.00 0.00                
           occasyear 2 semester 2 0.017615 0.13272  0.00 0.00 0.00           
           occasyear 3 semester 1 0.008709 0.09332  0.00 0.00 0.00 0.00      
           occasyear 3 semester 2 0.017730 0.13316  0.00 0.00 0.00 0.00 0.00 
 Residual                         0.008065 0.08980                           
Number of obs: 1200, groups:  student, 200

Dispersion estimate for gaussian family (sigma^2): 0.00806 

Conditional model:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) 2.598788   0.026201   99.19   <2e-16 ***
occasion    0.106141   0.004034   26.31   <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note that the variances displayed for each time point are not conflated with the residual variance. To compare with nlme, just add the residual variance to those estimates. As with every mixed model package (apparently), it still takes a bit to get the variances in a usable form from the VarCorr object. The following shows how though, and compares to the nlme result.

vc_glmmtmb = VarCorr(heterovar_res2)
vc_glmmtmb = attr(vc_glmmtmb$cond$student.1, 'stddev')^2 + sigma(heterovar_res2)^2

Here is the output from my function:

mixedup::extract_het_var(heterovar_res2, scale = 'var')
      group occasyear.1.semester.1 occasyear.1.semester.2 occasyear.2.semester.1 occasyear.2.semester.2 occasyear.3.semester.1 occasyear.3.semester.2
1 student.1                  0.138                  0.094                  0.054                  0.026                  0.017                  0.026

In any case, putting these together shows we get the same result.

year 1 sem 1 year 1 sem 2 year 2 sem 1 year 2 sem 2 year 3 sem 1 year 3 sem 2
glmmTMB 0.138 0.094 0.054 0.026 0.017 0.026
nlme 0.138 0.094 0.054 0.026 0.017 0.026


The following example demonstrates the autocorrelation structure we described previously. In nlme we use the built-in corAR1 function and correlation argument similar to how we did with the weights argument for heterogeneous variances.


corr_res = lme(
  gpa ~ occasion,
  data = gpa,
  random = ~ 1 | student,
  correlation = corAR1(form = ~ occasion)

## summary(corr_res)
term value se z p_value lower_2.5 upper_97.5
Intercept 2.597 0.023 113.146 0 2.552 2.642
occasion 0.107 0.005 20.297 0 0.097 0.118
group effect variance sd var_prop
student Intercept 0.046 0.215 0.381
Residual 0.075 0.273 0.619

Notice first that the fixed effect for occasion is the same as before. The variance estimates have changed slightly along with the variances of the fixed effects (i.e. the standard errors). The main thing is that we have a new parameter called Phi in the nlme output that represents our autocorrelation, with value of 0.418. This suggests at least some correlation exists among the residuals for observations next to each other in time, though it diminishes quickly as observations grow further apart.

Note that glmmTMB will analyze such structure as well. However, we need the factor form for occasion for this specification occas, and also note how it is part of model formula, like another random effect. More on this in the supplemental section.

corr_res_tmb = glmmTMB(
  gpa ~ occasion +  ar1(0 + occas | student) + (1 | student),
  data = gpa

We can use my package function to extract the autocorrelation parameter from these types of models.

# A tibble: 1 × 1
1 0.418

Generalized Linear Mixed Models

Just as generalized linear models extend the standard linear model, we can generalize (linear) mixed models to generalized linear mixed models. Furthermore, there is nothing restricting us to only the exponential family, as other packages would potentially allow for many other target distributions.

For this example, we’ll do a logistic mixed model with a binary outcome. In this case, we’ll use the speed dating data set. In this demonstration, the experimental setting had randomly assigned each participant to ten short (four minutes) dates with other participants. For each date, each person rated six attributes (attractive, sincere, intelligent, fun, ambitious, shared interests) of the other person on a 10-point scale and wrote down whether he or she would like to see the other person again.

Our target variable is whether the participant would be willing to date the person again (decision, recorded as Yes or No). To keep things simple, the predictors will be limited to the sex of the participant (sex), whether the partner was of the same race (samerace), and three of the attribute ratings the participant gave of their partner- attractiveness (attractive), sincerity (sincere), and intelligence (intelligent). The latter have been scaled to have zero mean and standard deviation of .5, which puts them on a more even footing with the binary covariates (_sc)24.

To fit the model, we’ll use the glmer function and specify the family as binomial, just as we would for the usual logistic setting using glm in base R.


sd_model = glmer(
  decision ~ 
  + samerace 
  + attractive_sc 
  + sincere_sc 
  + intelligent_sc
  + (1 | iid),
  data   = speed_dating,
  family = binomial

summary(sd_model, correlation = FALSE)
term value se z p_value lower_2.5 upper_97.5
Intercept -0.743 0.121 -6.130 0.00 -0.981 -0.506
sexMale 0.156 0.164 0.954 0.34 -0.165 0.478
sameraceYes 0.314 0.075 4.192 0.00 0.167 0.460
attractive_sc 1.957 0.058 33.560 0.00 1.842 2.071
sincere_sc 0.311 0.054 5.747 0.00 0.205 0.417
intelligent_sc 0.444 0.054 8.232 0.00 0.338 0.550

The fixed effects results are as expected for the attributes, with attractiveness being a very strong effect in particular, while sex of the participant was statistically negligible. You are free to exponentiate the coefficients to get the odds ratios if desired, just as you would with standard logistic regression. Now for the random effects result.

group effect variance sd
iid Intercept 2.708 1.645

For the variance components, notice that there is no residual variance. This is because we are not modeling with the normal distribution for the target variable, thus there is no \(\sigma\) to estimate. However, the result suggests that there is quite a bit of variability from person to person on the logit scale.

Exercises for Extensions

Sociometric data

In the following data, kids are put into different groups and rate each other in terms of how much they would like to share some activity with the others. We have identifying variables for the person doing the rating (sender), the person being rated (receiver), what group they are in, as well as age and sex for both sender and receiver, as well as group size.

To run a mixed model, we will have three sources of structure to consider:

  • senders (within group)
  • receivers (within group)
  • group

First, load the sociometric data.


To run the model, we will proceed with the following modeling steps. For each, make sure you are creating a separate model object for each model run.

  • Model 1: No covariates, only sender and receiver random effects. Note that even though we don’t add group yet, still use the nesting approach to specify the effects (e.g. 1|group:receiver)
  • Model 2: No covariates, add group random effect
  • Model 3: Add all covariates: agesend/rec, sexsend/rec, and grsize (group size)
  • Model 4: In order to examine sex matching effects, add an interaction of the sex variables to the model sexsend:sexrec.
  • Compare models with AIC (see the note about model comparison), e.g. AIC(model_1, model_2, ...). A lower value would indicate the model is preferred.


Do a Poisson mixed effect model using the patent data. Model the number of citations (ncit) based on whether there was opposition (opposition) and if it was for the biotechnology/pharmaceutical industry (biopharm). Use year as a random effect to account for unspecified economic conditions.


Interestingly, one can model overdispersion in a Poisson model by specifying an random intercept for each observation (subject in the data). In other words, no specific clustering or grouped structure is necessary, but we can use the random effect approach to get at the extra variance beyond what would be assumed with a Poisson distribution.

  1. I don’t show the formal model here as we did before, but this is why depicting mixed models solely as ‘multilevel’ becomes a bit problematic in my opinion. In the standard mixed model notation it’s straightforward though, you just add an additional random effect term, just as we do in the actual model syntax.↩︎

  2. Setting aside our discussion to take a turn regarding regression modeling more generally, this is a good example of ‘surprising’ effects not being so surprising when you consider them more closely. Take a look at the effect of experience. More experience means less stress, this is probably not surprising. Now look at the age effect. It’s positive! But wouldn’t older nurses have more experience? What’s going on here? When interpreting experience, it is with age held constant, thus more experience helps with lowering stress no matter what your age. With age, we’re holding experience constant. If experience doesn’t matter, being older is affiliated with more stress, which might be expected given the type of very busy and high pressure work often being done (the mean age is 43). A good way to better understand this specifically is to look at predicted values when age is young, middle, and older vs. experience levels at low, middle, and high experience, possibly explicitly including the interaction of the two in the model. Also note that if you take experience out of the model, the age effect is negative, which is expected, as it captures experience also.↩︎

  3. Just a reminder, it does matter if you label your data in a less than optimal fashion. For example, if in the nesting situation you start your id variable at 1 for each nested group, then you have to use the nested notation in lme4, otherwise, e.g. it won’t know that id = 1 in group 1 is different from id 1 in group 2. In our hospital example, this would be akin to using ward instead of wardid as we did. Again though, this wouldn’t be an issue if one practices good data habits. Note also the : syntax. In other modeling contexts in R this denotes an interaction, and that is no different here. In some contexts, typically due to experimental designs, one would want to explore random effects of the sort 1|A, 1|B and 1|A:B. However, this is relatively rare.↩︎

  4. One reason to do so would be that you expect variability to decrease over time, e.g. due to experience. You might also allow that variance to be different due to some other grouping factor entirely (e.g. due to treatment group membership).↩︎

  5. This feature request has been made by its users for over a decade at this point- it’s not gonna happen. The issue is that the way lmer works by default employs a method that won’t allow it (this is why it is faster and better performing than other packages). Unfortunately the common response to this issue is ‘use nlme.’ However many other packages work with lme4 model results, but not nlme, and if you aren’t going to use lme4 for mixed models you might as well go Bayesian with rstanarm or brms instead of nlme. I would even prefer mgcv to nlme because of the other capabilities it provides, and the model objects created are easier to work with in my opinion.↩︎

  6. Note that for a balanced binary variable, the mean p=.5 and standard deviation is sqrt(p*(1-p)) = .5↩︎