More Random Effects

Previously we’ve looked at random intercepts, but any observation or lower level covariate effect could be allowed to vary by cluster as well.


Returning to the GPA data, recall the visualization from before. I’ve animated it to highlight the differences among students.

Let us now assume that the trend over time is allowed to vary by student in addition to the intercepts. Using lme4, this is quite straightforward.

gpa_mixed =  lmer(gpa ~ occasion + (1 + occasion | student), data = gpa)

Pretty easy huh? Within the parenthesis, to the left of that bar | we are just positing a model formula as we would do with most modeling functions12. Let’s look at the results.

term value se lower_2.5 upper_97.5
Intercept 2.599 0.018 2.563 2.635
occasion 0.106 0.006 0.095 0.118
group effect variance sd var_prop
student Intercept 0.045 0.213 0.491
student occasion 0.005 0.067 0.049
Residual 0.042 0.206 0.460

As before, since we have 0 as our starting semester, the intercept tells us what the average GPA is in the first semester. The coefficient for occasion still reflects a one semester change in GPA. As we have not changed the fixed effect portion of the model, the values are the same as before.

The associated intercept variance tells us how much that starting GPA bounces around from student to student. The variance for the occasion effect might not look like much in comparison, but slopes are on a notably different scale than the intercept. Note that the mean slope for the semester to semester effect, our fixed effect, is 0.11, but from student to student it bounces around half that. Thus we could expect most students to fall somewhere between a flat effect of zero to more than double the population average13.

Yet another point of interest is the correlation of the intercepts and slopes. In this case it’s -0.1. That’s pretty small, but the interpretation is the same as with any correlation. In this case specifically, it tells us that those with lower intercepts would be associated with increased time trajectories. This makes intuitive sense in that people are improving in general, and those at the bottom would have more room to improve. However, this is very slight, and practically speaking we might not put too much weight on it.

Comparison to many regressions

Let’s compare these results to the ones we would have gotten had we run a separate regression for each student. In what follows we see the distribution of of the estimated intercept and slope coefficients for all the students. Note that these are the same estimates one would have gotten with a fixed effects model with an occasion by student interaction14.

Here we can see that the mixed model intercepts are generally not as extreme, i.e. the tails of the distribution have been pulled toward the overall effect. Same goes for the slopes. In both cases the mixed model shrinks what would have been the by-group estimate, which would otherwise overfit in this scenario. This regularizing effect is yet another bonus when using mixed models15. It comes into play largely when we have few observations per group and less estimated variance for the random effects. In other words, when there is little information in a group, or less group-level variance relative to the observation variance, then the mixed model will produce a group-specific effect that is closer to the overall population effect. In that sense, the mixed model group coefficients better reflect our ignorance. Conversely, with more pronounced group effects, our uncertainty about the overall effect increases.

The following shows what would happen to similar data under a variety of settings with simulated data that is based on the results of the GPA model we had above. On the far left is the plot just shown, our current data setting. Then there are four settings to go along with the original results. The first shows what would happen had we taken many more measurements per student. In the next, we add to the intercept and slope variance, and decrease the residual variance, but keep the sample size the same as the original data. In both cases we have a less regularizing effect of the mixed model. The random coefficients are very similar to the separate regressions results. Then, we keep the data the same but where we only have 4 observations per student, which shows more variability in the per-student results, and thus relatively more shrinkage with the mixed model. Finally, we add to the number of occasions per student (10), but have dropout over time, and so have roughly the same amount of data, but which is imbalanced. For more on this see topic, see my post here.

Visualization of effects

Now let’s compare our predictions visually. First there is the linear regression fit. We assume the same starting point and trend for everyone. If we add the conditional predictions that include the subject specific effects from the mixed model, we now can also make subject specific predictions, greatly enhancing the practical use of the model.

In contrast, the by-group approach is more noisy due to treating everyone independently. Many more students are expected to have downward or flat trends relative to the mixed model. The mixed model meanwhile only had 3 trends estimated to be negative.

Summary of Random Slopes

At this point it might be clearer why some would call these richly parameterized linear models. Relative to a standard regression we get extra variance parameters that add to our understanding of the sources of uncertainty in the model, we can get the subjects specific effects, and their correlation, and use that information to obtain far better predictions. What’s not to like?

Exercises for Random Slopes

Sleep revisited

Run the sleep study model with random coefficient for the Days effect, and interpret the results. What is the correlation between the intercept and Days random effects? Use the ranef and coef functions on the model you’ve created to inspect the individual-specific effects. What do you see?


In the following, replace model with the name of your model object. Run each line, inspecting the result of each as you go along.

re = ranef(model)$Subject
fe = fixef(model)

apply(re, 1, function(x) x + fe) %>% t()

The above code adds the fixed effects to each row of the random effects (the t just transposes the result). What is the result compared to what you saw before?

Simulation revisited

The following shows a simplified way to simulate some random slopes, but otherwise is the same as the simulation before. Go ahead and run the code.

set.seed(1234)  # this will allow you to exactly duplicate your result
Ngroups = 50
NperGroup = 3
N = Ngroups * NperGroup
groups = factor(rep(1:Ngroups, each = NperGroup))
re_int = rnorm(Ngroups, sd = .75)
re_slope = rnorm(Ngroups, sd = .25)
e = rnorm(N, sd = .25)
x = rnorm(N)
y = (2 + re_int[groups]) + (.5 + re_slope[groups]) * x + e

d = data.frame(x, y, groups)

This next bit of code shows a way to run a mixed model while specifying that there is no correlation between intercepts and slopes. There is generally no reason to do this unless the study design warrants it16, but you could do it as a step in the model-building process, such that you fit a model with no correlation, then one with it.

model_ints_only = lmer(y ~ x + (1|groups), data = d)

model_with_slopes = lmer(y ~ x + (1|groups) + (0 + x|groups), data = d)




ggplot(aes(x, y), data = d) +

Compare model fit using the AIC function, e.g. AIC(model). The model with the lower AIC is the better model, so which would you choose?

  1. Technically the intercept is assumed but you should keep it for clarity.↩︎

  2. In case it’s not clear, I’m using the fact that we assume a normal distribution for the random effect of occasion. A quick rule of thumb for a normal distribution is that 95% falls between \(\pm\) 2 standard deviations of the mean.↩︎

  3. This is just one issue with a fixed effects approach. You would have to estimate 400 parameters, but without anything (inherently) to guard against overfitting. The so-called fixed effects model from the econometrics perspective gets around this by demeaning variables that vary within groups, i.e. subtracting the per group mean. This is also equivalent to a model adding a dummy variable for the groups, though it’s a computationally more viable model to fit, as one no longer includes the grouping variable in the model (not really a concern with data FE models are actually applied to and it being what year it is). But while it controls for group level effects, we still cannot estimate them. Traditional approaches to fixed effects models also do not have any way to incorporate group-specific slopes, except perhaps by way of an interaction of a covariate with the cluster, which brings you back to square one of having to estimate a lot of parameters. For more about FE models and their issues, see my document on clustered data approaches, and Bell et al. (2016). Fixed and Random effects: making an informed choice.↩︎

  4. This phenomenon is also sometimes referred to as partial pooling. This idea of pooling is as in ‘pooling resources’ or borrowing strength. You have complete pooling, which would be the standard regression model case of ignoring the clusters, i.e. all cluster effects are assumed to be the same. With no pooling, we assumes the clusters have nothing in common, i.e. the separate regressions approach. Partial pooling is seen in the mixed model scenario, where the similarity among the clusters is estimated in some fashion, and data for all observations informs the estimates for each cluster. I’ve never really liked the ‘pooling’ terminology, as regularization/shrinkage is a more broad concept that applies beyond mixed models, and I’d prefer to stick to that. In any case, if interested in more, see practically anything Andrew Gelman has written on it, and the pool-no-pool document here.↩︎

  5. I personally have not come across a situation where I’d do this in practice. Even if the simpler model with no correlation was a slightly better fit, there isn’t much to be gained by it.↩︎