Exploring random slopes for categorical covariates and similar models
Prerequisites: familiarity with mixed models
For mixed models, we often want random ‘slopes’ as well as random intercepts, such that the coefficients for the fixed effects are expected to vary by group. This is very common in longitudinal settings, where we want to examine an overall trend but allow that trend to vary by individual.
In such settings, when time is numeric, things are straightforward. The variance components are decomposed into parts for the intercept, the coefficient for the time indicator, and the residual variance (for linear mixed models). But what happens if we have only three time points? Does it make sense to treat it as numeric and hope for the best?
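For reference, the standard numeric-time specification looks something like the following (a minimal sketch; y, time, id, and d are hypothetical names rather than anything from this post).
# random intercept plus a random slope for (numeric) time for each individual
lmer(y ~ time + (1 + time | id), data = d)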
This came up in consulting: a client had a similar issue and tried to keep the random slope format while treating the time indicator as categorical. This led to convergence issues, so we thought about what other models might be possible. This post explores that scenario.
Packages used:
library(tidyverse)
library(lme4)
library(mixedup) # http://m-clark.github.io/mixedup
Let’s start with a very simple data set from the nlme package, which comes with the standard R installation. I chose it because Doug Bates has a good treatment of this topic using that example (starting at slide 85), which I just extend a bit.
Here is the data description from the help file.
Data on an experiment to compare three brands of machines used in an industrial process are presented in Milliken and Johnson (p. 285, 1992). Six workers were chosen randomly among the employees of a factory to operate each machine three times. The response is an overall productivity score taking into account the number and quality of components produced.
So for each worker and each machine, we’ll have three scores. Let’s look.
machines = nlme::Machines
# for some reason worker is an ordered factor.
machines = machines %>%
mutate(Worker = factor(Worker, levels = 1:6, ordered = FALSE))
Worker | Machine | score |
---|---|---|
1 | A | 52.0 |
1 | A | 52.8 |
1 | A | 53.1 |
1 | B | 62.1 |
1 | B | 62.6 |
1 | B | 64.0 |
This duplicates the plot in Bates’ notes and visually describes the entire data set. There likely is variability due to both workers and machines.
The random effects of potential interest are for worker and machine, so how do we specify this? Let’s try a standard approach. The following is the type of model tried by our client.
model_m_slope = lmer(score ~ Machine + (1 + Machine | Worker), machines)
Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, : Model failed to converge with max|grad| =
0.00805193 (tol = 0.002, component 1)
This was exactly the issue our client had: problematic convergence. It may be more of an issue with lme4’s defaults, and we could certainly explore tweaks to make the warning go away (or use a different package like glmmTMB), but let’s go ahead and keep this model.
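For the record, a couple of common remedies would look something like the following (a sketch, not run here; the object names are just for illustration). For now we’ll proceed with the original fit.
# try a different optimizer within lme4
model_m_slope_bobyqa = lmer(
  score ~ Machine + (1 + Machine | Worker),
  data    = machines,
  control = lmerControl(optimizer = 'bobyqa')
)
# or fit the same model with glmmTMB
# library(glmmTMB)
# model_m_slope_tmb = glmmTMB(score ~ Machine + (1 + Machine | Worker), data = machines)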
summarize_model(model_m_slope, ci = FALSE, cor_re = TRUE)
Variance Components:
Group Effect Variance SD Var_prop
Worker Intercept 16.68 4.08 0.25
Worker MachineB 34.72 5.89 0.53
Worker MachineC 13.62 3.69 0.21
Residual 0.92 0.96 0.01
Correlation of Random Effects:
Intercept MachineB MachineC
Intercept 1.00 0.49 -0.36
MachineB 0.49 1.00 0.30
MachineC -0.36 0.30 1.00
Fixed Effects:
Term Value SE t P_value Lower_2.5 Upper_97.5
Intercept 52.36 1.68 31.12 0.00 49.06 55.65
MachineB 7.97 2.43 3.28 0.00 3.21 12.72
MachineC 13.92 1.54 9.03 0.00 10.90 16.94
We get the variance components we expect: the variance attributable to the intercept (i.e. machine A), as well as to the ‘slopes’ for the difference of machine B vs. A and machine C vs. A. We also see the correlations among the random effects. It’s this part that Bates acknowledges is hard to estimate, and it requires notably more parameters than a typical random effects model (here, three variances and three covariances rather than a single intercept variance). We do have other options available to us though, so let’s try some.
Let’s start with the simplest plausible models. The first baseline has only a worker effect. The second has only a machine-by-worker effect, i.e. a separate effect of each machine for each worker, essentially treating the interaction as the sole clustering unit.
model_base_w = lmer(score ~ Machine + (1 | Worker), machines)
model_base_wm = lmer(score ~ Machine + (1 | Worker:Machine), machines)
Examining the random effects makes clear the difference between the two models. For our first baseline model, we only have 6 effects, one for each worker. For the second we have an effect of each machine for each worker.
extract_random_effects(model_base_w) # only 6 effects
extract_random_effects(model_base_wm) # 6 workers by 3 machines = 18 effects
group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|
Worker | Intercept | 1 | 1.210 | 1.032 | -0.813 | 3.234 |
Worker | Intercept | 2 | -1.594 | 1.032 | -3.618 | 0.429 |
Worker | Intercept | 3 | 6.212 | 1.032 | 4.188 | 8.235 |
Worker | Intercept | 4 | -0.069 | 1.032 | -2.093 | 1.954 |
Worker | Intercept | 5 | 2.949 | 1.032 | 0.925 | 4.972 |
Worker | Intercept | 6 | -8.707 | 1.032 | -10.731 | -6.684 |
group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|
Worker:Machine | Intercept | 1:A | 0.275 | 0.553 | -0.808 | 1.359 |
Worker:Machine | Intercept | 1:B | 2.556 | 0.553 | 1.473 | 3.640 |
Worker:Machine | Intercept | 1:C | 0.920 | 0.553 | -0.164 | 2.004 |
Worker:Machine | Intercept | 2:A | 0.209 | 0.553 | -0.874 | 1.293 |
Worker:Machine | Intercept | 2:B | -0.749 | 0.553 | -1.833 | 0.334 |
Worker:Machine | Intercept | 2:C | -4.402 | 0.553 | -5.486 | -3.318 |
Worker:Machine | Intercept | 3:A | 7.118 | 0.553 | 6.035 | 8.202 |
Worker:Machine | Intercept | 3:B | 7.647 | 0.553 | 6.563 | 8.731 |
Worker:Machine | Intercept | 3:C | 4.490 | 0.553 | 3.407 | 5.574 |
Worker:Machine | Intercept | 4:A | -1.113 | 0.553 | -2.196 | -0.029 |
Worker:Machine | Intercept | 4:B | 2.391 | 0.553 | 1.307 | 3.475 |
Worker:Machine | Intercept | 4:C | -1.493 | 0.553 | -2.577 | -0.409 |
Worker:Machine | Intercept | 5:A | -0.981 | 0.553 | -2.064 | 0.103 |
Worker:Machine | Intercept | 5:B | 4.705 | 0.553 | 3.621 | 5.789 |
Worker:Machine | Intercept | 5:C | 5.416 | 0.553 | 4.332 | 6.499 |
Worker:Machine | Intercept | 6:A | -5.509 | 0.553 | -6.593 | -4.426 |
Worker:Machine | Intercept | 6:B | -16.550 | 0.553 | -17.634 | -15.467 |
Worker:Machine | Intercept | 6:C | -4.931 | 0.553 | -6.014 | -3.847 |
As a next step, we’ll essentially combine our two baseline models.
model_w_wm = lmer(score ~ Machine + (1 | Worker) + (1 | Worker:Machine), machines)
Now we have 6 worker effects plus 18 machine-within-worker effects.
extract_random_effects(model_w_wm)
group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|
Worker:Machine | Intercept | 1:A | -0.750 | 2.015 | -4.699 | 3.198 |
Worker:Machine | Intercept | 1:B | 1.500 | 2.015 | -2.449 | 5.449 |
Worker:Machine | Intercept | 1:C | -0.114 | 2.015 | -4.063 | 3.834 |
Worker:Machine | Intercept | 2:A | 1.553 | 2.015 | -2.396 | 5.501 |
Worker:Machine | Intercept | 2:B | 0.607 | 2.015 | -3.342 | 4.555 |
Worker:Machine | Intercept | 2:C | -2.997 | 2.015 | -6.945 | 0.952 |
Worker:Machine | Intercept | 3:A | 1.778 | 2.015 | -2.171 | 5.726 |
Worker:Machine | Intercept | 3:B | 2.299 | 2.015 | -1.649 | 6.248 |
Worker:Machine | Intercept | 3:C | -0.815 | 2.015 | -4.763 | 3.134 |
Worker:Machine | Intercept | 4:A | -1.039 | 2.015 | -4.988 | 2.909 |
Worker:Machine | Intercept | 4:B | 2.417 | 2.015 | -1.531 | 6.366 |
Worker:Machine | Intercept | 4:C | -1.414 | 2.015 | -5.363 | 2.534 |
Worker:Machine | Intercept | 5:A | -3.457 | 2.015 | -7.405 | 0.492 |
Worker:Machine | Intercept | 5:B | 2.152 | 2.015 | -1.796 | 6.101 |
Worker:Machine | Intercept | 5:C | 2.853 | 2.015 | -1.095 | 6.802 |
Worker:Machine | Intercept | 6:A | 1.916 | 2.015 | -2.032 | 5.865 |
Worker:Machine | Intercept | 6:B | -8.976 | 2.015 | -12.924 | -5.027 |
Worker:Machine | Intercept | 6:C | 2.487 | 2.015 | -1.462 | 6.435 |
Worker | Intercept | 1 | 1.045 | 1.981 | -2.839 | 4.928 |
Worker | Intercept | 2 | -1.376 | 1.981 | -5.259 | 2.507 |
Worker | Intercept | 3 | 5.361 | 1.981 | 1.478 | 9.244 |
Worker | Intercept | 4 | -0.060 | 1.981 | -3.943 | 3.823 |
Worker | Intercept | 5 | 2.545 | 1.981 | -1.339 | 6.428 |
Worker | Intercept | 6 | -7.514 | 1.981 | -11.397 | -3.631 |
If you look closely at these effects and add them together, you will get values similar to those from our second baseline model, which is probably not too surprising. For example, in the above model, the effect for 1:B plus the effect for worker 1 is 1.5 + 1.045 = 2.545, while the estimated random effect for 1:B in the interaction-only model was 2.556. Likewise, if we look at the variance components, the sum of the non-residual variance components for model_w_wm equals the non-residual variance of model_base_wm (36.8). So this latest model allows us to disentangle the worker and machine effects, where our baseline models did not.
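A quick numeric check along these lines (a sketch; the values should roughly match the tables above):
# worker 1, machine B: worker effect + worker:machine effect from model_w_wm
re_w  = ranef(model_w_wm)$Worker
re_wm = ranef(model_w_wm)$`Worker:Machine`
re_w['1', '(Intercept)'] + re_wm['1:B', '(Intercept)']        # ~ 2.5
# the corresponding single effect from the interaction-only baseline
ranef(model_base_wm)$`Worker:Machine`['1:B', '(Intercept)']   # ~ 2.56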
Next we’ll fit the ‘vector-valued’ model Bates describes. This removes the intercept portion of the random effects formula in the original random slopes model, but is otherwise the same. We can look at the results here, but I will hold off on interpretation until we compare it to the other models. Note that, at the least, we have no convergence problem this time.
model_m_vv = lmer(score ~ Machine + (0 + Machine | Worker), machines)
summarize_model(model_m_vv, ci = 0, cor_re = TRUE)
Variance Components:
Group Effect Variance SD Var_prop
Worker MachineA 16.67 4.08 0.15
Worker MachineB 74.30 8.62 0.67
Worker MachineC 19.27 4.39 0.17
Residual 0.93 0.96 0.01
Correlation of Random Effects:
MachineA MachineB MachineC
MachineA 1.00 0.80 0.62
MachineB 0.80 1.00 0.77
MachineC 0.62 0.77 1.00
Fixed Effects:
Term Value SE t P_value Lower_2.5 Upper_97.5
Intercept 52.36 1.68 31.12 0.00 49.06 55.65
MachineB 7.97 2.42 3.30 0.00 3.23 12.70
MachineC 13.92 1.54 9.05 0.00 10.90 16.93
Now let’s extract the fixed effect and variance component summaries for all the models.
model_list = mget(ls(pattern = 'model_'))
fe = map_df(model_list, extract_fixed_effects, .id = 'model')
vc = map_df(model_list, extract_vc, ci_level = 0, .id = 'model')
First, let’s look at the fixed effects. There are no differences in the coefficients for the fixed effect of machine, our only covariate, but there are notable differences in the estimated standard errors. Practically speaking, we’d come to the same conclusions with any of these models, but the uncertainty associated with those conclusions would differ.
model | term | value | se | t | p_value | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|---|
model_base_w | Intercept | 52.356 | 2.229 | 23.485 | 0.000 | 47.986 | 56.725 |
model_base_w | MachineB | 7.967 | 1.054 | 7.559 | 0.000 | 5.901 | 10.032 |
model_base_w | MachineC | 13.917 | 1.054 | 13.205 | 0.000 | 11.851 | 15.982 |
model_base_wm | Intercept | 52.356 | 2.486 | 21.062 | 0.000 | 47.483 | 57.228 |
model_base_wm | MachineB | 7.967 | 3.516 | 2.266 | 0.023 | 1.076 | 14.857 |
model_base_wm | MachineC | 13.917 | 3.516 | 3.959 | 0.000 | 7.026 | 20.807 |
model_m_slope | Intercept | 52.356 | 1.683 | 31.116 | 0.000 | 49.058 | 55.653 |
model_m_slope | MachineB | 7.967 | 2.427 | 3.283 | 0.001 | 3.210 | 12.723 |
model_m_slope | MachineC | 13.917 | 1.541 | 9.033 | 0.000 | 10.897 | 16.936 |
model_m_vv | Intercept | 52.356 | 1.682 | 31.122 | 0.000 | 49.058 | 55.653 |
model_m_vv | MachineB | 7.967 | 2.416 | 3.297 | 0.001 | 3.230 | 12.703 |
model_m_vv | MachineC | 13.917 | 1.537 | 9.053 | 0.000 | 10.904 | 16.930 |
model_w_wm | Intercept | 52.356 | 2.486 | 21.062 | 0.000 | 47.483 | 57.228 |
model_w_wm | MachineB | 7.967 | 2.177 | 3.660 | 0.000 | 3.700 | 12.233 |
model_w_wm | MachineC | 13.917 | 2.177 | 6.393 | 0.000 | 9.650 | 18.183 |
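As an aside, if you just want the standard errors side by side, something like the following works with the fe object created above (a sketch, assuming the column names shown in the table):
fe %>%
  select(model, term, se) %>%
  pivot_wider(names_from = model, values_from = se)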
Here are the variance components. There are definitely some differences here, but, as we’ll see, maybe not as many as we might suspect.
model | group | effect | variance | sd | var_prop |
---|---|---|---|---|---|
model_base_w | Worker | Intercept | 26.487 | 5.147 | 0.726 |
model_base_w | Residual | | 9.996 | 3.162 | 0.274 |
model_base_wm | Worker:Machine | Intercept | 36.768 | 6.064 | 0.975 |
model_base_wm | Residual | | 0.925 | 0.962 | 0.025 |
model_m_slope | Worker | Intercept | 16.679 | 4.084 | 0.253 |
model_m_slope | Worker | MachineB | 34.717 | 5.892 | 0.526 |
model_m_slope | Worker | MachineC | 13.625 | 3.691 | 0.207 |
model_m_slope | Residual | | 0.924 | 0.961 | 0.014 |
model_m_vv | Worker | MachineA | 16.672 | 4.083 | 0.150 |
model_m_vv | Worker | MachineB | 74.302 | 8.620 | 0.668 |
model_m_vv | Worker | MachineC | 19.270 | 4.390 | 0.173 |
model_m_vv | Residual | | 0.925 | 0.962 | 0.008 |
model_w_wm | Worker:Machine | Intercept | 13.909 | 3.730 | 0.369 |
model_w_wm | Worker | Intercept | 22.858 | 4.781 | 0.606 |
model_w_wm | Residual | | 0.925 | 0.962 | 0.025 |
We can see that the base_wm model has a (non-residual) variance of 36.768. This equals the total of the two non-residual variance components of the w_wm model, 13.909 + 22.858, which again speaks to the latter model decomposing the machine-within-worker effect into separate worker and machine effects. It is also essentially the average of the three machine variances from the vector-valued model: (16.672 + 74.302 + 19.27) / 3.
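A sketch of that bookkeeping, using the vc object from above (and assuming the residual rows are labeled 'Residual' as in the table):
vc %>%
  filter(group != 'Residual') %>%
  group_by(model) %>%
  summarise(total_re_variance = sum(variance))
# model_base_wm and model_w_wm should both come out near 36.8;
# model_m_vv totals ~110, or ~36.8 averaged over its three machine variances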
We can see that the estimated random effects from the vector-valued model (m_vv) are essentially the same as those from the baseline, interaction-only model. However, the way it is estimated allows for incorporation of correlations among the machine random effects, so they are not identical (but pretty close).
extract_random_effects(model_m_vv)
extract_random_effects(model_base_wm)
group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|
Worker | MachineA | 1 | 0.312 | 0.541 | -0.749 | 1.373 |
Worker | MachineB | 1 | 2.553 | 0.551 | 1.474 | 3.632 |
Worker | MachineC | 1 | 0.930 | 0.545 | -0.137 | 1.998 |
Worker | MachineA | 2 | 0.183 | 0.541 | -0.878 | 1.245 |
Worker | MachineB | 2 | -0.803 | 0.551 | -1.882 | 0.276 |
Worker | MachineC | 2 | -4.282 | 0.545 | -5.350 | -3.214 |
Worker | MachineA | 3 | 6.969 | 0.541 | 5.908 | 8.031 |
Worker | MachineB | 3 | 7.779 | 0.551 | 6.700 | 8.858 |
Worker | MachineC | 3 | 4.474 | 0.545 | 3.406 | 5.542 |
Worker | MachineA | 4 | -1.024 | 0.541 | -2.085 | 0.037 |
Worker | MachineB | 4 | 2.328 | 0.551 | 1.249 | 3.408 |
Worker | MachineC | 4 | -1.415 | 0.545 | -2.482 | -0.347 |
Worker | MachineA | 5 | -0.849 | 0.541 | -1.910 | 0.212 |
Worker | MachineB | 5 | 4.726 | 0.551 | 3.647 | 5.805 |
Worker | MachineC | 5 | 5.323 | 0.545 | 4.255 | 6.390 |
Worker | MachineA | 6 | -5.592 | 0.541 | -6.653 | -4.531 |
Worker | MachineB | 6 | -16.583 | 0.551 | -17.662 | -15.504 |
Worker | MachineC | 6 | -5.030 | 0.545 | -6.098 | -3.963 |
group_var | effect | group | value | se | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|
Worker:Machine | Intercept | 1:A | 0.275 | 0.553 | -0.808 | 1.359 |
Worker:Machine | Intercept | 1:B | 2.556 | 0.553 | 1.473 | 3.640 |
Worker:Machine | Intercept | 1:C | 0.920 | 0.553 | -0.164 | 2.004 |
Worker:Machine | Intercept | 2:A | 0.209 | 0.553 | -0.874 | 1.293 |
Worker:Machine | Intercept | 2:B | -0.749 | 0.553 | -1.833 | 0.334 |
Worker:Machine | Intercept | 2:C | -4.402 | 0.553 | -5.486 | -3.318 |
Worker:Machine | Intercept | 3:A | 7.118 | 0.553 | 6.035 | 8.202 |
Worker:Machine | Intercept | 3:B | 7.647 | 0.553 | 6.563 | 8.731 |
Worker:Machine | Intercept | 3:C | 4.490 | 0.553 | 3.407 | 5.574 |
Worker:Machine | Intercept | 4:A | -1.113 | 0.553 | -2.196 | -0.029 |
Worker:Machine | Intercept | 4:B | 2.391 | 0.553 | 1.307 | 3.475 |
Worker:Machine | Intercept | 4:C | -1.493 | 0.553 | -2.577 | -0.409 |
Worker:Machine | Intercept | 5:A | -0.981 | 0.553 | -2.064 | 0.103 |
Worker:Machine | Intercept | 5:B | 4.705 | 0.553 | 3.621 | 5.789 |
Worker:Machine | Intercept | 5:C | 5.416 | 0.553 | 4.332 | 6.499 |
Worker:Machine | Intercept | 6:A | -5.509 | 0.553 | -6.593 | -4.426 |
Worker:Machine | Intercept | 6:B | -16.550 | 0.553 | -17.634 | -15.467 |
Worker:Machine | Intercept | 6:C | -4.931 | 0.553 | -6.014 | -3.847 |
Even the default way the extracted random effects are structured implies this difference. In the vector-valued model we have a multivariate normal draw across the 3 machines (i.e. 3 variances and 3 covariances) for each of the six workers. In the baseline model, we estimate no covariances and assume a single common variance for the draws across the 18 worker-machine combinations (1 variance).
ranef(model_m_vv)
$Worker
MachineA MachineB MachineC
1 0.3121470 2.5531279 0.9302595
2 0.1833294 -0.8032619 -4.2820004
3 6.9694168 7.7792265 4.4740418
4 -1.0238949 2.3283434 -1.4146891
5 -0.8487803 4.7258593 5.3227211
6 -5.5922179 -16.5832953 -5.0303328
with conditional variances for "Worker"
ranef(model_base_wm)
$`Worker:Machine`
(Intercept)
1:A 0.2754687
1:B 2.5563491
1:C 0.9200653
2:A 0.2093562
2:B -0.7492747
2:C -4.4019891
3:A 7.1181101
3:B 7.6470099
3:C 4.4901391
4:A -1.1128934
4:B 2.3910679
4:C -1.4930401
5:A -0.9806684
5:B 4.7050047
5:C 5.4157138
6:A -5.5093731
6:B -16.5501569
6:C -4.9308890
with conditional variances for "Worker:Machine"
Now let’s compare the models directly via AIC. Just as we would expect when dummy coding a predictor vs. running the model without an intercept (e.g. lm(score ~ Machine) vs. lm(score ~ -1 + Machine)), the random slope and vector-valued models are identical and produce the same AIC. Likewise, the intercept variance of the former is equal to the first (machine A) variance of the vector-valued model.
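The comparison below can be produced with something like the following, using the model list created earlier:
map_dbl(model_list, AIC)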
model_base_w | model_base_wm | model_m_slope | model_m_vv | model_w_wm |
---|---|---|---|---|
296.878 | 231.256 | 228.311 | 228.311 | 227.688 |
While those models do better than either of our baseline models, it turns out that our other approach, with separate worker and worker:machine effects, is slightly better, as the additional complexity of estimating separate variances and their covariances wasn’t really worth it here.
At this point we’ve seen a few ways to specify a model in this situation. Some may be a little too simplistic for a given scenario; others may not capture the correlation structure the way we’d want. In any case, we have options to explore.
The following is a simplified approach to creating data in this scenario, and allows us to play around with the settings to see what happens.
First we need some data. The following creates a group identifier similar to Worker in our previous example, a cat_var like our Machine, and other covariates just to make things interesting.
# for simplicity keeping to 3 cat levels
set.seed(1234)
ng = 5000 # n groups
cat_levs = 3 # n levels per group
reps = 4 # n obs per category level within each group
id = rep(1:ng, each = cat_levs * reps) # id indicator (e.g. like Worker)
cat_var = rep(1:cat_levs, times = ng, e = reps) # categorical variable (e.g. Machine)
x = rnorm(ng * cat_levs * reps) # continuous covariate
x_c = rep(rnorm(ng), e = cat_levs * reps) # continuous cluster level covariate
So we have the basic data in place; now we need to create the random effects. There are several ways we could do this, including more efficient ones, but this approach stays conceptual and mirrors the model that got us here, i.e. something of the form (1 + cat_var | group). In this case we assume that model is ‘correct’, so we create a multivariate normal draw of random effects for each of the (only 3) levels of cat_var. The correlations shown below are what we would expect the random slope model to estimate for the random effects.
# covariance matrix with variances (1, .5, .5) and covariances (.1, .25, .25)
cov_mat = lazerhawk::create_corr(c(.1, .25, .25), diagonal = c(1, .5, .5))
cov2cor(cov_mat) # these will be the estimated correlations for the random_slope model
[,1] [,2] [,3]
[1,] 1.0000000 0.1414214 0.3535534
[2,] 0.1414214 1.0000000 0.5000000
[3,] 0.3535534 0.5000000 1.0000000
Now we create the random effects by drawing an effect for each categorical level for each group.
# take a multivariate normal draw for each of the groups in `id`
re_id_cat_lev = mvtnorm::rmvnorm(ng, mean = rep(0, 3), sigma = cov_mat) %>%
data.frame()
head(re_id_cat_lev)
X1 X2 X3
1 0.5618465 0.62780841 1.15175459
2 0.6588685 0.78045939 0.25942427
3 0.2680315 -0.06403496 1.30173301
4 -0.3711184 0.37579392 -0.03242486
5 -1.1306064 -0.08450038 -0.38165685
6 -0.7537223 0.65320469 0.33269260
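With this many groups, the empirical correlations of the draws should be close to the target correlations shown above (a quick check):
cor(re_id_cat_lev) %>% round(2)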
Now that we have the random effects, we can create our target variable. We do this by adding our first effect to the intercept, and the others to their respective coefficients.
y =
# fixed effect = (2, .5, -.5)
2 + .5*x - .5*x_c +
# random intercept
rep(re_id_cat_lev[, 1], each = cat_levs * reps) +
# .25 is the fixef for group 2 vs. 1
(.25 + rep(re_id_cat_lev[, 2], each = cat_levs * reps)) * (cat_var == 2) +
# .40 is the fixef for group 3 vs. 1
(.40 + rep(re_id_cat_lev[, 3], each = cat_levs * reps)) * (cat_var == 3) +
rnorm(ng * cat_levs * reps, sd = .5)
Now we create a data frame so we can see everything together.
df = tibble(
id,
cat_var,
x,
x_c,
y,
re_id = rep(re_id_cat_lev[, 1], each = cat_levs*reps),
re_id_cat_lev2 = rep(re_id_cat_lev[, 2], each = cat_levs*reps),
re_id_cat_lev3 = rep(re_id_cat_lev[, 3], each = cat_levs*reps)
) %>%
mutate(
cat_var = factor(cat_var),
cat_as_num = as.integer(cat_var),
id = factor(id)
)
df %>% print(n = 30)
# A tibble: 60,000 x 9
id cat_var x x_c y re_id re_id_cat_lev2 re_id_cat_lev3 cat_as_num
<fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <int>
1 1 1 -1.21 -1.43 3.04 0.562 0.628 1.15 1
2 1 1 0.277 -1.43 3.35 0.562 0.628 1.15 1
3 1 1 1.08 -1.43 3.46 0.562 0.628 1.15 1
4 1 1 -2.35 -1.43 2.64 0.562 0.628 1.15 1
5 1 2 0.429 -1.43 4.59 0.562 0.628 1.15 2
6 1 2 0.506 -1.43 3.61 0.562 0.628 1.15 2
7 1 2 -0.575 -1.43 3.54 0.562 0.628 1.15 2
8 1 2 -0.547 -1.43 2.98 0.562 0.628 1.15 2
9 1 3 -0.564 -1.43 5.28 0.562 0.628 1.15 3
10 1 3 -0.890 -1.43 4.06 0.562 0.628 1.15 3
11 1 3 -0.477 -1.43 4.31 0.562 0.628 1.15 3
12 1 3 -0.998 -1.43 4.70 0.562 0.628 1.15 3
13 2 1 -0.776 0.126 2.25 0.659 0.780 0.259 1
14 2 1 0.0645 0.126 2.48 0.659 0.780 0.259 1
15 2 1 0.959 0.126 2.99 0.659 0.780 0.259 1
16 2 1 -0.110 0.126 2.55 0.659 0.780 0.259 1
17 2 2 -0.511 0.126 3.94 0.659 0.780 0.259 2
18 2 2 -0.911 0.126 3.22 0.659 0.780 0.259 2
19 2 2 -0.837 0.126 3.98 0.659 0.780 0.259 2
20 2 2 2.42 0.126 4.59 0.659 0.780 0.259 2
21 2 3 0.134 0.126 3.58 0.659 0.780 0.259 3
22 2 3 -0.491 0.126 3.31 0.659 0.780 0.259 3
23 2 3 -0.441 0.126 2.66 0.659 0.780 0.259 3
24 2 3 0.460 0.126 3.42 0.659 0.780 0.259 3
25 3 1 -0.694 0.437 0.985 0.268 -0.0640 1.30 1
26 3 1 -1.45 0.437 2.08 0.268 -0.0640 1.30 1
27 3 1 0.575 0.437 2.62 0.268 -0.0640 1.30 1
28 3 1 -1.02 0.437 1.24 0.268 -0.0640 1.30 1
29 3 2 -0.0151 0.437 2.24 0.268 -0.0640 1.30 2
30 3 2 -0.936 0.437 2.57 0.268 -0.0640 1.30 2
# … with 5.997e+04 more rows
With everything in place, let’s run four models similar to our previous models from the Machine example:
m_interaction_only = lmer(y ~ x + x_c + cat_var + (1 | id:cat_var), df)
m_random_slope = lmer(y ~ x + x_c + cat_var + (1 + cat_var | id), df)
m_vector_valued = lmer(y ~ x + x_c + cat_var + (0 + cat_var | id), df)
m_separate_re = lmer(y ~ x + x_c + cat_var + (1 | id) + (1 | id:cat_var), df)
model_mixed = list(
m_interaction_only = m_interaction_only,
m_random_slope = m_random_slope,
m_vector_valued = m_vector_valued,
m_separate_re = m_separate_re
)
# model summaries if desired
# map(model_mixed, summarize_model, ci = 0, cor_re = TRUE)
fe = map_df(model_mixed, extract_fixed_effects, .id = 'model')
vc = map_df(model_mixed, extract_vc, ci_level = 0, .id = 'model')
Looking at the fixed effects, we get what we should, but, as before, we see differences in the standard errors.
model | term | value | se | t | p_value | lower_2.5 | upper_97.5 |
---|---|---|---|---|---|---|---|
m_interaction_only | Intercept | 2.006 | 0.018 | 111.051 | 0 | 1.971 | 2.042 |
m_interaction_only | x | 0.497 | 0.002 | 212.962 | 0 | 0.492 | 0.501 |
m_interaction_only | x_c | -0.489 | 0.010 | -46.776 | 0 | -0.509 | -0.469 |
m_interaction_only | cat_var2 | 0.256 | 0.026 | 10.026 | 0 | 0.206 | 0.306 |
m_interaction_only | cat_var3 | 0.389 | 0.026 | 15.228 | 0 | 0.339 | 0.439 |
m_random_slope | Intercept | 2.006 | 0.015 | 136.942 | 0 | 1.977 | 2.035 |
m_random_slope | x | 0.497 | 0.002 | 217.676 | 0 | 0.493 | 0.502 |
m_random_slope | x_c | -0.495 | 0.014 | -34.636 | 0 | -0.523 | -0.467 |
m_random_slope | cat_var2 | 0.256 | 0.011 | 23.038 | 0 | 0.234 | 0.278 |
m_random_slope | cat_var3 | 0.389 | 0.011 | 34.969 | 0 | 0.367 | 0.411 |
m_vector_valued | Intercept | 2.006 | 0.015 | 136.944 | 0 | 1.977 | 2.035 |
m_vector_valued | x | 0.497 | 0.002 | 217.676 | 0 | 0.493 | 0.502 |
m_vector_valued | x_c | -0.495 | 0.014 | -34.636 | 0 | -0.523 | -0.467 |
m_vector_valued | cat_var2 | 0.256 | 0.011 | 23.038 | 0 | 0.234 | 0.278 |
m_vector_valued | cat_var3 | 0.389 | 0.011 | 34.969 | 0 | 0.367 | 0.411 |
m_separate_re | Intercept | 2.006 | 0.018 | 111.045 | 0 | 1.971 | 2.042 |
m_separate_re | x | 0.497 | 0.002 | 216.586 | 0 | 0.493 | 0.502 |
m_separate_re | x_c | -0.489 | 0.017 | -28.883 | 0 | -0.522 | -0.456 |
m_separate_re | cat_var2 | 0.256 | 0.011 | 23.077 | 0 | 0.234 | 0.278 |
m_separate_re | cat_var3 | 0.389 | 0.011 | 35.050 | 0 | 0.367 | 0.411 |
The variance components break down as before.
model | group | effect | variance | sd | var_prop |
---|---|---|---|---|---|
m_interaction_only | id:cat_var | Intercept | 1.570 | 1.253 | 0.864 |
m_interaction_only | Residual | | 0.247 | 0.497 | 0.136 |
m_random_slope | id | Intercept | 1.011 | 1.006 | 0.450 |
m_random_slope | id | cat_var2 | 0.494 | 0.703 | 0.220 |
m_random_slope | id | cat_var3 | 0.495 | 0.704 | 0.220 |
m_random_slope | Residual | | 0.247 | 0.497 | 0.110 |
m_vector_valued | id | cat_var1 | 1.011 | 1.006 | 0.204 |
m_vector_valued | id | cat_var2 | 1.709 | 1.307 | 0.345 |
m_vector_valued | id | cat_var3 | 1.990 | 1.411 | 0.401 |
m_vector_valued | Residual | | 0.247 | 0.497 | 0.050 |
m_separate_re | id:cat_var | Intercept | 0.246 | 0.496 | 0.135 |
m_separate_re | id | Intercept | 1.324 | 1.151 | 0.728 |
m_separate_re | Residual | | 0.247 | 0.497 | 0.136 |
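As a quick check that these line up with the generating covariance matrix: in the vector-valued parameterization, the effect for level k is the intercept effect plus the level-k effect, so its variance is var(re1) + var(re_k) + 2 * cov(re1, re_k). A small sketch using cov_mat from above:
c(
  cov_mat[1, 1],                                      # level 1: just the intercept effect
  cov_mat[1, 1] + cov_mat[2, 2] + 2 * cov_mat[1, 2],  # level 2
  cov_mat[1, 1] + cov_mat[3, 3] + 2 * cov_mat[1, 3]   # level 3
)
# roughly 1.0, 1.7, 2.0, close to the m_vector_valued variances above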
In this case, we know a model with correlated random effects is the more accurate one, and this is borne out by AIC.
m_interaction_only | m_random_slope | m_vector_valued | m_separate_re |
---|---|---|---|
135592.8 | 122051.9 | 122051.9 | 123745.2 |
Now I will make the vector_valued model reduce to the separate_re model. First, we create a covariance matrix with equal variances and equal covariances (i.e. compound symmetry). For demonstration, we will also apply the random effects a little differently: when creating the target variable, we make a slight alteration so the effects map to the vector-valued parameterization instead.
set.seed(1234)
cov_mat = lazerhawk::create_corr(c(0.1, 0.1, 0.1), diagonal = c(.5, .5, .5))
cov2cor(cov_mat) # these will now be the estimated correlations for the vector_valued model
[,1] [,2] [,3]
[1,] 1.0 0.2 0.2
[2,] 0.2 1.0 0.2
[3,] 0.2 0.2 1.0
re_id_cat_lev = mvtnorm::rmvnorm(ng, mean = rep(0, 3), sigma = cov_mat) %>%
data.frame()
y = 2 + .5*x - .5*x_c + # fixed effect = (2, .5, -.5)
rep(re_id_cat_lev[, 1], each = cat_levs * reps) * (cat_var == 1) + # added this
rep(re_id_cat_lev[, 2], each = cat_levs * reps) * (cat_var == 2) +
rep(re_id_cat_lev[, 3], each = cat_levs * reps) * (cat_var == 3) +
.25 * (cat_var == 2) + # .25 is the fixef for group 2 vs. 1
.40 * (cat_var == 3) + # .40 is the fixef for group 3 vs. 1
rnorm(ng * cat_levs * reps, sd = .5)
df = tibble(
id,
cat_var = factor(cat_var),
x,
x_c,
y
)
Rerun the models.
m_random_slope = lmer(y ~ x + x_c + cat_var + (1 + cat_var | id), df) # still problems!
m_vector_valued = lmer(y ~ x + x_c + cat_var + (0 + cat_var | id), df)
m_separate_re = lmer(y ~ x + x_c + cat_var + (1 | id) + (1 | id:cat_var), df)
Examine the variance components.
model_mixed = list(
m_random_slope = m_random_slope,
m_vector_valued = m_vector_valued,
m_separate_re = m_separate_re
)
# model summaries if desired
# map(model_mixed, summarize_model, ci = 0, cor_re = TRUE)
# fixed effects if desired
# fe = map_df(model_mixed, extract_fixed_effects, .id = 'model')
vc = map_df(model_mixed, extract_vc, ci_level = 0, .id = 'model')
model | group | effect | variance | sd | var_prop |
---|---|---|---|---|---|
m_random_slope | id | Intercept | 0.491 | 0.700 | 0.213 |
m_random_slope | id | cat_var2 | 0.786 | 0.886 | 0.341 |
m_random_slope | id | cat_var3 | 0.778 | 0.882 | 0.337 |
m_random_slope | Residual | | 0.251 | 0.501 | 0.109 |
m_vector_valued | id | cat_var1 | 0.491 | 0.700 | 0.283 |
m_vector_valued | id | cat_var2 | 0.497 | 0.705 | 0.287 |
m_vector_valued | id | cat_var3 | 0.492 | 0.702 | 0.284 |
m_vector_valued | Residual | | 0.251 | 0.501 | 0.145 |
m_separate_re | id:cat_var | Intercept | 0.395 | 0.628 | 0.530 |
m_separate_re | id | Intercept | 0.099 | 0.314 | 0.133 |
m_separate_re | Residual | | 0.251 | 0.501 | 0.337 |
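Note that the separate-RE model implies compound symmetry: each level's implied variance is the id variance plus the id:cat_var variance, and the implied covariance between any two levels within an id is just the id variance. A quick look at how that matches the generating values of .5 and .1 (a sketch using the vc object just created):
vc %>%
  filter(model == 'm_separate_re', group != 'Residual') %>%
  select(group, variance)
# implied common variance   = id variance + id:cat_var variance (about .1 + .4 = .5)
# implied common covariance = id variance (about .1)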
In this case, we know the true structure is compound symmetric (equal variances, with no correlation beyond that induced by a shared group effect), so estimating separate variances and covariances adds complexity we don’t need. Thus our simpler model wins (the log likelihoods are essentially the same).
parameter | m_random_slope | m_vector_valued | m_separate_re |
---|---|---|---|
LL | -59818.9 | -59818.9 | -59819.72 |
AIC | 119661.8 | 119661.8 | 119655.45 |
Here we’ve demonstrated a few different ways to specify a model with random slopes for a categorical covariate. Intuition may lead to a model that is not easy to estimate, often resulting in convergence problems. Sometimes that model is more complicated than it needs to be, and a simpler version will have less estimation difficulty. Try these alternatives if you run into trouble!