Mixed models and latent growth curves (LGC) are common approaches to dealing with longitudinal/repeated measures data. I have demonstrated their equivalence elsewhere (e.g. here), but this document hits on some additional nuances. The only model packages needed here will be lme4, nlme, and lavaan.
I will not be explaining these models in detail; this document is for those already somewhat familiar with them. However, the key idea is that the random effects in the mixed model are the latent variables in the growth curve model.
In what follows we’ll examine data where clusters have multiple observations across time, and eventually we’ll add covariates that are either time-varying or time-constant. As a quick reminder of what we’re dealing with, the following depicts a mixed model with random intercepts and random slopes for the time covariate, for observations \(i\) in cluster \(c\).
\[y_{ic} = \alpha_c + \beta_c \cdot \mathrm{time}_{ic} + \epsilon_{ic}\]
\[\begin{bmatrix}\alpha_c \\ \beta_c \end{bmatrix} \sim \mathscr{N}(\mathbf{\mu}, \mathbf{\Sigma}) \]
\[ \mathbf{\mu} = \begin{bmatrix}\mu_\alpha \\ \mu_\beta \end{bmatrix}\] \[\mathbf{\Sigma} = \left[\begin{smallmatrix}\tau^2 & \rho \\ \rho & \varphi^2 \end{smallmatrix}\right]\]
\[\epsilon_{ic} \sim \mathscr{N}(0, \varsigma^2)\] In the above we have cluster-specific coefficients, \(\alpha_c\) and \(\beta_c\), drawn from a multivariate normal distribution with means, \(\mu_\alpha\) and \(\mu_\beta\), the soi-disant ‘fixed effects’. Each has their respective variance (\(\tau^2\) and \(\varphi^2\)), and some covariance (\(\rho\)), to go along with the residual variance at the observation level (\(\varsigma^2\)).
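To make the connection to the SEM representation concrete, note that both models imply the same marginal covariance for a cluster’s observations: \(Z \mathbf{\Sigma} Z' + \varsigma^2 I\), where \(Z\) contains a column of ones and the time values. Here is a minimal base-R sketch of that implied covariance, using the random effect variances (.75), covariance (.2), and residual SD (.5) that appear in the simulation setup below.

```r
# Implied marginal covariance of a cluster's four observations under the
# random intercept + slope model: Vy = Z %*% Sigma %*% t(Z) + sigma2 * I
time   = 0:3
Z      = cbind(1, time)                      # design for the random effects
tau2   = .75; phi2 = .75; rho = .2           # intercept/slope variances, covariance
Sigma  = matrix(c(tau2, rho, rho, phi2), 2)  # random-effects covariance
sigma2 = .5^2                                # observation-level (residual) variance

Vy = Z %*% Sigma %*% t(Z) + sigma2 * diag(4)
Vy
```

This is exactly the covariance matrix a growth curve model implies when the intercept loadings are fixed to 1 and the slope loadings to 0:3; fitting either model amounts to estimating \(\tau^2\), \(\varphi^2\), \(\rho\), and \(\varsigma^2\) to reproduce it.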
The same model is depicted graphically as a growth curve model in the structural equation modeling (SEM) context. I show this and subsequent graphs sans variances, but you can assume there is a little latent variable representing ‘all other causes’ pointing at the latent variables and endogenous variables (i.e. the Ys).
First things first, we need some data. Note that one could use lavaan to simulate the data as well if desired, but I think the following is clearer as it is more in line with standard regression.
library(tidyverse)
set.seed(1234)
# data size
n = 500
timepoints = 4
N = n*timepoints
# covariates
time = rep(0:3, times=n)
x = rnorm(N)
clustervar1 = rep(0:1, e=N/2)
subject = rep(1:n, each=4)
# fixed effects
intercept = .5
slope = .25
# random effects and observation level error
randomEffectsCorr = matrix(c(.75,.2,.2,.75), ncol=2)
randomEffects = MASS::mvrnorm(n, mu=c(0,0), Sigma = randomEffectsCorr, empirical=T) %>%
data.frame()
colnames(randomEffects) = c('Int', 'Slope')
sigma = .5
# target variable
y1 =
(intercept + randomEffects$Int[subject]) + # random intercepts
(slope + randomEffects$Slope[subject]) * time + # random slopes
-.25 * x + # time-varying covariate
.30 * clustervar1 + # cluster level covariate
rnorm(n*timepoints, mean=0, sd=sigma) # observation level error
# create data frames
d = data.frame(subject, time, clustervar1, x, y1)
# latent growth curve models require data in 'wide' format; ignore the warning
dWide = d %>%
select(-x) %>%
spread(time, y1)
colnames(dWide)[-(1:2)] = paste0('y', 0:3)
# this is ugly, but I'm not interested in figuring out the 'proper' way which
# probably would not be shorter.
dWide = dWide %>%
left_join(d %>%
select(-y1) %>%
spread(time, x))
colnames(dWide)[-(1:6)] = paste0('x', 0:3)
dWide %>%
mutate_if(is.numeric, round, digits=2) %>%
head()
subject clustervar1 y0 y1 y2 y3 x0 x1 x2 x3
1 1 0 -0.73 -0.12 0.11 2.26 -1.21 0.28 1.08 -2.35
2 2 0 1.31 0.14 -0.14 0.49 0.43 0.51 -0.57 -0.55
3 3 0 -0.73 1.31 1.69 3.66 -0.56 -0.89 -0.48 -1.00
4 4 0 1.90 3.07 2.88 5.50 -0.78 0.06 0.96 -0.11
5 5 0 -1.89 -1.43 -0.17 -0.19 -0.51 -0.91 -0.84 2.42
6 6 0 -0.40 0.03 0.02 1.29 0.13 -0.49 -0.44 0.46
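If you prefer to avoid the tidyr spread/join dance, base R’s reshape() can also do the long-to-wide conversion. A toy sketch (the data here are made up purely for illustration):

```r
# Toy long data: 3 subjects, 2 time points
dLong = data.frame(subject = rep(1:3, each = 2),
                   time    = rep(0:1, times = 3),
                   y       = c(1.1, 2.2, 0.5, 1.4, -0.3, 0.9))

# long -> wide: one row per subject, one y column per time point
dW = reshape(dLong, idvar = 'subject', timevar = 'time',
             direction = 'wide', sep = '')
dW
```

With sep='' the resulting columns are named y0 and y1, matching the naming convention used for dWide above.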
We start with a basic random intercept model. This is notably different from the data generating process, so the estimates will be quite a bit off. To make the models produce the same results, the mixed model has to be estimated with maximum likelihood (not an issue with these sample sizes).
library(lme4)
mixedModel = lmer(y1 ~ time + (1|subject), data=d, REML=F) # 1 represents the intercept
summary(mixedModel, corr=F) # lessen clutter
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time + (1 | subject)
Data: d
AIC BIC logLik deviance df.resid
7634.8 7657.2 -3813.4 7626.8 1996
Scaled residuals:
Min 1Q Median 3Q Max
-3.9171 -0.4884 0.0070 0.4557 3.5445
Random effects:
Groups Name Variance Std.Dev.
subject (Intercept) 2.824 1.680
Residual 1.567 1.252
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.66689 0.08855 7.531
time 0.23164 0.02504 9.252
For the LGC model, I will go ahead and use a format that will be consistent with later models, but this means we’ll have to fix some parameters so that we only have random intercepts, and not random slopes for the time covariate. The model will also start with the variances at each time point constrained to be equal, to mimic the mixed model. We’ll relax this for both models later.
library(lavaan)
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# time effect not allowed to vary (and thus not covary with intercept)
time ~~ 0*time
intercept ~~ 0*time
# residual variance, constant across time (resvar)
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
lavaan (0.5-23.1097) converged normally after 18 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 1737.001
Degrees of freedom 10
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|)
intercept =~
y0 1.000
y1 1.000
y2 1.000
y3 1.000
time =~
y0 0.000
y1 1.000
y2 2.000
y3 3.000
Covariances:
Estimate Std.Err z-value P(>|z|)
intercept ~~
time 0.000
Intercepts:
Estimate Std.Err z-value P(>|z|)
.y0 0.000
.y1 0.000
.y2 0.000
.y3 0.000
intercept 0.667 0.089 7.531 0.000
time 0.232 0.025 9.252 0.000
Variances:
Estimate Std.Err z-value P(>|z|)
time 0.000
.y0 (rsvr) 1.567 0.057 27.386 0.000
.y1 (rsvr) 1.567 0.057 27.386 0.000
.y2 (rsvr) 1.567 0.057 27.386 0.000
.y3 (rsvr) 1.567 0.057 27.386 0.000
intrcpt 2.824 0.204 13.851 0.000
The Intercepts: section of the growth curve model output represents the fixed effects for the intercept and time coefficient. The (resvar) label marks the residual variance, while the variance for intercept (and, in subsequent models, time) is the random effects variance in the mixed model. As we can see, the results are identical.
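As an aside, the random-intercept-only model implies compound symmetry: every pair of observations within a subject has the same covariance. Plugging in the estimates reported above (2.824 and 1.567) gives the implied covariance matrix and the intraclass correlation:

```r
# Compound symmetry implied by random intercepts only:
# tau2 + sigma2 on the diagonal, tau2 everywhere else
tau2   = 2.824   # subject (Intercept) variance from the output above
sigma2 = 1.567   # residual variance from the output above

Vy = matrix(tau2, 4, 4) + sigma2 * diag(4)

# intraclass correlation: proportion of total variance due to subjects
icc = tau2 / (tau2 + sigma2)
round(icc, 3)
```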
Now we add random slopes for the effect of time. I also show standardized results to make comparison of the intercept-slope correlation straightforward.
mixedModel = lmer(y1 ~ time + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time + (1 + time | subject)
Data: d
AIC BIC logLik deviance df.resid
5913.9 5947.5 -2951.0 5901.9 1994
Scaled residuals:
Min 1Q Median 3Q Max
-2.46879 -0.51536 0.00716 0.51789 3.14247
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.8155 0.9031
time 0.7556 0.8692 0.26
Residual 0.3079 0.5549
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.66689 0.04541 14.69
time 0.23164 0.04043 5.73
This is the same graphical model we saw at the beginning.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# residual variance, constant across time (resvar)
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 23 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 12.135
Degrees of freedom 8
P-value (Chi-square) 0.145
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.903 0.852
y1 1.000 0.903 0.596
y2 1.000 0.903 0.405
y3 1.000 0.903 0.298
time =~
y0 0.000 0.000 0.000
y1 1.000 0.869 0.574
y2 2.000 1.738 0.779
y3 3.000 2.608 0.861
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~~
time 0.208 0.042 4.995 0.000 0.265 0.265
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
intercept 0.667 0.045 14.686 0.000 0.738 0.738
time 0.232 0.040 5.730 0.000 0.266 0.266
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.274
.y1 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.134
.y2 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.062
.y3 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.034
intrcpt 0.815 0.066 12.372 0.000 1.000 1.000
time 0.756 0.052 14.599 0.000 1.000 1.000
The estimated effects (fixed effect + random) are identical.
coef(mixedModel)$subject %>%
bind_cols(as.tibble(lavPredict(growthCurveModel))) %>%
round(3) %>%
head(10)
(Intercept) time intercept time1
1 -0.525 0.693 -0.525 0.693
2 0.742 -0.209 0.742 -0.209
3 -0.115 1.134 -0.115 1.134
4 1.625 1.093 1.625 1.093
5 -1.246 0.347 -1.246 0.347
6 -0.217 0.363 -0.217 0.363
7 -0.700 -0.325 -0.700 -0.325
8 0.660 0.549 0.660 0.549
9 0.400 -0.038 0.400 -0.038
10 -0.395 -1.403 -0.395 -1.403
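Either set of coefficients produces the same fitted trajectory for a given subject. For example, plugging in the first subject’s combined (fixed + random) intercept and slope from the table above:

```r
# Fitted trajectory for subject 1, using the combined intercept and slope
# (-0.525 and 0.693) from the first row of the comparison table above
int_1   = -0.525
slope_1 =  0.693
time    = 0:3

fitted_1 = int_1 + slope_1 * time
fitted_1
```

Note these won’t equal the raw y1 values for that subject, which also reflect the covariate effects and observation-level noise.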
Adding a cluster level covariate¹ to the mixed model is trivial. We just add it as we would any other predictor.
mixedModel = lmer(y1 ~ time + clustervar1 + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time + clustervar1 + (1 + time | subject)
Data: d
AIC BIC logLik deviance df.resid
5890.9 5930.1 -2938.4 5876.9 1993
Scaled residuals:
Min 1Q Median 3Q Max
-2.4207 -0.5055 0.0041 0.5224 3.2215
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.7682 0.8764
time 0.7556 0.8692 0.28
Residual 0.3079 0.5549
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.44409 0.06243 7.113
time 0.23164 0.04043 5.730
clustervar1 0.44559 0.08787 5.071
For the growth curve, however, we first have to add another regression equation, and fix the path from that covariate to the slope factor to zero; otherwise our estimates will be slightly different due to the indirect path through the random intercepts.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# cluster level effect
intercept ~ clustervar1
time ~ 0*clustervar1
# residual variance
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 24 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 13.488
Degrees of freedom 11
P-value (Chi-square) 0.263
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.904 0.852
y1 1.000 0.904 0.595
y2 1.000 0.904 0.404
y3 1.000 0.904 0.298
time =~
y0 0.000 0.000 0.000
y1 1.000 0.869 0.572
y2 2.000 1.738 0.777
y3 3.000 2.608 0.859
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~
clustervar1 0.446 0.088 5.071 0.000 0.493 0.493
time ~
clustervar1 0.000 0.000 0.000
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.intercept ~~
.time 0.215 0.041 5.292 0.000 0.283 0.283
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
.intercept 0.444 0.062 7.113 0.000 0.491 0.491
.time 0.232 0.040 5.730 0.000 0.266 0.266
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.274
.y1 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.133
.y2 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.061
.y3 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.033
.intrcpt 0.768 0.063 12.202 0.000 0.939 0.939
.time 0.756 0.052 14.599 0.000 1.000 1.000
Next we’ll add an interaction between the cluster level covariate and time. For the mixed model, this is done in the same way we would for any regression model.
mixedModel = lmer(y1 ~ time*clustervar1 + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time * clustervar1 + (1 + time | subject)
Data: d
AIC BIC logLik deviance df.resid
5892.2 5937.0 -2938.1 5876.2 1992
Scaled residuals:
Min 1Q Median 3Q Max
-2.4183 -0.5029 0.0042 0.5193 3.2239
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.7681 0.8764
time 0.7544 0.8686 0.28
Residual 0.3079 0.5549
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.44924 0.06273 7.162
time 0.26587 0.05713 4.654
clustervar1 0.43529 0.08871 4.907
time:clustervar1 -0.06846 0.08080 -0.847
In almost every depiction of growth curve models I’ve seen, the interaction between cluster level covariates and time is the default setting, unlike any other modeling situation, where including an interaction would normally be driven by theoretical or exploratory reasons. Note that the interaction was not part of the data generating process, and thus its estimate should be close to zero.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# both intercept and slope latent variables 'predicted' by cluster level variable
intercept + time ~ clustervar1
# residual variance
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 26 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 12.771
Degrees of freedom 10
P-value (Chi-square) 0.237
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.903 0.852
y1 1.000 0.903 0.596
y2 1.000 0.903 0.405
y3 1.000 0.903 0.298
time =~
y0 0.000 0.000 0.000
y1 1.000 0.869 0.574
y2 2.000 1.738 0.779
y3 3.000 2.608 0.861
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~
clustervar1 0.435 0.089 4.907 0.000 0.482 0.482
time ~
clustervar1 -0.068 0.081 -0.847 0.397 -0.079 -0.079
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.intercept ~~
.time 0.215 0.041 5.292 0.000 0.283 0.283
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
.intercept 0.449 0.063 7.162 0.000 0.497 0.497
.time 0.266 0.057 4.654 0.000 0.306 0.306
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.274
.y1 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.134
.y2 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.062
.y3 (rsvr) 0.308 0.014 22.361 0.000 0.308 0.034
.intrcpt 0.768 0.063 12.202 0.000 0.942 0.942
.time 0.754 0.052 14.597 0.000 0.998 0.998
In the Regressions: section, the intercept ~ clustervar1 path represents the fixed effect of clustervar1 in the mixed model (i.e. the cluster effect at time 0), while the time ~ clustervar1 path coefficient is the interaction effect.
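Putting the two coefficients together, the effect of the cluster level covariate at each time point is the main effect plus the interaction times time. Using the fixed effect estimates from the mixed model output above:

```r
# Effect of clustervar1 at each time point: main effect + interaction * time
# (estimates taken from the mixed model fixed effects above)
b_cluster     =  0.43529   # clustervar1 (the effect at time 0)
b_interaction = -0.06846   # time:clustervar1
time          = 0:3

round(b_cluster + b_interaction * time, 3)
```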
As with the cluster level covariate, adding a time-varying covariate is trivial for the mixed model. For this demo I will revert back to having no interaction with the cluster level covariate. This is the data generating model, and can be compared to the parameters set in the data setup at the beginning.
mixedModel = lmer(y1 ~ time + clustervar1 + x + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time + clustervar1 + x + (1 + time | subject)
Data: d
AIC BIC logLik deviance df.resid
5611.6 5656.4 -2797.8 5595.6 1992
Scaled residuals:
Min 1Q Median 3Q Max
-2.59961 -0.51370 0.00377 0.49956 3.01415
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.7575 0.8703
time 0.7403 0.8604 0.30
Residual 0.2449 0.4949
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.43854 0.06047 7.252
time 0.23483 0.03973 5.910
clustervar1 0.44396 0.08484 5.233
x -0.26298 0.01489 -17.663
Things become tedious for the growth curve model. Furthermore, we have to fix the covariate’s effect to be constant over time (labeled B in the model syntax below); otherwise, by default a separate effect is estimated at each time point, i.e. an interaction with time.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# only main effect for cluster level covariate
intercept ~ clustervar1
time ~ 0*clustervar1
# add time-varying covariate
y0 ~ B*x0
y1 ~ B*x1
y2 ~ B*x2
y3 ~ B*x3
# residual variance
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 25 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 21.493
Degrees of freedom 26
P-value (Chi-square) 0.716
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.898 0.846
y1 1.000 0.898 0.591
y2 1.000 0.898 0.403
y3 1.000 0.898 0.297
time =~
y0 0.000 0.000 0.000
y1 1.000 0.860 0.566
y2 2.000 1.721 0.771
y3 3.000 2.581 0.854
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~
clustervr1 0.444 0.085 5.233 0.000 0.494 0.494
time ~
clustervr1 0.000 0.000 0.000
y0 ~
x0 (B) -0.263 0.015 -17.663 0.000 -0.263 -0.248
y1 ~
x1 (B) -0.263 0.015 -17.663 0.000 -0.263 -0.173
y2 ~
x2 (B) -0.263 0.015 -17.663 0.000 -0.263 -0.118
y3 ~
x3 (B) -0.263 0.015 -17.663 0.000 -0.263 -0.087
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.intercept ~~
.time 0.226 0.039 5.779 0.000 0.301 0.301
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
.intercept 0.439 0.060 7.252 0.000 0.488 0.488
.time 0.235 0.040 5.910 0.000 0.273 0.273
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 (rsvr) 0.245 0.011 22.361 0.000 0.245 0.218
.y1 (rsvr) 0.245 0.011 22.361 0.000 0.245 0.106
.y2 (rsvr) 0.245 0.011 22.361 0.000 0.245 0.049
.y3 (rsvr) 0.245 0.011 22.361 0.000 0.245 0.027
.intrcpt 0.758 0.059 12.785 0.000 0.939 0.939
.time 0.740 0.050 14.816 0.000 1.000 1.000
Results are still the same.
Allowing for an interaction between the time-varying covariate and time in the LGC only requires lifting the constraint that all the y ~ x effects be the same. However, this treats time as a categorical variable, so to run the same model as a mixed model we have to include time as both continuous and categorical (i.e. as a factor), with the interaction using the latter form.
To keep things clear, I’ll remove the cluster level covariate from this and the LGC model.
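To see exactly which columns such a formula generates, you can inspect the fixed-effects design matrix on a toy data frame (base R; the data here are made up for illustration):

```r
# Toy data: 2 subjects x 4 time points
toy = data.frame(time = rep(0:3, 2),
                 x    = c(.1, -.2, .3, -.4, .5, -.6, .7, -.8))

# The fixed-effects design matrix for this formula: time enters once as
# continuous, while x interacts with time treated as a factor (time 0 is
# the reference, so three interaction columns are produced)
X = model.matrix(~ time + x + factor(time):x, data = toy)
colnames(X)
```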
mixedModel = lmer(y1 ~ time + x + factor(time):x + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y1 ~ time + x + factor(time):x + (1 + time | subject)
Data: d
AIC BIC logLik deviance df.resid
5639.6 5695.7 -2809.8 5619.6 1990
Scaled residuals:
Min 1Q Median 3Q Max
-2.60700 -0.50367 -0.00044 0.50173 2.91767
Random effects:
Groups Name Variance Std.Dev. Corr
subject (Intercept) 0.8049 0.8972
time 0.7399 0.8602 0.28
Residual 0.2444 0.4943
Number of obs: 2000, groups: subject, 500
Fixed effects:
Estimate Std. Error t value
(Intercept) 0.659944 0.044189 14.935
time 0.234799 0.039720 5.911
x -0.279234 0.030598 -9.126
x:factor(time)1 0.003116 0.040209 0.078
x:factor(time)2 0.051296 0.040484 1.267
x:factor(time)3 -0.003910 0.050177 -0.078
The growth curve with unconstrained y ~ x paths follows. At this point it’s probably not too surprising why people doing growth curve models would find this output difficult to interpret, while those doing mixed models can stick with the same interpretation they would use for standard regression output.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
# add time-varying covariate
y0 ~ x0
y1 ~ x1
y2 ~ x2
y3 ~ x3
# residual variance
y0 ~~ resvar*y0
y1 ~~ resvar*y1
y2 ~~ resvar*y2
y3 ~~ resvar*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 30 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 15.207
Degrees of freedom 20
P-value (Chi-square) 0.764
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.897 0.842
y1 1.000 0.897 0.592
y2 1.000 0.897 0.404
y3 1.000 0.897 0.297
time =~
y0 0.000 0.000 0.000
y1 1.000 0.860 0.567
y2 2.000 1.720 0.774
y3 3.000 2.580 0.855
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
y0 ~
x0 -0.279 0.031 -9.126 0.000 -0.279 -0.262
y1 ~
x1 -0.276 0.027 -10.374 0.000 -0.276 -0.182
y2 ~
x2 -0.228 0.027 -8.559 0.000 -0.228 -0.103
y3 ~
x3 -0.283 0.039 -7.175 0.000 -0.283 -0.094
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~~
time 0.220 0.040 5.502 0.000 0.285 0.285
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
intercept 0.660 0.044 14.935 0.000 0.736 0.736
time 0.235 0.040 5.911 0.000 0.273 0.273
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 (rsvr) 0.244 0.011 22.361 0.000 0.244 0.215
.y1 (rsvr) 0.244 0.011 22.361 0.000 0.244 0.106
.y2 (rsvr) 0.244 0.011 22.361 0.000 0.244 0.049
.y3 (rsvr) 0.244 0.011 22.361 0.000 0.244 0.027
intrcpt 0.805 0.062 12.941 0.000 1.000 1.000
time 0.740 0.050 14.817 0.000 1.000 1.000
Note that in the mixed model, the interaction effects are relative to the effect at time 0. To make this clearer, I go ahead and add them to that coefficient, and compare to the LGC output.
fe = fixef(mixedModel)
c(fe['x'], fe['x'] + fe[paste0('x:factor(time)', 1:3)])
x x:factor(time)1 x:factor(time)2 x:factor(time)3
-0.2792345 -0.2761180 -0.2279384 -0.2831448
coef(growthCurveModel)[1:4]
y0~x0 y1~x1 y2~x2 y3~x3
-0.2792345 -0.2761180 -0.2279384 -0.2831449
To investigate heterogeneous variances, we’ll go back to the basic random intercepts and slopes model. This will require the nlme package, as lme4 doesn’t have this functionality. Specifying the weights argument allows a separate variance to be estimated at each time point.
library(nlme)
mixedModel = lme(y1 ~ time, random = ~1 + time|subject, data=d, method='ML',
weights=varIdent(form=~1|time))
summary(mixedModel)
Linear mixed-effects model fit by maximum likelihood
Data: d
AIC BIC logLik
5911.884 5962.293 -2946.942
Random effects:
Formula: ~1 + time | subject
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
(Intercept) 0.8836290 (Intr)
time 0.8580806 0.305
Residual 0.6110182
Variance function:
Structure: Different standard deviations per stratum
Formula: ~1 | time
Parameter estimates:
0 1 2 3
1.0000000 0.8934883 0.7791335 1.1048197
Fixed effects: y1 ~ time
Value Std.Error DF t-value p-value
(Intercept) 0.6669339 0.04542486 1499 14.682134 0
time 0.2275525 0.04036384 1499 5.637534 0
Correlation:
(Intr)
time 0.127
Standardized Within-Group Residuals:
Min Q1 Med Q3 Max
-2.71422728 -0.53494080 0.01744146 0.52185100 3.18290397
Number of Observations: 2000
Number of Groups: 500
VarCorr(mixedModel) # to see variances along with standard deviations
subject = pdLogChol(1 + time)
Variance StdDev Corr
(Intercept) 0.7808001 0.8836290 (Intr)
time 0.7363023 0.8580806 0.305
Residual 0.3733432 0.6110182
As we will see, the standard SEM approach estimates separate variances for each time point by default. In the SEM context, the \(\mathcal{U}\) are latent variables, and represent all causes acting on the dependent variable not otherwise specified. Technically, their paths are fixed to one and the variance of the latent variable is estimated.
model = "
# intercept and slope with fixed coefficients
intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
time =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"
growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after 37 iterations
Number of observations 500
Estimator ML
Minimum Function Test Statistic 4.080
Degrees of freedom 5
P-value (Chi-square) 0.538
Parameter Estimates:
Information Expected
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept =~
y0 1.000 0.884 0.823
y1 1.000 0.884 0.586
y2 1.000 0.884 0.400
y3 1.000 0.884 0.291
time =~
y0 0.000 0.000 0.000
y1 1.000 0.858 0.569
y2 2.000 1.716 0.777
y3 3.000 2.574 0.846
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
intercept ~~
time 0.231 0.043 5.426 0.000 0.305 0.305
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.000 0.000 0.000
.y1 0.000 0.000 0.000
.y2 0.000 0.000 0.000
.y3 0.000 0.000 0.000
intercept 0.667 0.045 14.689 0.000 0.755 0.755
time 0.228 0.040 5.640 0.000 0.265 0.265
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.y0 0.373 0.047 7.987 0.000 0.373 0.323
.y1 0.298 0.026 11.450 0.000 0.298 0.131
.y2 0.227 0.033 6.936 0.000 0.227 0.046
.y3 0.456 0.075 6.086 0.000 0.456 0.049
intercept 0.781 0.068 11.400 0.000 1.000 1.000
time 0.736 0.052 14.202 0.000 1.000 1.000
Note that the residual variance at time 0 in the growth model is the reported residual variance in the mixed model. Unfortunately nlme is a bit convoluted² in reporting the actual estimated variances, which are given not only relative to the time 0 estimate, but also on the standard deviation scale. Even figuring out how to get these values is something no one can actually remember without multiple web searches, but it is possible.
(c(1.0000000, coef(mixedModel$modelStruct$varStruct, unconstrained=F))*mixedModel$sigma)^2 %>%
rbind(coef(growthCurveModel)[1:4])
1 2 3
. 0.3733432 0.2980478 0.2266376 0.4557127
0.3733406 0.2980483 0.2266289 0.4557313
Latent growth curve models in the standard structural equation modeling setting are mixed models that will by default:
- estimate a separate residual variance at each time point
- include the interaction of any cluster level covariate with time
- treat time as categorical when estimating the effects of time-varying covariates
Even when you are considering indirect effects, autocorrelated residual structure, etc., there are tools in R that allow you to stay in the standard mixed model setting. The nlme package (and others) provide several options for residual and other correlation structure (e.g. spatial). For certain mediation models, the mediation package can handle mixed models. Adding latent class structure to mixed models can be done with flexmix. Bayesian approaches are no more difficult with packages like rstanarm and brms.
More complex correlational structures (e.g. nested with crossed random effects, spatial) are not even possible in the SEM setting⁴. If you have many time points, things only get worse for the LGC approach as far as the syntax³. Even having just a handful of covariates makes the LGC approach exceedingly complex.
I think it is very useful to think of random effects as latent variables. However, I find little utility in explicitly modeling them that way. Most LGC models I see are overly simplistic, not including demographic and other covariates, because the model quickly gets unwieldy visually, syntactically, and cognitively. The mixed model, on the other hand, allows one to stick to the more intuitive regression interpretation, while affording all the rich parameterization of (possibly complicated) random effects models.
In case it’s not clear, this would be a time-invariant variable. If these were repeated measures on individuals, this would be something like sex or race.↩
Most of this package object structure and output is irksome to me, starting with the umpteen (random!) decimal places. I have no problem admitting this may be a character flaw on my part. Note that there is a reason for having the first variance fixed to one for estimation purposes, but I can’t think of one for reporting it in this fashion.↩
To its credit, MPlus does have some syntactical shortcuts to make things a little easier.↩
At least that I’m aware of. You can presumably incorporate hierarchical structure via multilevel SEM, where the SEM is an LGC.↩