Latent Growth Curve
An alternative approach to mixed models treats the random effects as latent variables, with the outcome at each time point serving as an indicator of those latent variables. I have details elsewhere, but I want to explore it here because it is a commonly used technique in the social sciences, especially psychology. Latent growth curve models are a special case of structural equation modeling (SEM), a highly flexible tool that can incorporate latent variables, indirect effects, multiple outcomes, etc. Growth curve models are actually somewhat irregular SEMs in the way they are specified, but for our purposes we only want to see how the approach works and compare it to previous methods.
First, the data has to be in wide format, with one column per time point and thus only one row per individual. Once the data is ready, we specify the model syntax. By default, the SEM approach estimates a separate residual variance at each time point, so to make it more comparable to the mixed model, we constrain that variance to be constant across time. We’ll use lavaan to estimate the model.
library(tidyverse) # provides spread(), mutate() and %>%
dwide = spread(d, key=time, value=y, sep='_') %>%
  mutate(treatment = treatment=='treatment') # logical indicator; otherwise converted to numeric directly as 1-2 instead of 1-0
head(dwide)
treatment id time_0 time_1 time_2 time_3
1 FALSE 1 0.1760974 0.6928733 0.1920017 -0.2205356
2 FALSE 2 1.4265781 1.6607311 1.2515623 2.9685755
3 FALSE 3 -0.1383776 0.8689109 1.2446484 2.9954976
4 FALSE 4 2.0575643 1.7831405 1.6413706 1.7078853
5 FALSE 5 0.9899045 1.7892770 1.7883325 2.4579697
6 FALSE 6 0.6079163 0.1161267 0.6956824 0.8326176
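In current versions of tidyr, `spread()` is superseded by `pivot_wider()`. The following is a minimal sketch of the same long-to-wide reshaping on a small made-up data set; `d_toy` and `dwide_toy` are hypothetical names and values used only for illustration, not the data analyzed here.

```r
library(tidyr)

# Toy long-format data (hypothetical values; stands in for `d`)
d_toy <- data.frame(
  id        = rep(1:3, each = 4),
  treatment = rep(c("control", "treatment", "control"), each = 4),
  time      = rep(0:3, times = 3),
  y         = c(0.2, 0.7, 0.2, -0.2, 1.4, 1.7, 1.3, 3.0, -0.1, 0.9, 1.2, 3.0)
)

# One row per individual, one column per time point (time_0 ... time_3)
dwide_toy <- pivot_wider(
  d_toy,
  names_from   = time,
  values_from  = y,
  names_prefix = "time_"
)

# Logical treatment indicator, as in the original code
dwide_toy$treatment <- dwide_toy$treatment == "treatment"

dwide_toy
```

The `names_prefix` argument plays the role of `sep='_'` in the `spread()` call, so the resulting column names match those used in the model syntax.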
growthmod_syntax = "
# model for the intercept and slope latent variables
int =~ 1*time_0 + 1*time_1 + 1*time_2 + 1*time_3
slope =~ 0*time_0 + 1*time_1 + 2*time_2 + 3*time_3
# cluster-level effect
int ~ treatment
# intercept-slope correlation
int ~~ slope
# fix to equal variances (parameter 'res')
time_0 ~~ res*time_0
time_1 ~~ res*time_1
time_2 ~~ res*time_2
time_3 ~~ res*time_3
"
library(lavaan)
growth_mod = growth(growthmod_syntax, data=dwide)
summary(growth_mod, standardized=TRUE)
lavaan 0.6-2 ended normally after 27 iterations
Optimization method NLMINB
Number of free parameters 10
Number of equality constraints 3
Number of observations 2500
Estimator ML
Model Fit Test Statistic 234.733
Degrees of freedom 11
P-value (Chi-square) 0.000
Parameter Estimates:
Information Expected
Information saturated (h1) model Structured
Standard Errors Standard
Latent Variables:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
int =~
time_0 1.000 1.048 0.896
time_1 1.000 1.048 0.931
time_2 1.000 1.048 0.863
time_3 1.000 1.048 0.742
slope =~
time_0 0.000 0.000 0.000
time_1 1.000 0.394 0.350
time_2 2.000 0.788 0.648
time_3 3.000 1.182 0.836
Regressions:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
int ~
treatment -0.434 0.041 -10.686 0.000 -0.414 -0.207
Covariances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.int ~~
slope -0.128 0.011 -11.522 0.000 -0.317 -0.317
Intercepts:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.time_0 0.000 0.000 0.000
.time_1 0.000 0.000 0.000
.time_2 0.000 0.000 0.000
.time_3 0.000 0.000 0.000
.int 0.137 0.030 4.553 0.000 0.131 0.131
slope 0.509 0.009 55.635 0.000 1.291 1.291
Variances:
Estimate Std.Err z-value P(>|z|) Std.lv Std.all
.time_0 (res) 0.269 0.005 50.000 0.000 0.269 0.197
.time_1 (res) 0.269 0.005 50.000 0.000 0.269 0.212
.time_2 (res) 0.269 0.005 50.000 0.000 0.269 0.182
.time_3 (res) 0.269 0.005 50.000 0.000 0.269 0.135
.int 1.052 0.035 29.817 0.000 0.957 0.957
slope 0.155 0.006 25.828 0.000 1.000 1.000
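The equal-variance constraint is a modeling choice, and it can be compared against the default of a separate residual variance per time point. The sketch below does this with a likelihood ratio test on simulated data; the data, the object names (`dwide_toy`, `latent_part`, `equal_res`, `fit_free`, `fit_equal`) and the simulation settings are all assumptions made for illustration, and the treatment effect is left out to keep it short.

```r
library(lavaan)

set.seed(1234)
n <- 500

# Person-specific intercepts and slopes (hypothetical settings)
int_i   <- rnorm(n, sd = 1)
slope_i <- 0.5 + rnorm(n, sd = 0.4)

# Toy wide data in which the residual sd grows over time
dwide_toy <- data.frame(
  time_0 = int_i + 0 * slope_i + rnorm(n, sd = 0.4),
  time_1 = int_i + 1 * slope_i + rnorm(n, sd = 0.5),
  time_2 = int_i + 2 * slope_i + rnorm(n, sd = 0.6),
  time_3 = int_i + 3 * slope_i + rnorm(n, sd = 0.7)
)

# Latent intercept and slope, as in the syntax above
latent_part <- "
  int   =~ 1*time_0 + 1*time_1 + 1*time_2 + 1*time_3
  slope =~ 0*time_0 + 1*time_1 + 2*time_2 + 3*time_3
"

# Shared label 'res' forces a single residual variance
equal_res <- "
  time_0 ~~ res*time_0
  time_1 ~~ res*time_1
  time_2 ~~ res*time_2
  time_3 ~~ res*time_3
"

fit_free  <- growth(latent_part, data = dwide_toy)                    # default: unequal variances
fit_equal <- growth(paste(latent_part, equal_res), data = dwide_toy)  # constrained

# Likelihood ratio test of the equality constraint
anova(fit_equal, fit_free)
```

The constrained model is the one comparable to a standard mixed model, which assumes a single residual variance.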
The Intercepts: section of the output shows what would be the fixed effects in the mixed model; in this latent variable approach they are in fact ‘intercepts’, which is why they are named as such. The regression of int on treatment depicts the treatment effect, and will make more sense to those who come to mixed models from the multilevel modeling literature. If you go back to the model depiction for the mixed model, this model more explicitly denotes \(\beta_{0c} = \beta_0 + \beta_2*\textrm{Treatment} + \gamma_c\). The res parameter is the arbitrary name I’ve given to the residual variance, and it is roughly equivalent to the square of the residual standard deviation in the mixed model output. The above model does not allow for correlated residuals, though this is possible[19].
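One way to see the connection is to write the measurement part of the growth model in matrix form for individual \(c\). This is a sketch under assumed notation: \(\mathbf{y}_c\), \(\Lambda\), \(\eta_c\), \(\epsilon_c\), \(\beta_{1c}\), \(\gamma_{1c}\) and \(\sigma^2\) are symbols introduced here for illustration, and \(\beta_1\) is assumed to be the coefficient for time in the mixed model depiction.

\[
\mathbf{y}_c =
\begin{bmatrix} y_{c0} \\ y_{c1} \\ y_{c2} \\ y_{c3} \end{bmatrix}
= \Lambda \eta_c + \epsilon_c,
\qquad
\Lambda =
\begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \\ 1 & 3 \end{bmatrix},
\qquad
\eta_c =
\begin{bmatrix} \beta_{0c} \\ \beta_{1c} \end{bmatrix}
\]

\[
\beta_{0c} = \beta_0 + \beta_2*\textrm{Treatment} + \gamma_c,
\qquad
\beta_{1c} = \beta_1 + \gamma_{1c},
\qquad
\epsilon_c \sim N(0, \sigma^2 I_4)
\]

The first column of \(\Lambda\) holds the loadings fixed to 1 in the int line of the syntax, the second column holds the 0 to 3 loadings in the slope line, and \(\sigma^2\) corresponds to res.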
The primary point here is not to precisely reproduce the correct model but to show the identity between the mixed model and the latent growth curve approach. Proper specification will lead to identical results between latent growth curve and mixed models. The following shows the results from the equivalent mixed model.
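The code for the mixed model is not shown, so the following is only a sketch of what an equivalent specification could look like. It assumes nlme's `lme()`, which is consistent with the DF column and restricted log-likelihood reported in the tables. The simulated data and the names `d_toy` and `mixed_mod` are hypothetical.

```r
library(nlme)

set.seed(1234)
n_id   <- 500
n_time <- 4

# Toy long-format data (hypothetical; stands in for `d`)
d_toy <- data.frame(
  id   = rep(seq_len(n_id), each = n_time),
  time = rep(0:(n_time - 1), times = n_id)
)
d_toy$treatment <- factor(
  ifelse(d_toy$id <= n_id / 2, "control", "treatment"),
  levels = c("control", "treatment")
)

# Person-specific deviations for intercept and slope
re_int   <- rnorm(n_id, sd = 1)
re_slope <- rnorm(n_id, sd = 0.4)

d_toy$y <- 0.2 + re_int[d_toy$id] +
  (0.5 + re_slope[d_toy$id]) * d_toy$time -
  0.4 * (d_toy$treatment == "treatment") +
  rnorm(nrow(d_toy), sd = 0.5)

# Random intercepts and random slopes for time, with their correlation estimated;
# treatment enters as a cluster-level fixed effect
mixed_mod <- lme(y ~ time + treatment, random = ~ time | id, data = d_toy)

summary(mixed_mod)
```

The tables that follow report the fit to the full data used in this section, not this toy fit. Note that `lme()` uses REML by default while lavaan uses ML; with 2500 individuals the difference is negligible.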
Term | Value | Std.Error | DF | t-value | p-value |
---|---|---|---|---|---|
(Intercept) | 0.137 | 0.030 | 7499 | 4.552 | 0 |
time | 0.509 | 0.009 | 7499 | 55.626 | 0 |
treatmenttreatment | -0.434 | 0.041 | 2498 | -10.685 | 0 |
Min | Q1 | Med | Q3 | Max |
---|---|---|---|---|
-3.58 | -0.51 | 0.01 | 0.5 | 3.28 |
Grouping | Observations | Groups | Log-restricted-likelihood |
---|---|---|---|
id | 10000 | 2500 | -12732 |
Effect | Variance | StdDev |
---|---|---|
(Intercept) | 1.052 | 1.026 |
time | 0.155 | 0.394 |
Residual | 0.269 | 0.519 |
Pros
- Can be utilized on less data than typical SEM
- Very efficient estimation
- Can deal with very complex models, including mediation, parallel processes etc.
Cons
- Tedious to specify even the simplest of models
- Very tedious to specify even common extensions (e.g. time-varying covariates)
- Even worse to get into correlated residuals
- More complex cluster structure is not dealt with well (if at all)[20]
- Assumes balanced time points
- Doesn’t deal well with many time points (especially if there are also time-varying covariates)
Gist: Growth curve models are very flexible, but they are also problematic simply because they come from the SEM world, where models are notoriously misapplied. Furthermore, there are no common uses of growth curve models that would not be more easily implemented in one of several R packages[21] and various other languages and statistical programs. While I find the latent variable interpretation very intriguing, the latent variable approach is not something I’d normally consider for this setting.
19. See my LGC chapter in this SEM document. Once you see it there, you’ll know why I did not do so here.
20. Mplus has recently incorporated the ability to handle crossed random effects (see example 9.24 in the version 7 manual), but I have no idea how they work in realistic situations with potentially many, possibly time-varying, covariates, and it’s actually done with their multilevel approach rather than the LGC we’ve been discussing. Furthermore, it requires the Bayesian estimator, and if you’re going that route you might as well use rstan, rjags or similar and have a lot more utility (and clarity) at your disposal. For tools like lme4 and similar, incorporating crossed random effects is no more difficult than other situations, i.e. it is one line of code, while you’d be debugging the Mplus output for days.
21. See the mediation package for mediation with mixed models, flexMix for growth mixture models, Bayesian approaches for parallel processes, etc.