Latent Growth Curve

An alternative approach to mixed models treats the random effects as latent variables, with the outcome at each time point serving as an indicator of those latent variables. I have details elsewhere, but I want to explore it here because it is a commonly used technique in the social sciences, especially psychology. Latent growth curve models are a special case of structural equation modeling (SEM), a highly flexible tool that can incorporate latent variables, indirect effects, multiple outcomes, etc. Growth curve models are actually somewhat irregular SEMs in the way they are specified, but for our purposes we only want to see how the approach works and compare it to the previous methods.

First, the data has to be in wide format, with one column per time point and thus only one row per individual. Once the data is ready, we specify the model syntax. By default, the SEM approach also allows the residual variance to differ at each time point, so to make it more comparable to the mixed model, we constrain it to be constant. We’ll use lavaan to estimate the model.

library(tidyverse)  # provides spread(), mutate(), and %>%
dwide = spread(d, key=time, value=y, sep='_') %>%
  mutate(treatment = treatment=='treatment')  # otherwise converted to numeric directly as 1-2 instead of 1-0
head(dwide)

  treatment id     time_0    time_1    time_2     time_3
1     FALSE  1  0.1760974 0.6928733 0.1920017 -0.2205356
2     FALSE  2  1.4265781 1.6607311 1.2515623  2.9685755
3     FALSE  3 -0.1383776 0.8689109 1.2446484  2.9954976
4     FALSE  4  2.0575643 1.7831405 1.6413706  1.7078853
5     FALSE  5  0.9899045 1.7892770 1.7883325  2.4579697
6     FALSE  6  0.6079163 0.1161267 0.6956824  0.8326176
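
In current versions of tidyr, spread() is superseded by pivot_wider(). As a minimal sketch of the same reshaping step, the following uses a small made-up data set (d_toy and its values are hypothetical, not the data used here) and produces the same time_0 ... time_3 column layout.

```r
library(dplyr)
library(tidyr)

# Toy long-format data (hypothetical values): 2 individuals x 4 time points
d_toy = tibble(
  id        = rep(1:2, each = 4),
  treatment = factor(rep(c('control', 'treatment'), each = 4)),
  time      = rep(0:3, times = 2),
  y         = c(0.2, 0.7, 0.2, -0.2, 1.4, 1.7, 1.3, 3.0)
)

# One row per individual, one column per time point
dwide_toy = d_toy %>%
  pivot_wider(names_from = time, values_from = y, names_prefix = 'time_') %>%
  mutate(treatment = treatment == 'treatment')  # logical instead of factor codes

dwide_toy
```

The names_prefix argument plays the role that sep='_' plays in the spread() call, so the model syntax can refer to the columns by the same names.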

growthmod_syntax = "
# model for the intercept and slope latent variables
  int   =~ 1*time_0 + 1*time_1 + 1*time_2 + 1*time_3
  slope =~ 0*time_0 + 1*time_1 + 2*time_2 + 3*time_3

# cluster-level effect
  int ~ treatment

# intercept-slope correlation
  int ~~ slope

# fix to equal variances (parameter 'res')
  time_0 ~~ res*time_0
  time_1 ~~ res*time_1
  time_2 ~~ res*time_2
  time_3 ~~ res*time_3
"

library(lavaan)
growth_mod = growth(growthmod_syntax, data=dwide)
summary(growth_mod, standardized=TRUE)

lavaan 0.6-2 ended normally after 27 iterations

  Optimization method                           NLMINB
  Number of free parameters                         10
  Number of equality constraints                     3

  Number of observations                          2500

  Estimator                                         ML
  Model Fit Test Statistic                     234.733
  Degrees of freedom                                11
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Information saturated (h1) model          Structured
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  int =~                                                                
    time_0            1.000                               1.048    0.896
    time_1            1.000                               1.048    0.931
    time_2            1.000                               1.048    0.863
    time_3            1.000                               1.048    0.742
  slope =~                                                              
    time_0            0.000                               0.000    0.000
    time_1            1.000                               0.394    0.350
    time_2            2.000                               0.788    0.648
    time_3            3.000                               1.182    0.836

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  int ~                                                                 
    treatment        -0.434    0.041  -10.686    0.000   -0.414   -0.207

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .int ~~                                                                
    slope            -0.128    0.011  -11.522    0.000   -0.317   -0.317

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .time_0            0.000                               0.000    0.000
   .time_1            0.000                               0.000    0.000
   .time_2            0.000                               0.000    0.000
   .time_3            0.000                               0.000    0.000
   .int               0.137    0.030    4.553    0.000    0.131    0.131
    slope             0.509    0.009   55.635    0.000    1.291    1.291

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .time_0   (res)    0.269    0.005   50.000    0.000    0.269    0.197
   .time_1   (res)    0.269    0.005   50.000    0.000    0.269    0.212
   .time_2   (res)    0.269    0.005   50.000    0.000    0.269    0.182
   .time_3   (res)    0.269    0.005   50.000    0.000    0.269    0.135
   .int               1.052    0.035   29.817    0.000    0.957    0.957
    slope             0.155    0.006   25.828    0.000    1.000    1.000

The Intercepts: section of the output shows what would be the fixed effects for the intercept and time in the mixed model. In the latent variable approach they are literally the intercepts of the latent variables int and slope, which is why they are named as such. The regression of int on treatment depicts the treatment effect, and will make more sense to those who come to mixed models from the multilevel modeling literature. If you go back to the model depiction for the mixed model, this model more explicitly denotes \(\beta_{0c} = \beta_0 + \beta_2*\textrm{Treatment} + \gamma_c\). The res parameter is the arbitrary name I’ve given to the residual variance, and it is roughly equivalent to the square of the residual standard deviation in the mixed model output. The above model does not allow for correlated residuals, though this is possible¹⁹.
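
To make the correspondence concrete, the model can be sketched in multilevel form as follows. The symbols \(y_{tc}\), \(\beta_{1c}\), \(\delta_c\), and \(e_{tc}\) are my own labels for this sketch and may differ from the notation used in the earlier mixed model depiction.

\[
\begin{aligned}
y_{tc} &= \beta_{0c} + \beta_{1c}*\textrm{time}_t + e_{tc} \\
\beta_{0c} &= \beta_0 + \beta_2*\textrm{Treatment} + \gamma_c \\
\beta_{1c} &= \beta_1 + \delta_c
\end{aligned}
\]

Under this reading, int corresponds to \(\beta_{0c}\) with all loadings fixed at 1, and slope corresponds to \(\beta_{1c}\) with loadings fixed at the time values 0, 1, 2, 3. From the output above, \(\beta_0 = 0.137\) and \(\beta_1 = 0.509\) are the intercepts of int and slope, \(\beta_2 = -0.434\) is the regression of int on treatment, the variances of \(\gamma_c\) and \(\delta_c\) are 1.052 and 0.155 with covariance -0.128, and the variance of \(e_{tc}\) is res = 0.269.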

The primary point here is not to precisely reproduce the correct model but to show the identity between the mixed model and the latent growth curve approach. Proper specification will lead to identical results between latent growth curve and mixed models. The following fits the equivalent mixed model.

library(nlme)  # provides lme()
mixed_mod_nocorr = lme(y ~ time + treatment, data=d, random=~1+time|id, method="ML")

Fixed effects: y ~ time + treatment
                     Value  Std.Error    DF  t-value
(Intercept)          0.137      0.030  7499    4.552
time                 0.509      0.009  7499   55.626
treatmenttreatment  -0.434      0.041  2498  -10.685

Standardized Within-Group Residuals
  Min     Q1   Med   Q3   Max
-3.58  -0.51  0.01  0.5  3.28

Linear mixed-effects model fit by maximum likelihood : y ~ time + treatment
    Observations  Groups  Log-restricted-likelihood
id         10000    2500                     -12732

             Variance  StdDev
(Intercept)     1.052   1.026
time            0.155   0.394
Residual        0.269   0.519
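
For readers who want to check the equivalence without the original data, here is a self-contained sketch that simulates a similar data set, fits both models, and puts the estimates side by side. All names (d_sim, dwide_sim, lgc_fit, lme_fit) and the simulation settings are assumptions for illustration only. The treatment indicator is coded 0/1 and the wide columns are named y_0 ... y_3 rather than time_0 ... time_3.

```r
library(lavaan)
library(nlme)

set.seed(1234)

# Hypothetical simulation settings (not the data used in this document)
n_id  = 500
times = 0:3

treatment = rep(c(0, 1), each = n_id / 2)   # 0 = control, 1 = treatment
u_int     = rnorm(n_id, sd = 1)             # random intercepts
u_slope   = rnorm(n_id, sd = 0.4)           # random slopes

# Long format: one row per individual per time point
d_sim = data.frame(
  id        = rep(1:n_id, each = length(times)),
  time      = rep(times, times = n_id),
  treatment = rep(treatment, each = length(times))
)
d_sim$y = (0.2 + u_int[d_sim$id]) + (0.5 + u_slope[d_sim$id]) * d_sim$time -
  0.4 * d_sim$treatment + rnorm(nrow(d_sim), sd = 0.5)

# Wide format for lavaan: columns id, treatment, y_0, ..., y_3
dwide_sim = reshape(d_sim, idvar = 'id', timevar = 'time', v.names = 'y',
                    direction = 'wide', sep = '_')

lgc_syntax = "
  int   =~ 1*y_0 + 1*y_1 + 1*y_2 + 1*y_3
  slope =~ 0*y_0 + 1*y_1 + 2*y_2 + 3*y_3
  int ~ treatment
  int ~~ slope
  y_0 ~~ res*y_0
  y_1 ~~ res*y_1
  y_2 ~~ res*y_2
  y_3 ~~ res*y_3
"

lgc_fit = growth(lgc_syntax, data = dwide_sim)
lme_fit = lme(y ~ time + treatment, data = d_sim,
              random = ~ 1 + time | id, method = 'ML')

# Pull the matching pieces out of each fit
pe = parameterEstimates(lgc_fit)
lgc_fixed = c(
  intercept = pe$est[pe$lhs == 'int'   & pe$op == '~1'],
  time      = pe$est[pe$lhs == 'slope' & pe$op == '~1'],
  treatment = pe$est[pe$lhs == 'int'   & pe$op == '~' & pe$rhs == 'treatment']
)
lgc_var = c(
  int      = pe$est[pe$lhs == 'int'   & pe$op == '~~' & pe$rhs == 'int'],
  slope    = pe$est[pe$lhs == 'slope' & pe$op == '~~' & pe$rhs == 'slope'],
  residual = pe$est[pe$lhs == 'y_0'   & pe$op == '~~' & pe$rhs == 'y_0']
)
lme_fixed = unname(fixef(lme_fit))
lme_var   = as.numeric(VarCorr(lme_fit)[, 'Variance'])

round(cbind(lgc = lgc_fixed, lme = lme_fixed), 3)  # fixed effects
round(cbind(lgc = lgc_var,   lme = lme_var), 3)    # variance components
```

Both fits use maximum likelihood, so the two columns in each comparison should agree up to small differences from the optimizers.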

Pros

Can be utilized on less data than typical SEM
Very efficient estimation
Can deal with very complex models, including mediation, parallel processes etc.

Cons

Tedious to specify even the simplest of models
Very tedious to specify even common extensions (e.g. time-varying covariates)
Even worse to get into correlated residuals
More complex cluster structure is not dealt with well (if at all)²⁰
Assumes balanced time points
Doesn’t deal with many time points well (if also time-varying covariates especially)
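
As a small illustration of the tedium, here is a sketch of what adding even a very simple residual correlation structure to the lavaan syntax could look like. It builds one line per adjacent pair of time points, with a shared label (rc, a name I made up) so that all lag-1 residual covariances are constrained to be equal. This is only one possible structure, it is not a true AR(1), and it is not necessarily how one would want to model the residuals in practice.

```r
# Observed variable names as in the wide data used for the growth model
time_vars = paste0('time_', 0:3)

# One syntax line per adjacent pair, all sharing the hypothetical label 'rc'
lag1_lines = paste0('  ', head(time_vars, -1), ' ~~ rc*', tail(time_vars, -1))

resid_cov_syntax = paste(lag1_lines, collapse = '\n')
cat(resid_cov_syntax)
```

The resulting lines could be pasted onto the end of the model syntax string before fitting. The number of lines grows with the number of time points, and richer structures would need additional labels and constraints.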

Gist: Growth curve models are very flexible, but they are also problematic simply because they come from the SEM world, where models are notoriously misapplied. Furthermore, there are no common uses of growth curve models that would not be more easily implemented in one of several R packages²¹ or in various other languages and statistical programs. While I find the latent variable interpretation intriguing, the latent variable approach is not something I’d normally consider for this setting.

¹⁹ See my LGC chapter in this SEM document. Once you see it there you’ll know why I did not do so here.
²⁰ MPlus has recently incorporated the ability to handle crossed random effects (see example 9.24 in the version 7 manual), but I have no idea how they work in realistic situations with potentially many, possibly time-varying, covariates, and it’s actually done with their multilevel approach rather than the LGC we’ve been discussing. Furthermore, it requires the Bayesian estimator, and if you’re going that route you might as well use rstan, rjags or similar and have a lot more utility (and clarity) at your disposal. For tools like lme4 and similar, incorporating crossed random effects is no more difficult than other situations, i.e. it takes one line of code, while you’d be debugging the MPlus output for days.
²¹ See the mediation package for mediation with mixed models, flexMix for growth mixture models, Bayesian approaches for parallel processes, etc.