Introduction

Mixed models and latent growth curves (LGC) are common approaches to dealing with longitudinal/repeated measures data. I have demonstrated their equivalence elsewhere (e.g. here), but this document hits on some additional nuances. The only model packages needed here will be lme4, nlme, and lavaan.

I will not be explaining these models in detail; this document is for those already somewhat familiar with them. However, the key idea is that the random effects in the mixed model are the latent variables in the growth curve model.

In what follows we’ll examine data where clusters have multiple observations across time, and eventually other covariates will be added to the mix that are time-varying or time-constant. The following can serve as a quick reminder of what we’re dealing with. The following depicts a mixed model with random intercepts and random slopes for the time covariate for observations \(i\) in cluster \(c\).

\[y_{ic} = \alpha_c + \beta_c \cdot \mathrm{time}_{ic} + \epsilon_{ic}\]

\[\begin{bmatrix}\alpha_c \\ \beta_c \end{bmatrix} \sim \mathscr{N}(\mathbf{\mu}, \mathbf{\Sigma}) \]

\[ \mathbf{\mu} = \begin{bmatrix}\mu_\alpha \\ \mu_\beta \end{bmatrix}\] \[\mathbf{\Sigma} = \left[\begin{smallmatrix}\tau^2 & \rho \\ \rho & \varphi^2 \end{smallmatrix}\right]\]

\[\epsilon_{ic} \sim \mathscr{N}(0, \varsigma^2)\] In the above we have cluster-specific coefficients, \(\alpha_c\) and \(\beta_c\), drawn from a multivariate normal distribution with means, \(\mu_\alpha\) and \(\mu_\beta\), the soi-disant ‘fixed effects’. Each has their respective variance (\(\tau^2\) and \(\varphi^2\)), and some covariance (\(\rho\)), to go along with the residual variance at the observation level (\(\varsigma^2\)).

The same model is depicted graphically as a growth curve model in the structural equation modeling (SEM) context. I show this and subsequent graphs sans variances, but you can assume there is a little latent variable representing ‘all other causes’ pointing at the latent variables and endogenous variables (i.e. the Ys).

Data setup

First things first, we need some data. Note that one could use lavaan to simulate the data as well if desired, but I think the following is clearer as it is more in line with standard regression.

library(tidyverse)
set.seed(1234)

# data size
n = 500
timepoints = 4
N = n*timepoints

# covariates
time = rep(0:3, times=n)
x = rnorm(N)
clustervar1 = rep(0:1, e=N/2)
subject = rep(1:n, each=4)

# fixed effects
intercept = .5
slope = .25

# random effects and observation level error
randomEffectsCorr = matrix(c(.75,.2,.2,.75), ncol=2) 
randomEffects = MASS::mvrnorm(n, mu=c(0,0), Sigma = randomEffectsCorr, empirical=T) %>% 
  data.frame()
colnames(randomEffects) = c('Int', 'Slope')
sigma = .5

# target variable
y1 = 
  (intercept + randomEffects$Int[subject]) +          # random intercepts
  (slope + randomEffects$Slope[subject]) * time +     # random slopes 
  -.25 * x +                                          # time-varying covariate
   .30 * clustervar1 +                                # cluster level covariate
  rnorm(n*timepoints, mean=0, sd=sigma)               # observation level error


# create data frames
d = data.frame(subject, time, clustervar1, x, y1)


# latent growth curve models require data in 'wide' format; ignore the warning
dWide = d %>%  
  select(-x) %>% 
  spread(time, y1, -clustervar1)

colnames(dWide)[-(1:2)] = paste0('y', 0:3)

# this is ugly, but I'm not interested in figuring out the 'proper' way which
# probably would not be shorter.
dWide = dWide %>%  
  left_join(d %>%
              select(-y1) %>% 
              spread(time, x, -clustervar1))
colnames(dWide)[-(1:6)] = paste0('x', 0:3)
dWide %>% 
  mutate_if(is.numeric, round, digits=2) %>% 
  head()
  subject clustervar1    y0    y1    y2    y3    x0    x1    x2    x3
1       1           0 -0.73 -0.12  0.11  2.26 -1.21  0.28  1.08 -2.35
2       2           0  1.31  0.14 -0.14  0.49  0.43  0.51 -0.57 -0.55
3       3           0 -0.73  1.31  1.69  3.66 -0.56 -0.89 -0.48 -1.00
4       4           0  1.90  3.07  2.88  5.50 -0.78  0.06  0.96 -0.11
5       5           0 -1.89 -1.43 -0.17 -0.19 -0.51 -0.91 -0.84  2.42
6       6           0 -0.40  0.03  0.02  1.29  0.13 -0.49 -0.44  0.46

Random intercepts

We start with a basic random intercept model. This is notably different than the data generating process, so the estimates will be quite a bit off. In order to make the models produce the same results, the mixed model have to be estimated with maximum likelihood (not an issue with these sample sizes).

Mixed model

library(lme4)
mixedModel = lmer(y1 ~ time + (1|subject), data=d, REML=F)  # 1 represents the intercept
summary(mixedModel, corr=F)  # lessen clutter
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time + (1 | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  7634.8   7657.2  -3813.4   7626.8     1996 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9171 -0.4884  0.0070  0.4557  3.5445 

Random effects:
 Groups   Name        Variance Std.Dev.
 subject  (Intercept) 2.824    1.680   
 Residual             1.567    1.252   
Number of obs: 2000, groups:  subject, 500

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.66689    0.08855   7.531
time         0.23164    0.02504   9.252

LGC

For the LGC model, I will go ahead and keep the format that will be consistent with later models, but this means we’ll have to fix some parameters so that we only have random intercepts and not slopes for the time covariate. The model also will start with the variances at each time point constrained to be equal to mimic the mixed model. We’ll relax this for both models later.

library(lavaan)
model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3

  # time effect not allowed to vary (and thus not covary with intercept)
  time ~~ 0*time
  intercept ~~ 0*time

  # residual variance, constant across time (resvar)
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel)
lavaan (0.5-23.1097) converged normally after  18 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic             1737.001
  Degrees of freedom                                10
  P-value (Chi-square)                           0.000

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept =~                                        
    y0                1.000                           
    y1                1.000                           
    y2                1.000                           
    y3                1.000                           
  time =~                                             
    y0                0.000                           
    y1                1.000                           
    y2                2.000                           
    y3                3.000                           

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)
  intercept ~~                                        
    time              0.000                           

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)
   .y0                0.000                           
   .y1                0.000                           
   .y2                0.000                           
   .y3                0.000                           
    intercept         0.667    0.089    7.531    0.000
    time              0.232    0.025    9.252    0.000

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)
    time              0.000                           
   .y0      (rsvr)    1.567    0.057   27.386    0.000
   .y1      (rsvr)    1.567    0.057   27.386    0.000
   .y2      (rsvr)    1.567    0.057   27.386    0.000
   .y3      (rsvr)    1.567    0.057   27.386    0.000
    intrcpt           2.824    0.204   13.851    0.000

The Intercepts: section of the growth curve model output represents the fixed effects for the intercept and time coefficient. The (resvar) is the residual variance, while the variance for intercept (and in subsequent models time) is the random effects variance in the mixed model. As we can see the results are identical.

Random intercepts and slopes

Mixed model

Now we add random slopes for the effect of time. I also show standardized results to make comparison of the intercept-slope correlation straightforward.

mixedModel = lmer(y1 ~ time + (1 + time|subject), data=d, REML=F) 
summary(mixedModel, corr=F) 
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time + (1 + time | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  5913.9   5947.5  -2951.0   5901.9     1994 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.46879 -0.51536  0.00716  0.51789  3.14247 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subject  (Intercept) 0.8155   0.9031       
          time        0.7556   0.8692   0.26
 Residual             0.3079   0.5549       
Number of obs: 2000, groups:  subject, 500

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.66689    0.04541   14.69
time         0.23164    0.04043    5.73

LGC

This is the same graphical model we saw at the beginning.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3

  # residual variance, constant across time (resvar)
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  23 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic               12.135
  Degrees of freedom                                 8
  P-value (Chi-square)                           0.145

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.903    0.852
    y1                1.000                               0.903    0.596
    y2                1.000                               0.903    0.405
    y3                1.000                               0.903    0.298
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.869    0.574
    y2                2.000                               1.738    0.779
    y3                3.000                               2.608    0.861

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~~                                                          
    time              0.208    0.042    4.995    0.000    0.265    0.265

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
    intercept         0.667    0.045   14.686    0.000    0.738    0.738
    time              0.232    0.040    5.730    0.000    0.266    0.266

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.274
   .y1      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.134
   .y2      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.062
   .y3      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.034
    intrcpt           0.815    0.066   12.372    0.000    1.000    1.000
    time              0.756    0.052   14.599    0.000    1.000    1.000

The estimated effects (fixed effect + random) are identical.

coef(mixedModel)$subject %>% 
  bind_cols(as.tibble(lavPredict(growthCurveModel))) %>% 
  round(3) %>% 
  head(10)
   (Intercept)   time intercept  time1
1       -0.525  0.693    -0.525  0.693
2        0.742 -0.209     0.742 -0.209
3       -0.115  1.134    -0.115  1.134
4        1.625  1.093     1.625  1.093
5       -1.246  0.347    -1.246  0.347
6       -0.217  0.363    -0.217  0.363
7       -0.700 -0.325    -0.700 -0.325
8        0.660  0.549     0.660  0.549
9        0.400 -0.038     0.400 -0.038
10      -0.395 -1.403    -0.395 -1.403

Add a cluster level covariate

Mixed model

Adding a cluster level covariate1 to the mixed model is trivial. We just add it as we would any other predictor.

mixedModel = lmer(y1 ~ time + clustervar1 + (1 + time|subject), data=d, REML=F)
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time + clustervar1 + (1 + time | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  5890.9   5930.1  -2938.4   5876.9     1993 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4207 -0.5055  0.0041  0.5224  3.2215 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subject  (Intercept) 0.7682   0.8764       
          time        0.7556   0.8692   0.28
 Residual             0.3079   0.5549       
Number of obs: 2000, groups:  subject, 500

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.44409    0.06243   7.113
time         0.23164    0.04043   5.730
clustervar1  0.44559    0.08787   5.071

LGC

For the growth curve however, we will have to first add another regression equation, and fix the path from that covariate to the slope factor to be zero, otherwise our estimates will be slightly different due to the indirect path through the random intercepts.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3

  # cluster level effect
  intercept ~ clustervar1
  time ~ 0*clustervar1

  # residual variance
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  24 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic               13.488
  Degrees of freedom                                11
  P-value (Chi-square)                           0.263

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.904    0.852
    y1                1.000                               0.904    0.595
    y2                1.000                               0.904    0.404
    y3                1.000                               0.904    0.298
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.869    0.572
    y2                2.000                               1.738    0.777
    y3                3.000                               2.608    0.859

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~                                                           
    clustervar1       0.446    0.088    5.071    0.000    0.493    0.493
  time ~                                                                
    clustervar1       0.000                               0.000    0.000

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .intercept ~~                                                          
   .time              0.215    0.041    5.292    0.000    0.283    0.283

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
   .intercept         0.444    0.062    7.113    0.000    0.491    0.491
   .time              0.232    0.040    5.730    0.000    0.266    0.266

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.274
   .y1      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.133
   .y2      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.061
   .y3      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.033
   .intrcpt           0.768    0.063   12.202    0.000    0.939    0.939
   .time              0.756    0.052   14.599    0.000    1.000    1.000

Add an interaction

Mixed model

Next we’ll add an interaction between the cluster level covariate and time. For the mixed model, this is done in the same way we would for any regression model.

mixedModel = lmer(y1 ~ time*clustervar1 + (1 + time|subject), data=d, REML=F)  
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time * clustervar1 + (1 + time | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  5892.2   5937.0  -2938.1   5876.2     1992 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4183 -0.5029  0.0042  0.5193  3.2239 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subject  (Intercept) 0.7681   0.8764       
          time        0.7544   0.8686   0.28
 Residual             0.3079   0.5549       
Number of obs: 2000, groups:  subject, 500

Fixed effects:
                 Estimate Std. Error t value
(Intercept)       0.44924    0.06273   7.162
time              0.26587    0.05713   4.654
clustervar1       0.43529    0.08871   4.907
time:clustervar1 -0.06846    0.08080  -0.847

LGC

In almost every depiction of growth curve models I’ve seen, the interaction of cluster level covariates and time is the default setting, unlike any other modeling situation, where it would normally be driven by theoretical or exploratory reasons. Note that the interaction was not part of the data generating process, and thus should be close to zero.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
  
  # both intercept and slope latent variables 'predicted' by cluster level variable
  intercept + time ~ clustervar1
  
  # residual variance
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  26 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic               12.771
  Degrees of freedom                                10
  P-value (Chi-square)                           0.237

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.903    0.852
    y1                1.000                               0.903    0.596
    y2                1.000                               0.903    0.405
    y3                1.000                               0.903    0.298
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.869    0.574
    y2                2.000                               1.738    0.779
    y3                3.000                               2.608    0.861

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~                                                           
    clustervar1       0.435    0.089    4.907    0.000    0.482    0.482
  time ~                                                                
    clustervar1      -0.068    0.081   -0.847    0.397   -0.079   -0.079

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .intercept ~~                                                          
   .time              0.215    0.041    5.292    0.000    0.283    0.283

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
   .intercept         0.449    0.063    7.162    0.000    0.497    0.497
   .time              0.266    0.057    4.654    0.000    0.306    0.306

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.274
   .y1      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.134
   .y2      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.062
   .y3      (rsvr)    0.308    0.014   22.361    0.000    0.308    0.034
   .intrcpt           0.768    0.063   12.202    0.000    0.942    0.942
   .time              0.754    0.052   14.597    0.000    0.998    0.998

In the Regressions: section, the intercept ~ clustervar1 path represents the fixed effect of clustervar1 in the mixed model (i.e. the cluster effect at time 0), while the time ~ clustervar1 path coefficient is the interaction effect.

Add a time-varying covariate

Mixed model

As with the cluster level covariate, adding a time-varying covariate is trivial for the mixed model. For this demo I will revert back to having no interaction with the cluster level covariate. This is the data generating model, and can be compared to the parameters set in the data setup at the beginning.

mixedModel = lmer(y1 ~ time + clustervar1 + x + (1 + time|subject), data=d, REML=F)  
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time + clustervar1 + x + (1 + time | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  5611.6   5656.4  -2797.8   5595.6     1992 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.59961 -0.51370  0.00377  0.49956  3.01415 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subject  (Intercept) 0.7575   0.8703       
          time        0.7403   0.8604   0.30
 Residual             0.2449   0.4949       
Number of obs: 2000, groups:  subject, 500

Fixed effects:
            Estimate Std. Error t value
(Intercept)  0.43854    0.06047   7.252
time         0.23483    0.03973   5.910
clustervar1  0.44396    0.08484   5.233
x           -0.26298    0.01489 -17.663

LGC

Things become tedious for the growth curve model. Furthermore, for the latter we have to fix the effect to be constant over time (shown as B), otherwise by default it will estimate the interaction with time.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
  
  # only main effect for cluster level covariate
  intercept ~ clustervar1
  time ~ 0*clustervar1

  # add time-varying covariate 
  y0 ~ B*x0
  y1 ~ B*x1
  y2 ~ B*x2
  y3 ~ B*x3

  # residual variance
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  25 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic               21.493
  Degrees of freedom                                26
  P-value (Chi-square)                           0.716

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.898    0.846
    y1                1.000                               0.898    0.591
    y2                1.000                               0.898    0.403
    y3                1.000                               0.898    0.297
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.860    0.566
    y2                2.000                               1.721    0.771
    y3                3.000                               2.581    0.854

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~                                                           
    clustervr1        0.444    0.085    5.233    0.000    0.494    0.494
  time ~                                                                
    clustervr1        0.000                               0.000    0.000
  y0 ~                                                                  
    x0         (B)   -0.263    0.015  -17.663    0.000   -0.263   -0.248
  y1 ~                                                                  
    x1         (B)   -0.263    0.015  -17.663    0.000   -0.263   -0.173
  y2 ~                                                                  
    x2         (B)   -0.263    0.015  -17.663    0.000   -0.263   -0.118
  y3 ~                                                                  
    x3         (B)   -0.263    0.015  -17.663    0.000   -0.263   -0.087

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
 .intercept ~~                                                          
   .time              0.226    0.039    5.779    0.000    0.301    0.301

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
   .intercept         0.439    0.060    7.252    0.000    0.488    0.488
   .time              0.235    0.040    5.910    0.000    0.273    0.273

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0      (rsvr)    0.245    0.011   22.361    0.000    0.245    0.218
   .y1      (rsvr)    0.245    0.011   22.361    0.000    0.245    0.106
   .y2      (rsvr)    0.245    0.011   22.361    0.000    0.245    0.049
   .y3      (rsvr)    0.245    0.011   22.361    0.000    0.245    0.027
   .intrcpt           0.758    0.059   12.785    0.000    0.939    0.939
   .time              0.740    0.050   14.816    0.000    1.000    1.000

Results are still the same.

Add an interaction

Allowing for an interaction between the time-varying covariate and time in the LGC will only require lifting the constraint that all the y ~ x effects be the same. However, this treats time as a categorical variable, and to run the same version as a mixed model, we thus have to treat time as both continuous and categorical (i.e. as a factor), and have the interaction regard the latter form.

Mixed model

To keep things clear I’ll remove the cluster level covariate from this and the LGC model.

mixedModel = lmer(y1 ~ time + x + factor(time):x + (1 + time|subject), data=d, REML=F)  
summary(mixedModel, corr=F)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y1 ~ time + x + factor(time):x + (1 + time | subject)
   Data: d

     AIC      BIC   logLik deviance df.resid 
  5639.6   5695.7  -2809.8   5619.6     1990 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-2.60700 -0.50367 -0.00044  0.50173  2.91767 

Random effects:
 Groups   Name        Variance Std.Dev. Corr
 subject  (Intercept) 0.8049   0.8972       
          time        0.7399   0.8602   0.28
 Residual             0.2444   0.4943       
Number of obs: 2000, groups:  subject, 500

Fixed effects:
                 Estimate Std. Error t value
(Intercept)      0.659944   0.044189  14.935
time             0.234799   0.039720   5.911
x               -0.279234   0.030598  -9.126
x:factor(time)1  0.003116   0.040209   0.078
x:factor(time)2  0.051296   0.040484   1.267
x:factor(time)3 -0.003910   0.050177  -0.078

LGC

The growth curve with unconstrained y ~ x paths. At this point it’s probably not too surprising why people doing growth curve models would find this difficult to understand, while those doing mixed models are still using the same interpretation as they would with standard regression output.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
  
  # add time-varying covariate 
  y0 ~ x0
  y1 ~ x1
  y2 ~ x2
  y3 ~ x3

  # residual variance
  y0 ~~ resvar*y0
  y1 ~~ resvar*y1
  y2 ~~ resvar*y2
  y3 ~~ resvar*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  30 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic               15.207
  Degrees of freedom                                20
  P-value (Chi-square)                           0.764

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.897    0.842
    y1                1.000                               0.897    0.592
    y2                1.000                               0.897    0.404
    y3                1.000                               0.897    0.297
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.860    0.567
    y2                2.000                               1.720    0.774
    y3                3.000                               2.580    0.855

Regressions:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  y0 ~                                                                  
    x0               -0.279    0.031   -9.126    0.000   -0.279   -0.262
  y1 ~                                                                  
    x1               -0.276    0.027  -10.374    0.000   -0.276   -0.182
  y2 ~                                                                  
    x2               -0.228    0.027   -8.559    0.000   -0.228   -0.103
  y3 ~                                                                  
    x3               -0.283    0.039   -7.175    0.000   -0.283   -0.094

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~~                                                          
    time              0.220    0.040    5.502    0.000    0.285    0.285

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
    intercept         0.660    0.044   14.935    0.000    0.736    0.736
    time              0.235    0.040    5.911    0.000    0.273    0.273

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0      (rsvr)    0.244    0.011   22.361    0.000    0.244    0.215
   .y1      (rsvr)    0.244    0.011   22.361    0.000    0.244    0.106
   .y2      (rsvr)    0.244    0.011   22.361    0.000    0.244    0.049
   .y3      (rsvr)    0.244    0.011   22.361    0.000    0.244    0.027
    intrcpt           0.805    0.062   12.941    0.000    1.000    1.000
    time              0.740    0.050   14.817    0.000    1.000    1.000

Note that in the mixed model, the interaction effects are relative to the effect at time 0. To make this more clear, I go ahead add them to that coefficient, and compare to the LGC output.

fe = fixef(mixedModel)
c(fe['x'], fe['x'] + fe[paste0('x:factor(time)', 1:3)])
              x x:factor(time)1 x:factor(time)2 x:factor(time)3 
     -0.2792345      -0.2761180      -0.2279384      -0.2831448 
coef(growthCurveModel)[1:4]
     y0~x0      y1~x1      y2~x2      y3~x3 
-0.2792345 -0.2761180 -0.2279384 -0.2831449 

Heterogenous variances

Mixed model

To investigate heterogeneous variances, we’ll go back to the basic random intercepts and slopes model. This will require the nlme package, as lme4 doesn’t have this functionality. The specification of the weights argument allows a separate variance estimated at each time point.

library(nlme)
mixedModel = lme(y1 ~ time, random = ~1 + time|subject, data=d, method='ML',
                 weights=varIdent(form=~1|time))  
summary(mixedModel)
Linear mixed-effects model fit by maximum likelihood
 Data: d 
       AIC      BIC    logLik
  5911.884 5962.293 -2946.942

Random effects:
 Formula: ~1 + time | subject
 Structure: General positive-definite, Log-Cholesky parametrization
            StdDev    Corr  
(Intercept) 0.8836290 (Intr)
time        0.8580806 0.305 
Residual    0.6110182       

Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | time 
 Parameter estimates:
        0         1         2         3 
1.0000000 0.8934883 0.7791335 1.1048197 
Fixed effects: y1 ~ time 
                Value  Std.Error   DF   t-value p-value
(Intercept) 0.6669339 0.04542486 1499 14.682134       0
time        0.2275525 0.04036384 1499  5.637534       0
 Correlation: 
     (Intr)
time 0.127 

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.71422728 -0.53494080  0.01744146  0.52185100  3.18290397 

Number of Observations: 2000
Number of Groups: 500 
VarCorr(mixedModel) # to see variances along with standard deviations
subject = pdLogChol(1 + time) 
            Variance  StdDev    Corr  
(Intercept) 0.7808001 0.8836290 (Intr)
time        0.7363023 0.8580806 0.305 
Residual    0.3733432 0.6110182       

LGC

As we will see, the standard SEM approach estimates separate variances for each time point by default. In the SEM context, the \(\mathcal{U}\) are latent variables, and represent all causes acting on the dependent variable not otherwise specified. Technically, their paths are fixed to one and the variance of the latent variable is estimated.

model = "
  # intercept and slope with fixed coefficients
  intercept =~ 1*y0 + 1*y1 + 1*y2 + 1*y3
  time      =~ 0*y0 + 1*y1 + 2*y2 + 3*y3
"

growthCurveModel = growth(model, data=dWide)
summary(growthCurveModel, std=T)
lavaan (0.5-23.1097) converged normally after  37 iterations

  Number of observations                           500

  Estimator                                         ML
  Minimum Function Test Statistic                4.080
  Degrees of freedom                                 5
  P-value (Chi-square)                           0.538

Parameter Estimates:

  Information                                 Expected
  Standard Errors                             Standard

Latent Variables:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept =~                                                          
    y0                1.000                               0.884    0.823
    y1                1.000                               0.884    0.586
    y2                1.000                               0.884    0.400
    y3                1.000                               0.884    0.291
  time =~                                                               
    y0                0.000                               0.000    0.000
    y1                1.000                               0.858    0.569
    y2                2.000                               1.716    0.777
    y3                3.000                               2.574    0.846

Covariances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
  intercept ~~                                                          
    time              0.231    0.043    5.426    0.000    0.305    0.305

Intercepts:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.000                               0.000    0.000
   .y1                0.000                               0.000    0.000
   .y2                0.000                               0.000    0.000
   .y3                0.000                               0.000    0.000
    intercept         0.667    0.045   14.689    0.000    0.755    0.755
    time              0.228    0.040    5.640    0.000    0.265    0.265

Variances:
                   Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
   .y0                0.373    0.047    7.987    0.000    0.373    0.323
   .y1                0.298    0.026   11.450    0.000    0.298    0.131
   .y2                0.227    0.033    6.936    0.000    0.227    0.046
   .y3                0.456    0.075    6.086    0.000    0.456    0.049
    intercept         0.781    0.068   11.400    0.000    1.000    1.000
    time              0.736    0.052   14.202    0.000    1.000    1.000

Note that the residual variance at time 0 in the growth model is the reported residual variance in the mixed model. Unfortunately nlme is a bit convoluted2 in reporting the actual estimated variances, which are given not only as relative to the time 0 estimate, but also on the standard deviation scale. Even figuring out how to get these values is something no one can actually remember without multiple web searches, but it is possible.

(c(1.0000000, coef(mixedModel$modelStruct$varStruct, unconstrained=F))*mixedModel$sigma)^2 %>% 
  rbind(coef(growthCurveModel)[1:4])
                    1         2         3
. 0.3733432 0.2980478 0.2266376 0.4557127
  0.3733406 0.2980483 0.2266289 0.4557313

Summary

Latent growth curve models in the standard structural equation modeling setting are mixed models that will by default:

  • Estimate via maximum likelihood instead of REML
  • Treat time as categorical
  • Posit interactions with time
  • Assume heterogeneous variances across time
  • Enforce the ‘intercepts and slopes as outcomes’ type of thinking
  • Assumes complexity that should instead be tested in a model comparison framework
  • Make for notably more tedious syntax3

Even when you are thinking of indirect effects, autocorrelated residual structure, etc. there are tools in R that would allow you to stay in the standard mixed model setting. The nlme package (and others) provide several options for residual and other correlation structure (e.g. spatial). For certain mediation models, the mediation package would be able to do them with mixed models. Adding latent class structure to mixed models can be done with flexmix. Bayesian approaches are no more difficult with packages like rstanarm and brms.

More complex correlational structure (e.g. nested with cross random effects, spatial) are not even possible in the SEM setting4. If you have many time points, things only get worse for the LGC approach as far as the syntax. Even having just a handful of covariates makes the LGC approach exceedingly complex.

I think it is very useful to think of random effects as latent variables. However, I find little utility in explicitly modeling them that way. Most LGC I see are overly simplistic, not including demographic and other covariates because the model quickly gets unwieldy both visually, syntactically, and cognitively. The mixed model, on the other hand, allows one to stick to the more intuitive regression interpretation, while affording all the rich parameterization of (possibly complicated) random effects models.


  1. In case it’s not clear, this would be a time-invariant variable. If these were repeated measures on individuals, this would be something like sex or race.

  2. Most of this package object structure and output is irksome to me, starting with the umpteen (random!) decimal places. I have no problem admitting this may be a character flaw on my part. Note that there is a reason for having the first variance fixed to one for estimation purposes, but I can’t think of one for reporting it in this fashion.

  3. To its credit, MPlus does have some syntactical shortcuts to make things a little easier.

  4. At least that I’m aware of. You can presumably incorporate hierarchical structure via as multilevel SEM, where the SEM is an LGC.