Intro

Latent growth curve models (LGC) and mixed models, are two popular ways to analyze longitudinal data, such as taking repeated measurements for individuals. They can provide equivalent results, though default settings would usually have some slight difference in the model. I specifically wanted to compare them for settings with low(ish) numbers of clusters, essentially to see how and when they begin to fail. As an example, having 50 individuals measured at 4 time points each wouldn’t cause many to think twice in the mixed model setting, while running a structural equation model with 50 individuals has never been recommended. However, growth curve models are a notably constrained type of SEM, as many parameters are fixed, which helps matters, but I wanted to investigate this further.

Setup

The parameters are as follows:

  • Number of clusters (3): 10, 25, 50
  • Time points within cluster (3): 5, 10, 25
  • Correlation of random effects (5): -.5 to .5

Thus sample sizes range from the ridiculously small (10*5 = 50 total observations) to fairly healthy (50*25 = 1250 total). Random effects were drawn from a multivariate normal and had that exact correlation (i.e. there was no distribution for the correlation; using the MASS package with empirical = TRUE). Fixed effects are 3 and .75 for the intercept and effect of ‘time’. Variances of the random effects and residuals were set to 1.0. Note that the mixed model estimated residual variance separately for each time point as would be the default in the LGC model, however due to the number of estimates they won’t we shown in what follows. Number of simulated data sets for each of the 45 settings is 500.

Growth curve models were estimated with the R package lavaan, the mixed models were estimated with nlme with varIdent weights argument. Both use the nlminb optimizer that comes with base R by default. I had to change some of the default optimizer settings with nlme to be more in keeping with lavaan’s default1. With enough data, estimates should be identical, but the point here is to investigate performance in the low N setting.

Results

The initial set of results compares the Latent Growth Curve and mixed model, both estimated via maximum likelihood. First, bias in parameter estimates is noted, then interval widths compared. Note that this takes all results as they are, even if there were potential problems in the estimation process. We will look at ‘clean’ results later.

Parameter estimates

Bias here refers to the difference in the mean estimated parameter value vs. the true.

Fixed effects

Nothing worrisome here as expected.

Random effect variance

Both underestimate random effects variance at these settings, and more so with smaller overall sample sizes, perhaps a little less with the LGC in the smallest N setting.

Random effect correlation

With few time points (5, 10) and/or smaller number of clusters (10, 25), both generally seem to exhibit positive bias, and both have a tendency to have more problem with positive correlation.

Interval width

95% confidence intervals were taken as the .025 and .975 quantiles of the simulation estimates.

Fixed effects

General trend of narrower interval estimates for mixed models with smaller sample settings, and low number of clusters generally. The LGC and Mixed Model come together fairly quickly though. We’ll discuss the issues involved in the next section.

Random effect variance

General trend of narrower interval estimates for mixed models with smaller sample settings, and low number of clusters generally.

Random effect correlation

The actual interval estimates for LGC were not confined to [-1,1], and both struggle in the small N setting. At about a total N observed of 100 they begin to be consistent.

Estimation issues

With the smaller sample sizes we run into numerous issues in estimation. In typical SEM settings this may result in negative variance estimates or other problems (so-called Heywood cases). With mixed models, estimation of a perfect correlation of random intercepts and slopes might similarly call for further diagnostic inspection2. In addition, we know mixed models with maximum likelihood will produce biased variance estimates with small samples. As such, most mixed model packages will default to restricted maximum likelihood (REML) which serves to counter this.

I’ve defined ‘failure’ as any run that resulted in a NaN for lavaan, or a perfect positive/negative correlation for nlme, i.e., where the correlation equaled 1.0 with no rounding. The following shows the proportion of such failures, which are quite dramatic for the growth curve approach. We also see REML performs better than ML for the mixed model approach.

Results Clean

Given the estimation issues, here are results for situations in which those did not exist. This perhaps puts the estimation approaches on unequal footing since the criterion is different, but it still may be informative. In addition, I’ve chosen the REML estimates for mixed model results. However, the differences become negligible with more observations, as we have already seen and would expect from the outset.

Parameter estimates

Fixed effects

Nothing to see here.

Random effect variance

At the lowest sample/cluster sizes, LGC underestimates random effects variance at these settings, while REML overestimates. For the slope variance, LGC continues to struggle while REML shows no issues. Interestingly, LGC continues to underestimate the variances at even the larger sample sizes with these ‘clean’ results.

Random effect correlation

REML struggles with the RE correlation in the case of fewest time points. LGC struggles when the number of clusters or time points is small. Some of the LGC models are not without issue however. Even though the remaining models have estimates for every parameter, the results are still problematic in that they can produce an absolute correlation > 1. Removing those cases certainly helps with the estimates (most of the bias is gone except with the lowest N setting), but now one has almost roughly a minimum failure rate of 10% for anything at or less than 125 total observations.

Interval width

As before, 95% confidence intervals were taken as the quantiles of the simulation estimates.

Fixed effects

Little difference here except at the smallest N settings.

Random effect variance

We now see that the width seen in LGC were due to the problematic situations.

Random effect correlation

Summary

Latent growth curve modeling and mixed models offer two ways to approach longitudinal data. If set up properly, and with large amounts of data, they will result in identical estimates. The purpose of this doc was to examine the failures at small sample sizes.

I typically tell people who are considering SEM that they shouldn’t unless they have several hundred observations, a fact that has been born out in numerous studies, but which is still somehow regularly lost on the applied researcher. Growth curve models are highly constrained SEM, and it would appear they still require a couple hundred observations even in the well-behaved data settings above before estimation issues are finally overcome. In very small N settings, and especially with few time points the LGC struggles quite noticeably, often failing outright. Meanwhile, a mixed model with REML appears to have issues mostly only with the smallest cluster x time point settings.

Despite the similarities of the two approaches, they are different models however. Time technically is an estimated parameter in the LGC - though we typically fix the values, they can be freely estimated. This is because they are just ‘loadings’ in the factor analytic sense3. Secondly, any other time-varying covariates added to the LGC are assumed to interact with time. Thus while the slopes would vary over individuals in the mixed model, they vary over time in the LGC. As mentioned though, with proper constraints, the LGC can mimic the mixed model.

However, there are still a few things against using LGC in my opinion, when both are viable or easily implemented.

  • LGC requires balanced data, and so one must estimate missing values as a default, or else risk losing a lot of data even with only minor missingness, and doing so can make post-estimation procedures problematic depending on the imputation technique chosen.
  • LGC also requires four time points4, but as the above demonstrates, success is not guaranteed with 5 time points even with 50 clusters.
  • As we have seen, LGC is not very viable for very small samples. On the other hand, one could run a repeated measures anova, a special case of the mixed model, on two time points and a handful of subjects without issue.
  • Due to the differences in models, adding time-varying covariates to an LGC is tedious and causes the number of estimated parameters to balloon quickly relative to the mixed model, which adds to the amount of data you’d want to have. In practice, LGC users often leave out covariates due to the inherent complexities incurred with their addition to the model.
  • These days, growth mixture models, mediation, multiple outcomes etc. can potentially be done with mixed models with little extra effort beyond the standard mixed model, and do not require an SEM approach.
  • Mixed models are not restricted to time-based clustering5.
  • There are better (in my opinion) approaches to nonlinear effects, regularization, and other complexities in the non-SEM setting.

In general, LGC should probably be reserved for models that include additional latent variables (beyond the intercept and slope factors) and indirect effects. In such cases, the data requirements will be even more taxing, but the LGC could be the right tool for the job. If a mixed model approach is appropriate, it should probably be preferred, and especially so with smaller sample sizes.

Future

Possibilities for the future:

vs. Bayesian mixed vs. Bayesian growth vs. gam with cor=0

Code

link (Still quite messy)

lavaan optimization settings

$control
$control$eval.max
[1] 20000

$control$iter.max
[1] 10000

$control$trace
[1] 0

$control$step.min
[1] 1

$control$step.max
[1] 1

$control$abs.tol
[1] 2.220446e-15

$control$rel.tol
[1] 1e-10

$control$x.tol
[1] 1.5e-08

$control$xf.tol
[1] 2.2e-14

  1. lavaan uses same ‘control’ terminology as nlminb, but nlme notes msMaxEval, and msMaxIter for eval.max and iter.max arguments respectively. Aside from the iteration defaults, the only other difference between lavaan and nlme is with abs.tol, which by default is not used with nlminb (i.e. it is set to zero). Beyond that both lavaan and nlme use the defaults. The following is a print of growthfittedmodel@optim.

  2. See Ben Bolker’s discussion on the R-sig-ME list here.

  3. They are often estimated to capture nonlinear trends.

  4. Like other SEM it can be run with three with proper constraints.

  5. One can take the long form ‘multilevel’ approach within the SEM context. Like with the LGC, unless you have a complicated model full of indirect effects and latent variables, the only reason to use the SEM software is that you like to spend a lot more time coding up the model.