Let’s revisit the results all at once. Seeing the output side-by-side may prove helpful in comparing the different approaches.

Fixed Effects

We’ll start with the ‘fixed effects’ or ‘population average’ estimates. It may be worth sorting by Effect22. I also include an random intercept only model for a more direct comparison between the RE and FE models.

The main thing to note is that we would come to no grand differences in substantive conclusions, except for the Grouped and FE approach, where we can come to no conclusion about the treatment effect. Even statistically, the conclusions would be the same, so what can one conclude about these main effects?

For one, we typically don’t have 10000 observations for our models, and so the differences would be more notable in that case. I invite you to rerun the simulation with a smaller sample size where n = 100 individuals instead of 2500, as well as with smaller effects (these are large), and see what you come up with. In general, with a lot of data you shouldn’t come to wildly different conclusions with different techniques23.

While I have yet to come across a client that actually cares what the precise value of the standard error is, many care about statistical significance. If that is of primary concern, and it shouldn’t be, then one would prefer techniques that get better estimates of the standard error24. In that sense, a cluster robust approach would help, but would be better if applied in the FE/RE/GEE model settings.

Variance estimates

Here we’ll compare variance estimates. I add mixed and gee models with no residual correlation estimate to make more direct comparisons to the growth curve model, as well as an random intercept only with no dependency structure assumed (other than independence) for comparison to the FE model.

Cluster robust modifications alone will change nothing compared to a SLiM except for the standard errors of the main effects. A GEE approach with an independence structure and constant residual variance is equivalent to the SLiM. The FE residual variance is equivalent to the SLiM if id had been included in the model, and is equivalent to that estimated by an RE intercept only model with no residual dependency estimated. The estimates of RE model with random intercepts and slopes but no correlation structure assumed are identical to the growth curve estimates. We can also see that the by-group SLiM approach overfits, allowing too much variability in group-specific intercepts and slopes relative to the mixed model .


This is only a single and contrived data example that is based on random effects as part of the underlying data generating process. If there is no clustering effect, then the standard linear model would obviously the way to go, but that is also not the situation we’re interested in. If there is no residual correlation, or only cluster-specific effects akin to an ‘intercept-only’ model, little changes from what has been noted above, aside from latent growth curve models being less of an option, and they wouldn’t be in a non-longitudinal setting anyway. In general, the same issues noted above would still be in play for the most part for simpler settings.

Specifically, I think it’s safe to say that the standard linear model approach has drawbacks and is simply not necessary given how easy it is to conduct more appropriate methods. Using cluster robust standard errors may help, but ignores what could potentially be a very rich investigation of cluster specific effects, and may be best used as a diagnostic or signal for a misspecified model. I also think it’s difficult to justify the FE approach where we can’t even investigate cluster level covariates, but again, that is my bias, and may not be shared by others. Latent growth curve models will probably only be useful if one is dealing with very complex SEM, and which in that case, one will have a host of other issues to contend with that aren’t applicable in this setting. Otherwise, LGCM will be identical to mixed models, and even some more complex than depicted, and may only make for more tedious coding with little else gained.

That leaves the mixed model and GEE GLM approaches. Between these two, we don’t even have to choose if it’s a random intercept, linear model, as they would have similar ‘population-average’ interpretations, and similar estimates of like parameters, though the mixed model provides cluster-specific effects and predictions. Interpretations do change with nonlinear models such as with a binary outcome and logit link, but see the Gardiner reference below for how the estimated coefficients relate to one another in those cases.

Only the mixed model provides cluster-specific effects and predictions, while allowing for complex cluster structures and retaining cluster level covariates. What’s more, they can be specified to overcome the primary motivation for preferring FE models. In addition, they readily extend to other types of ‘random effects’ models (e.g. spatial).

In conclusion then, the random effects/mixed model approach can provide most everything one could want in dealing with a variety of clustered data situations. It is highly flexible, and a very powerful tool to have at one’s disposal.

  1. For the grouped SE, I just calculated as one would the mean sd/sqrt(n). I thought about using Rubin’s rules as one would for the missing value situation, but each four observation linear model has very high variance. This approach puts it in the ball park of the other estimates.

  2. “More data beats a cleverer algorithm.” ~ Pedro Domingos

  3. Note that when there are multiple sources of variance, it is difficult to know what the standard error should be. Programs like Stata and SAS make the decision for you, and provide approximate p-values (via Kenward-Roger or other approximation), while the R package lme4 will not provide p-values. One can get decent interval estimates (e.g. via bootstrap or MCMC), but see the R wiki reference for some details on this issue.