Going Further

This section covers topics that are generally beyond the scope of what would be covered in this introductory document, but may be given their own section over time.

Other Distributions

As noted in the GLMM section, we are not restricted to GLM family distributions for the target variable. Unfortunately, the tools available for going beyond them quickly diminish. However, a couple of packages can help in this regard with simpler random effects structures. For example, the mgcv and glmmTMB packages provide access to a variety of target distributions, such as Student t, negative binomial, beta, zero-inflated Poisson and more. If you’re willing to go Bayesian, you’ll have even more options with rstanarm and brms. I’ve personally had success with ordinal, beta, truncated normal and more with brms in particular.
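As a minimal sketch of the idea, the following fits a negative binomial model with a random intercept using mgcv. The data are simulated, and the variable names (`g`, `x`, `y`) and parameter values are made up purely for illustration.

```r
library(mgcv)  # ships with standard R installations

set.seed(123)
n_groups <- 30
n_per    <- 20
d <- data.frame(
  g = factor(rep(seq_len(n_groups), each = n_per)),
  x = rnorm(n_groups * n_per)
)
re  <- rnorm(n_groups, sd = 0.5)                 # simulated group intercepts
mu  <- exp(0.5 + 0.3 * d$x + re[as.integer(d$g)])  # log link
d$y <- rnbinom(nrow(d), mu = mu, size = 2)       # overdispersed counts

# s(g, bs = "re") is mgcv's random intercept term; nb() estimates the
# negative binomial dispersion parameter along with everything else
fit_nb <- gam(y ~ x + s(g, bs = "re"), family = nb(), data = d, method = "REML")
summary(fit_nb)
```

Swapping the `family` argument is all it takes to try a different target distribution, as long as the package supports it.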

Note also that nothing says that the random effects must come from a normal distribution either. You probably are going to need some notably strong theoretical reasons for trying something else, but it does come up for some situations. You’ll almost certainly need to use a specialized approach, as most mixed model tools do not offer that functionality out of the box.

Other Contexts

Here is a list of some other contexts in which you can find random effects models, or extensions of mixed models into other situations.

Spatial models

It is often the case we want to take into account the geography of a situation. Spatial random effects allow one to do so in the continuous case, e.g. with latitude and longitude coordinates, as well as the discrete case, as with political districts. Typical random effects approaches, e.g. with a state random effect, would not correlate state effects. One might capture geography incidentally, or via cluster level variables such as a ‘region’ indicator. However, if you’re interested in a spatial random effect, use something that can account for it specifically.

Survival models

Random effects models in the survival context are typically referred to as frailty models. As a starting point, the survival package, which ships with standard R installations, can do such models.
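Here is a minimal sketch of a frailty model using the `kidney` data that comes with the survival package, where each patient (`id`) contributes more than one observation. The choice of covariates is only for illustration.

```r
library(survival)

# frailty(id) adds a patient-level random effect (gamma distributed by default)
fit_frail <- coxph(Surv(time, status) ~ age + sex + frailty(id), data = kidney)
fit_frail
```

The printed output includes a line for the frailty term along with the estimated variance of the random effect.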

Item response theory

Item response theory models are often used with scholastic and other testing data, but are far more general than that, because they are in fact a special type of random effects model. Some IRT models can be explicitly estimated as a mixed model, e.g. with lme4. See De Boeck et al. (2011), The Estimation of Item Response Models with the lmer Function from the lme4 Package in R. I also have a brief demonstration here. Paul Bürkner, the author of brms, also has a nice overview of IRT models in the Bayesian context.
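To make the connection concrete, here is a rough sketch of a one-parameter (Rasch) model on simulated data, with items as fixed effects and person ability as a random intercept. It uses mgcv's random effect term only to stay within packages that ship with R; the analogous lme4 call would be something like `glmer(y ~ 0 + item + (1 | person), family = binomial)`. All names and values are made up for illustration.

```r
library(mgcv)

set.seed(11)
n_person <- 200
n_item   <- 5
ability  <- rnorm(n_person)            # simulated person abilities
easiness <- c(-1, -0.5, 0, 0.5, 1)     # simulated item easiness

# Long format: one row per person-item response
long <- expand.grid(
  person = factor(seq_len(n_person)),
  item   = factor(seq_len(n_item))
)
eta    <- ability[as.integer(long$person)] + easiness[as.integer(long$item)]
long$y <- rbinom(nrow(long), 1, plogis(eta))

# Rasch model: item easiness as fixed effects, ability as a random intercept
fit_rasch <- gam(
  y ~ 0 + item + s(person, bs = "re"),
  family = binomial,
  data   = long,
  method = "REML"
)
round(coef(fit_rasch)[seq_len(n_item)], 2)  # estimated item easiness
```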

Multi-membership models

Sometimes observations may belong to more than one cluster of some grouping variable. For example, in a longitudinal setting some individuals may move to other cities or schools, staying in one place longer than another. Depending on the specifics of the modeling setting, you may need to take a multi-membership approach to deal with this.
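The core of the idea can be sketched in base R. Instead of a 0/1 indicator for a single cluster, each observation gets a row of weights across the clusters it belonged to. The students, schools, weights, and school effects below are all hypothetical.

```r
# Hypothetical: share of time each student spent in each school
mm <- data.frame(
  student = c("s1", "s2", "s2", "s3", "s3"),
  school  = c("A",  "A",  "B",  "C",  "D"),
  weight  = c(1,    0.75, 0.25, 0.5,  0.5)
)

# Multi-membership weight matrix: rows are students, columns are schools
W <- matrix(0, 3, 4, dimnames = list(c("s1", "s2", "s3"), c("A", "B", "C", "D")))
W[cbind(mm$student, mm$school)] <- mm$weight
W

# Given (made up) school effects, each student's contribution is a
# weighted average of the effects of the schools attended
u <- c(A = 0.4, B = -0.2, C = 0.1, D = -0.3)
school_contrib <- drop(W %*% u)
school_contrib  # s1 = 0.40, s2 = 0.25, s3 = -0.10
```

In an actual model the school effects would be estimated rather than supplied, with `W` taking the place of the usual random effects indicator matrix. The brms package offers an `mm()` grouping term for this purpose.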

Phylogenetic models

In biology, models may take observations that are of the same species. While one can use species as an additional source of variance in the manner we have demonstrated, the species are not independent, as they may come from the same phylogenetic tree/branch. Bayesian packages are available to do such models (e.g. MCMCglmm and brms).

Adjacency structures

Similar to spatial and phylogenetic models, the dependency among the groups/clusters themselves can be described in terms of a Markov random field/undirected graph. In simpler terms, one may think of a situation where a binary adjacency matrix would denote connections among the nodes/cluster levels. For example, the clustering may be due to individuals, which themselves might be friends with one another. One way to deal with such a situation would be similar to spatial models for discrete random units.
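A small base R sketch may help. It builds a binary adjacency matrix for five hypothetical individuals and turns it into the graph Laplacian, which is the kind of penalty (or precision) structure that Markov random field effects are typically built on. The node names, edges, and effect values are invented for illustration.

```r
# Five hypothetical individuals and their friendships
nodes <- c("a", "b", "c", "d", "e")
edges <- rbind(c("a", "b"), c("a", "c"), c("b", "c"), c("c", "d"), c("d", "e"))

# Binary, symmetric adjacency matrix: 1 if two individuals are connected
A <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
A[edges] <- 1
A[edges[, 2:1]] <- 1

# Graph Laplacian: number of neighbors on the diagonal, -1 for each connection
Q <- diag(rowSums(A)) - A

# The quadratic form u'Qu penalizes differences between connected nodes only
u <- c(a = 1, b = 2, c = 4, d = 4, e = 0)
quad_form <- drop(t(u) %*% Q %*% u)
edge_diff <- sum((u[edges[, 1]] - u[edges[, 2]])^2)
c(quad_form, edge_diff)  # both equal 30
```

Effects for connected individuals are thus pulled toward one another, while unconnected ones are left alone.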

Gaussian processes

Gaussian processes are another way to handle dependency in the data, especially over time or space. Some spatial models are in fact a special case of these. One can think of Gaussian processes as adding a ‘continuous category’ random effect. Consider the effect of age in many models: could that not also be a source of dependency regarding some outcomes? In Statistical Rethinking, McElreath has a nice chapter, ‘Adventures in Covariance’, that gets into this a bit, and Gaussian processes are widely discussed among the Stan developers.
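To illustrate the ‘continuous category’ notion, here is a base R sketch using a squared exponential kernel, one common choice among many. The kernel parameters (`eta`, `rho`) are arbitrary illustrative values, not estimates from any model.

```r
set.seed(1)
age <- seq(20, 70, by = 5)

# Squared exponential kernel: covariance decays smoothly with distance in age
sq_exp <- function(x, eta = 1, rho = 10) {
  eta^2 * exp(-outer(x, x, "-")^2 / (2 * rho^2))
}
K <- sq_exp(age)

# Correlation of the effect at age 20 with the effects at ages 20, 25, 30, 35
round(cov2cor(K)[1, 1:4], 2)

# One draw of the age effects: one value per age, with nearby ages
# receiving similar values (small jitter added for numerical stability)
L <- t(chol(K + diag(1e-6, length(age))))
f <- drop(L %*% rnorm(length(age)))
round(f, 2)
```

Contrast this with a standard random effect for age group, where the effects would be treated as independent draws regardless of how close the ages are.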

Surveys & Mr. P

Clustering is often a result of sampling design. Often one would use a survey design approach for proper inference in such situations, and you can use mixed models with survey weights. However, multi-level regression with post-stratification, or Mr. P, is an alternative mixed model approach that can potentially lead to better results in the same setting without weighting. One might even be able to generalize from a sample of Xbox players to the national level!
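The mechanics can be sketched with simulated data: fit a multilevel model to an unrepresentative sample, predict for every population cell, then average those predictions using known population counts. Everything here (states, age groups, counts, effects) is hypothetical, and mgcv is used only as one convenient way to get the random effects.

```r
library(mgcv)
set.seed(42)

# Hypothetical population cells: 10 states x 4 age groups with known counts N
cells   <- expand.grid(state = factor(1:10), age = factor(1:4))
cells$N <- round(runif(nrow(cells), 1e3, 1e5))
state_eff <- rnorm(10, sd = 0.5)
age_eff   <- c(-1, -0.3, 0.3, 1)
cells$p_true <- plogis(
  state_eff[as.integer(cells$state)] + age_eff[as.integer(cells$age)]
)

# Unrepresentative sample: the youngest age group is heavily oversampled
samp_prob <- c(0.7, 0.1, 0.1, 0.1)[as.integer(cells$age)]
idx    <- sample(nrow(cells), 2000, replace = TRUE, prob = samp_prob)
samp   <- cells[idx, c("state", "age")]
samp$y <- rbinom(nrow(samp), 1, cells$p_true[idx])

# Multilevel regression: random intercepts for state and age group
fit <- gam(
  y ~ s(state, bs = "re") + s(age, bs = "re"),
  family = binomial, data = samp, method = "REML"
)

# Post-stratification: population-weighted average of the cell predictions
cells$p_hat <- as.numeric(predict(fit, newdata = cells, type = "response"))
mrp_est <- weighted.mean(cells$p_hat, cells$N)
raw     <- mean(samp$y)
truth   <- weighted.mean(cells$p_true, cells$N)
round(c(raw = raw, mrp = mrp_est, truth = truth), 3)
```

In this setup the raw sample mean is pulled toward the oversampled group, while the post-stratified estimate should land much closer to the population value.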

Post-hoc comparisons and multiple testing

This is not an issue I’m personally all that concerned with, but a lot of folks seem to be. The ‘problem’ is that one has a lot of p-values for some model or across a set of models, and is worried about spurious claims of significance. If one were truly worried about it, they’d be doing different models that would incorporate some sort of regularization, rather than attempting some p-value hack afterwards. Didn’t we talk about regularization somewhere? Yep, you can use a mixed model approach to compare groups instead, specifying the grouping as a random effect. See Gelman for details.
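As a quick sketch of what that looks like, the following simulates 20 groups with no true differences among them, then compares the raw group means to the estimates from a model with group as a random effect. The data and names are made up, and mgcv is just one convenient option for the fit.

```r
library(mgcv)
set.seed(2024)

# 20 groups with NO true differences, 10 observations each
d <- data.frame(group = factor(rep(1:20, each = 10)), y = rnorm(200))

# Raw group means: some will look 'different' by chance alone
raw_means <- tapply(d$y, d$group, mean)

# Group as a random effect: estimates are shrunk toward the overall mean
fit    <- gam(y ~ s(group, bs = "re"), data = d, method = "REML")
pooled <- coef(fit)[1] + coef(fit)[-1]

round(range(raw_means), 2)
round(range(pooled), 2)
```

The spread of the model-based estimates is no wider than that of the raw means, and is typically far narrower when there is little evidence of real group differences, so the comparisons are regularized from the start.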

Growth mixture models

Often people will assume latent clusters of individuals within the data, with model effects differing by these latent groups also. Sometimes called latent trajectory models, these are conceptually adding a cluster analysis to the mixed model setting for longitudinal data. While common in structural equation modeling, packages like flexmix can keep you in the standard model setting, which would be preferable.

Embeddings and Neural Nets

Much is made about using neural nets to create word, sentence or other embeddings, essentially taking a string of text and converting it to numeric form. But think about what we’ve been doing here. We’ve taken a categorical/string variable like student id, and converted it to numeric form (a random effect). Not so complicated a notion really.

Beyond that, one can essentially conduct a mixed model as one would a neural net for standard tabular data, because a neural net incorporates regularization on all parameters, and in particular those group/cluster effects. So if your data was properly coded, you could set up your neural net to produce something very similar to what we have in the matrix form of a random effect model. This is similar to what mgcv actually does for random effects estimation. By default though, the information extracted from the groups would be done so in a much different way, and would be able to pick up on inter-group correlations as well (much like a spatial random effects model).
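The link between regularization and random effects can be checked directly. In this sketch on simulated data, a one-hot coded id with a ridge penalty on its weights reproduces the random effects from a standard mixed model, once the penalty is set to the ratio of the estimated residual and random effect variances. This is a demonstration of the equivalence in the simplest random intercept case, not a description of how any particular neural net library works.

```r
library(nlme)
set.seed(99)

n_groups <- 15
n_per    <- 8
d   <- data.frame(id = factor(rep(seq_len(n_groups), each = n_per)))
d$y <- 2 + rnorm(n_groups, sd = 1)[as.integer(d$id)] + rnorm(nrow(d), sd = 0.7)

# Standard random intercept model
fit    <- lme(y ~ 1, random = ~ 1 | id, data = d)
vc     <- as.numeric(VarCorr(fit)[, "Variance"])  # c(id variance, residual variance)
lambda <- vc[2] / vc[1]

# One-hot coding of id, plus an unpenalized intercept column
Z <- model.matrix(~ 0 + id, data = d)
X <- cbind(1, Z)
P <- diag(c(0, rep(lambda, ncol(Z))))  # ridge penalty on the id weights only

# Penalized least squares solution: (X'X + P)^{-1} X'y
b <- solve(crossprod(X) + P, crossprod(X, d$y))

head(cbind(ridge = b[-1], lme = ranef(fit)[, 1]))
```

The two columns should agree up to numerical error, with the first element of `b` matching the fixed intercept.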

Nonlinear Mixed Effects Models

Earlier we used the nlme package. The acronym stands for nonlinear mixed effects models. In this case, we are assuming a specific functional form for a predictor. A common example is a logistic growth curve¹, and one could use a function like SSlogis.
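A minimal sketch with the `Orange` data that ships with R, where the asymptote of the logistic curve is allowed to vary by tree. The starting values are rough guesses chosen for illustration.

```r
library(nlme)

# Orange: trunk circumference of 5 trees measured at several ages
fit_nl <- nlme(
  circumference ~ SSlogis(age, Asym, xmid, scal),
  data   = Orange,
  fixed  = Asym + xmid + scal ~ 1,   # population-level curve parameters
  random = Asym ~ 1 | Tree,          # tree-specific asymptote
  start  = c(Asym = 190, xmid = 720, scal = 350)
)
fixef(fit_nl)
ranef(fit_nl)
```

Here `Asym` is the upper asymptote, `xmid` the age at the inflection point, and `scal` a scale parameter governing how quickly growth levels off.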

In other cases we do not specify the functional form, and take a more non-parametric approach. Here’s where the powerful mgcv package comes in, and there are few if any that have its capabilities for generalized additive models combined with standard random effects approaches. Depending on the approach you take, you can even get nlme or lme4 output along with the GAM results. Highly recommended.
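As a sketch of that last point, `gamm()` from mgcv fits the model via nlme and returns both representations. The data are simulated and the names are made up for illustration.

```r
library(mgcv)  # gamm() does its fitting through nlme::lme
set.seed(7)

n_groups <- 20
n_per    <- 25
d <- data.frame(
  g = factor(rep(seq_len(n_groups), each = n_per)),
  x = runif(n_groups * n_per)
)
d$y <- sin(2 * pi * d$x) +
  rnorm(n_groups, sd = 0.5)[as.integer(d$g)] +
  rnorm(nrow(d), sd = 0.3)

# Smooth of x with no functional form specified, plus a random intercept for g
fit <- gamm(y ~ s(x), random = list(g = ~ 1), data = d)

summary(fit$gam)  # the GAM view: effective degrees of freedom for s(x), etc.
VarCorr(fit$lme)  # the nlme view: variance components
```

Alternatively, `gam()` with `s(g, bs = "re")` fits the same kind of model within mgcv alone.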

I would also recommend brms, which has specific functionality for nonlinear models in general, including IRT models, as well as additive models in the vein of mgcv, as it uses the same constructor functions that come with that package. It might be your best bet whether you have a specific nonlinear functional form or not.


The incorporation of spatial random effects, additive models, and mixed models all together under one modeling roof is sometimes referred to as structured additive regression models, or STARs. The mgcv package is at least one place where you can pull this off. But the notion of a random effect is a broad one, and we might think of many such similar effects to add to a model.

As mentioned previously, thinking of parameters as random, instead of fixed, essentially puts one in the Bayesian mindset. Moving to that world for your modeling will open up many doors, including expanding your mixed model options.

  1. Not to be confused with latent growth curve models or logistic regression.