Mixed Models

This chapter regards mixed models, a flexible expansion of standard regression methods to account for some covariance among the observations.

One-factor Mixed Model

The following is an approach for one factor random effects model via maximum likelihood in R (and Matlab and Julia in the Supplemental Section). It’s based on Statistical Modeling and Computation (2014) Chapter 10, example 10.10. Unfortunately I did this before knowing they had both Matlab and R code on their website, though the R code here is a little cleaner and has comments.

Data Setup

The data regards crop yield from 10 randomly selected locations and three collections at each location.

library(tidyverse)

y = matrix(c(22.6, 20.5, 20.8,
             22.6, 21.2, 20.5,
             17.3, 16.2, 16.6,
             21.4, 23.7, 23.2,
             20.9, 22.2, 22.6,
             14.5, 10.5, 12.3,
             20.8, 19.1, 21.3,
             17.4, 18.6, 18.6,
             25.1, 24.8, 24.9,
             14.9, 16.3, 16.6), 
           10, 3, byrow = TRUE)

Function

The estimating function.

one_factor_re <- function(mu, sigma2_mu, sigma2){
  # Args
  # mu: intercept
  # sigma2_mu: variance of intercept
  # sigma2: residual variance of y
  
  # I follow their notation for consistency
  d  = nrow(y)
  ni = ncol(y)
  
  # covariance matrix of observations
  Sigmai = sigma2 * diag(ni) + sigma2_mu * matrix(1, ni, ni)
  
  # log likelihood
  l = rep(NA, 10)
  # iterate over the rows
  for(i in 1:d){
    l[i] = .5 * t(y[i, ] - mu) %*% chol2inv(chol(Sigmai)) %*% (y[i, ] - mu)  
  }
  
  ll =  -(ni*d) / 2*log(2*pi) - d / 2*log(det(Sigmai)) - sum(l)
  
  return(-ll)
}

Estimation

Starting values.

starts = list(
  mu = mean(y),
  sigma2_mu = var(rowMeans(y)),
  sigma2    = mean(apply(y, 1, var))
)

Estimate at the starting values.

one_factor_re(mu = starts[[1]],
              sigma2_mu = starts[[2]],
              sigma2    = starts[[3]])
[1] 62.30661

Package bbmle has an mle2 function for maximum likelihood estimation based on underlying R functions like optim, and produces a nice summary table. LBFGS-B is used to place lower bounds on the variance estimates.

library(bbmle)

fit_mle = mle2(
  one_factor_re ,
  start  = starts,
  method = 'L-BFGS-B',
  lower  = c(
    mu        = -Inf,
    sigma2_mu = 0,
    sigma2    = 0
  ),
  trace = TRUE
)

Comparison

We can compare to the lme4 model result.

library(lme4)
library(tidyverse)

d = data.frame(y) %>% 
  pivot_longer(everything(), names_to = 'x', values_to = 'value') %>% 
  arrange(x) %>% 
  group_by(x) %>% 
  mutate(group = 1:n())

fit_mer = lmer(value ~ 1 | group, data = d, REML = FALSE)

summary(fit_mle)
Maximum likelihood estimation

Call:
mle2(minuslogl = one_factor_re, start = starts, method = "L-BFGS-B", 
    trace = TRUE, lower = c(mu = -Inf, sigma2_mu = 0, sigma2 = 0))

Coefficients:
          Estimate Std. Error z value     Pr(z)    
mu        19.60000    1.12173 17.4729 < 2.2e-16 ***
sigma2_mu 12.19400    5.62858  2.1664  0.030277 *  
sigma2     1.16667    0.36893  3.1623  0.001565 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

-2 log L: 124.5288 
summary(fit_mer)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: value ~ 1 | group
   Data: d

     AIC      BIC   logLik deviance df.resid 
   130.5    134.7    -62.3    124.5       27 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.9950 -0.6555  0.1782  0.4870  1.7083 

Random effects:
 Groups   Name        Variance Std.Dev.
 group    (Intercept) 12.194   3.492   
 Residual              1.167   1.080   
Number of obs: 30, groups:  group, 10

Fixed effects:
            Estimate Std. Error t value
(Intercept)   19.600      1.122   17.47
-2 * logLik(fit_mer)
'log Lik.' 124.5288 (df=3)

Two-factor Mixed Model

An approach for two factor random effects model via maximum likelihood in R Matlab and Julia. It’s based on Statistical Modeling and Computation (2014) Chapter 10, example 10.10. See the previous chapter for a one factor model, and the Supplemental Section for the Matlab and Julia versions of this example. Note that the text has a typo on the sigma2 variance estimate (value should be .0023 not .023).

Data Setup

The data regards the breeding value of a set of five sires in raising pigs. Each sire is mated to a random group of dams, with the response being the average daily weight gain in pounds of two piglets in each litter.

library(tidyverse)

y = c(1.39,1.29,1.12,1.16,1.52,1.62,1.88,1.87,1.24,1.18,
      .95,.96,.82,.92,1.18,1.20,1.47,1.41,1.57,1.65)

# for use in lme4, but also a more conceptual representation of the data
d = expand.grid(sire = rep(1:5, 2), dam = 1:2)
d = data.frame(d[order(d$sire), ], y)

Function

The function takes the log variances eta as input to keep positive.

two_factor_re <- function(mu, eta_alpha, eta_gamma, eta) {
  # Args 
  # mu: intercept 
  # eta_alpha: random effect one 
  # eta_gamma: random effect two
  # eta: residual variance of y
  
  sigma2_alpha = exp(eta_alpha)
  sigma2_gamma = exp(eta_gamma)
  sigma2 = exp(eta)
  n = length(y)
  
  # covariance matrix of observations
  Sigma = sigma2 * diag(n) + sigma2_alpha * tcrossprod(Xalpha) + 
    sigma2_gamma * tcrossprod(Xgamma)
  
  
  # log likelihood
  ll = -n / 2 * log(2 * pi) - sum(log(diag(chol(Sigma)))) -
    .5 * t(y - mu) %*% chol2inv(chol(Sigma)) %*% (y - mu)
  
  return(-ll)
}

Estimation

Starting values and test.

starts = list(
  mu = mean(y),
  eta_alpha = var(tapply(y, d$sire, mean)),
  eta_gamma = var(y) / 3,
  eta = var(y) / 3
)

Xalpha = diag(5) %x% rep(1, 4)

Xgamma = diag(10) %x% rep(1, 2)

Estimate at starting values.

two_factor_re(
  mu  = starts[[1]],
  eta_alpha = starts[[2]],
  eta_gamma = starts[[3]],
  eta = starts[[4]]
)
         [,1]
[1,] 26.53887

Package bbmle has an mle2 function for maximum likelihood estimation based on underlying R functions like optim, and produces a nice summary table. LBFGS-B is used to place lower bounds on the variance estimates.

library(bbmle)

fit_mle = mle2(two_factor_re, start = starts,  method = 'BFGS')

Comparison

We can compare to the lme4 model result.

### lme4 comparison
library(lme4)

fit_mer = lmer(y ~ (1 | sire) + (1 | dam:sire), d, REML = FALSE)

summary(fit_mle)
Maximum likelihood estimation

Call:
mle2(minuslogl = two_factor_re, start = starts, method = "BFGS")

Coefficients:
          Estimate Std. Error  z value     Pr(z)    
mu         1.32000    0.11848  11.1410 < 2.2e-16 ***
eta_alpha -2.92393    0.84877  -3.4449 0.0005712 ***
eta_gamma -3.44860    0.65543  -5.2616 1.428e-07 ***
eta       -6.07920    0.44721 -13.5935 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

-2 log L: -23.98631 
exp(coef(fit_mle)[-1])
 eta_alpha  eta_gamma        eta 
0.05372198 0.03178996 0.00229000 
summary(fit_mer)
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: y ~ (1 | sire) + (1 | dam:sire)
   Data: d

     AIC      BIC   logLik deviance df.resid 
     -16      -12       12      -24       16 

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.21052 -0.59450  0.02314  0.61984  1.10386 

Random effects:
 Groups   Name        Variance Std.Dev.
 dam:sire (Intercept) 0.03179  0.17830 
 sire     (Intercept) 0.05372  0.23178 
 Residual             0.00229  0.04785 
Number of obs: 20, groups:  dam:sire, 10; sire, 5

Fixed effects:
            Estimate Std. Error t value
(Intercept)   1.3200     0.1185   11.14

Mixed Model via ML

Introduction

The following is based on the Wood text (Generalized Additive Models, 1st ed.) on additive models, chapter 6 in particular. It assumes familiarity with standard regression from a matrix perspective and at least passing familiarity with mixed models. The full document this chapter is based on can be found here, and contains more detail and exposition. See also the comparison of additive and mixed models here.

For this we’ll use the sleepstudy data from the lme4 package. The data has reaction times for 18 individuals over 10 days each (see the help file for the sleepstudy object for more details).

Data Setup

library(tidyverse)

data(sleepstudy, package = 'lme4')

X = model.matrix(~Days, sleepstudy)
Z = model.matrix(~factor(sleepstudy$Subject) - 1)

colnames(Z) = paste0('Subject_', unique(sleepstudy$Subject))  # for cleaner presentation later
rownames(Z) = paste0('Subject_', sleepstudy$Subject)

y = sleepstudy$Reaction

Function

The following is based on the code in Wood (6.2.2), with a couple modifications for consistent nomenclature and presentation. We use optim and a minimizing function, in this case the negative log likelihood, to estimate the parameters of interest, collectively \(\theta\), in the code below. The (square root of the) variances will be estimated on the log scale. In Wood, he simply extracts the ‘fixed effects’ for the intercept and days effects using lm (6.2.3), and we’ll do the same.

mixed_ll <- function(y, X, Z, theta) {
  tau   = exp(theta[1])
  sigma = exp(theta[2])
  n = length(y)
  
  # evaluate covariance matrix for y
  e = tcrossprod(Z)*tau^2 + diag(n)*sigma^2
  L = chol(e)  # L'L = e
  
  # transform dependent linear model to independent
  y  = backsolve(L, y, transpose = TRUE)
  X  = backsolve(L, X, transpose = TRUE)
  b  = coef(lm(y~X-1))
  LP = X %*% b
  
  ll = -n/2*log(2*pi) -sum(log(diag(L))) - crossprod(y - LP)/2
  -ll
}

Here is an alternative function using a multivariate normal approach that doesn’t use the transformation to independence, and might provide additional perspective. In this case, y is just one large multivariate draw with mu mean vector and N x N covariance matrix e.

mixed_ll_mv <- function(y, X, Z, theta) {
  tau   = exp(theta[1])
  sigma = exp(theta[2])
  n = length(y)
  
  # evaluate covariance matrix for y
  e  = tcrossprod(Z)*tau^2 + diag(n)*sigma^2
  b  = coef(lm.fit(X, y))
  mu = X %*% b

  ll = mvtnorm::dmvnorm(y, mu, e, log = TRUE)
  -ll
}

Estimation

Now we’re ready to use the optim function for estimation. A slight change to tolerance is included to get closer estimates to lme4, which we will compare shortly.

param_init = c(0, 0)
names(param_init) = c('tau', 'sigma')

fit = optim(
  fn  = mixed_ll,
  X   = X,
  y   = y,
  Z   = Z,
  par = param_init,
  control = list(reltol = 1e-10)
)

fit_mv = optim(
  fn  = mixed_ll_mv,
  X   = X,
  y   = y,
  Z   = Z,
  par = param_init,
  control = list(reltol = 1e-10)
)

Comparison

func tau sigma negLogLik X(Intercept) XDays
mixed_ll 36.016 30.899 897.039 251.405 10.467
mixed_ll_mv 36.016 30.899 897.039 251.405 10.467

As we can see, both formulations produce identical results. We can now compare those results to the lme4 output for the same model, and see that we’re getting what we should.

library(lme4)

fit_mer = lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)

fit_mer
Linear mixed model fit by maximum likelihood  ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
   Data: sleepstudy
      AIC       BIC    logLik  deviance  df.resid 
1802.0786 1814.8505 -897.0393 1794.0786       176 
Random effects:
 Groups   Name        Std.Dev.
 Subject  (Intercept) 36.01   
 Residual             30.90   
Number of obs: 180, groups:  Subject, 18
Fixed Effects:
(Intercept)         Days  
     251.41        10.47  

We can also predict the random effects (Wood, 6.2.4), sometimes called BLUPs (Best Linear Unbiased Predictions) and after doing so again compare the results to the lme4 estimates.

tau = exp(fit$par)[1]
tausq   = tau^2
sigma   = exp(fit$par)[2]
sigmasq = sigma^2
Sigma   = tcrossprod(Z)*tausq/sigmasq + diag(length(y))
ranef_est = tausq * t(Z) %*% solve(Sigma) %*% resid(lm(y ~ X - 1))/sigmasq
ranef_est lme4
Subject_308 40.64 40.64
Subject_309 -77.57 -77.57
Subject_310 -62.88 -62.88
Subject_330 4.39 4.39
Subject_331 10.18 10.18
Subject_332 8.19 8.19
Subject_333 16.44 16.44
Subject_334 -2.99 -2.99
Subject_335 -45.12 -45.12
Subject_337 71.92 71.92
Subject_349 -21.12 -21.12
Subject_350 14.06 14.06
Subject_351 -7.83 -7.83
Subject_352 36.25 36.25
Subject_369 7.01 7.01
Subject_370 -6.34 -6.34
Subject_371 -3.28 -3.28
Subject_372 18.05 18.05

Issues with ML estimation

Situations arise in which using maximum likelihood for mixed models would result in notably biased estimates (e.g. small N, lots of fixed effects), and so it is typically not used. Standard software usually defaults to restricted maximum likelihood. However, our purpose here has been served.