Mixed Models
This chapter covers mixed models, a flexible extension of standard regression methods that accounts for covariance among the observations.
One-factor Mixed Model
The following is an approach for a one-factor random effects model via maximum likelihood in R (with Matlab and Julia versions in the Supplemental Section). It's based on Statistical Modeling and Computation (2014), Chapter 10, Example 10.10. Unfortunately I did this before knowing they had both Matlab and R code on their website, though the R code here is a little cleaner and has comments.
Data Setup
The data regards crop yield from 10 randomly selected locations and three collections at each location.
library(tidyverse)
y = matrix(c(22.6, 20.5, 20.8,
             22.6, 21.2, 20.5,
             17.3, 16.2, 16.6,
             21.4, 23.7, 23.2,
             20.9, 22.2, 22.6,
             14.5, 10.5, 12.3,
             20.8, 19.1, 21.3,
             17.4, 18.6, 18.6,
             25.1, 24.8, 24.9,
             14.9, 16.3, 16.6),
           10, 3, byrow = TRUE)
Function
The estimating function.
one_factor_re <- function(mu, sigma2_mu, sigma2){
  # Args
  # mu: intercept
  # sigma2_mu: variance of intercept
  # sigma2: residual variance of y
  # I follow their notation for consistency

  d  = nrow(y)
  ni = ncol(y)

  # covariance matrix of observations
  Sigmai = sigma2 * diag(ni) + sigma2_mu * matrix(1, ni, ni)

  # log likelihood
  l = rep(NA, 10)

  # iterate over the rows
  for(i in 1:d){
    l[i] = .5 * t(y[i, ] - mu) %*% chol2inv(chol(Sigmai)) %*% (y[i, ] - mu)
  }

  ll = -(ni*d) / 2*log(2*pi) - d / 2*log(det(Sigmai)) - sum(l)

  return(-ll)
}
Estimation
Starting values.
starts = list(
  mu        = mean(y),
  sigma2_mu = var(rowMeans(y)),
  sigma2    = mean(apply(y, 1, var))
)
Estimate at the starting values.
one_factor_re(mu = starts[[1]],
sigma2_mu = starts[[2]],
sigma2 = starts[[3]])
[1] 62.30661
The bbmle package has an mle2 function for maximum likelihood estimation based on underlying R functions like optim, and it produces a nice summary table. L-BFGS-B is used to place lower bounds on the variance estimates.
library(bbmle)
fit_mle = mle2(
  one_factor_re,
  start  = starts,
  method = 'L-BFGS-B',
  lower  = c(
    mu        = -Inf,
    sigma2_mu = 0,
    sigma2    = 0
  ),
  trace  = TRUE
)
Comparison
We can compare to the lme4 model result.
library(lme4)
library(tidyverse)
d = data.frame(y) %>%
  pivot_longer(everything(), names_to = 'x', values_to = 'value') %>%
  arrange(x) %>%
  group_by(x) %>%
  mutate(group = 1:n())

fit_mer = lmer(value ~ 1 | group, data = d, REML = FALSE)
summary(fit_mle)
Maximum likelihood estimation
Call:
mle2(minuslogl = one_factor_re, start = starts, method = "L-BFGS-B",
trace = TRUE, lower = c(mu = -Inf, sigma2_mu = 0, sigma2 = 0))
Coefficients:
Estimate Std. Error z value Pr(z)
mu 19.60000 1.12173 17.4729 < 2.2e-16 ***
sigma2_mu 12.19400 5.62858 2.1664 0.030277 *
sigma2 1.16667 0.36893 3.1623 0.001565 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: 124.5288
summary(fit_mer)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: value ~ 1 | group
Data: d
AIC BIC logLik deviance df.resid
130.5 134.7 -62.3 124.5 27
Scaled residuals:
Min 1Q Median 3Q Max
-1.9950 -0.6555 0.1782 0.4870 1.7083
Random effects:
Groups Name Variance Std.Dev.
group (Intercept) 12.194 3.492
Residual 1.167 1.080
Number of obs: 30, groups: group, 10
Fixed effects:
Estimate Std. Error t value
(Intercept) 19.600 1.122 17.47
-2 * logLik(fit_mer)
'log Lik.' 124.5288 (df=3)
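For a quick check of the variance components specifically, the following added snippet (not part of the original code) pulls them from the lme4 object with VarCorr for a side-by-side look at the mle2 estimates above.

# extract lme4 variance components (group and residual) for comparison
as.data.frame(VarCorr(fit_mer))[, c('grp', 'vcov')]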
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/Mixed%20Models/one_factor_RE.R
Two-factor Mixed Model
The following is an approach for a two-factor random effects model via maximum likelihood in R. It's based on Statistical Modeling and Computation (2014), Chapter 10, Example 10.10. See the previous section for the one-factor model, and the Supplemental Section for the Matlab and Julia versions of this example. Note that the text has a typo for the sigma2 variance estimate (the value should be .0023, not .023).
Data Setup
The data regards the breeding value of a set of five sires in raising pigs. Each sire is mated to a random group of dams, with the response being the average daily weight gain in pounds of two piglets in each litter.
library(tidyverse)
y = c(1.39, 1.29, 1.12, 1.16, 1.52, 1.62, 1.88, 1.87, 1.24, 1.18,
      .95, .96, .82, .92, 1.18, 1.20, 1.47, 1.41, 1.57, 1.65)

# for use in lme4, but also a more conceptual representation of the data
d = expand.grid(sire = rep(1:5, 2), dam = 1:2)
d = data.frame(d[order(d$sire), ], y)
Function
The function takes the log variances (eta) as input in order to keep the variance estimates positive.
two_factor_re <- function(mu, eta_alpha, eta_gamma, eta) {
  # Args
  # mu: intercept
  # eta_alpha: log variance of random effect one (sire)
  # eta_gamma: log variance of random effect two (dam within sire)
  # eta: log residual variance of y

  sigma2_alpha = exp(eta_alpha)
  sigma2_gamma = exp(eta_gamma)
  sigma2       = exp(eta)
  n = length(y)

  # covariance matrix of observations
  Sigma = sigma2 * diag(n) + sigma2_alpha * tcrossprod(Xalpha) +
    sigma2_gamma * tcrossprod(Xgamma)

  # log likelihood
  ll = -n / 2 * log(2 * pi) - sum(log(diag(chol(Sigma)))) -
    .5 * t(y - mu) %*% chol2inv(chol(Sigma)) %*% (y - mu)

  return(-ll)
}
Estimation
Starting values and test.
starts = list(
  mu        = mean(y),
  eta_alpha = var(tapply(y, d$sire, mean)),
  eta_gamma = var(y) / 3,
  eta       = var(y) / 3
)

Xalpha = diag(5)  %x% rep(1, 4)
Xgamma = diag(10) %x% rep(1, 2)
Estimate at starting values.
two_factor_re(
mu = starts[[1]],
eta_alpha = starts[[2]],
eta_gamma = starts[[3]],
eta = starts[[4]]
)
[,1]
[1,] 26.53887
As before, the bbmle package's mle2 function is used for maximum likelihood estimation. Since the variances are estimated on the log scale, they are kept positive without needing bounds, so standard BFGS is used here.
library(bbmle)
fit_mle = mle2(two_factor_re, start = starts, method = 'BFGS')
Comparison
We can compare to the lme4 model result.
# lme4 comparison
library(lme4)

fit_mer = lmer(y ~ (1 | sire) + (1 | dam:sire), d, REML = FALSE)
summary(fit_mle)
Maximum likelihood estimation
Call:
mle2(minuslogl = two_factor_re, start = starts, method = "BFGS")
Coefficients:
Estimate Std. Error z value Pr(z)
mu 1.32000 0.11848 11.1410 < 2.2e-16 ***
eta_alpha -2.92393 0.84877 -3.4449 0.0005712 ***
eta_gamma -3.44860 0.65543 -5.2616 1.428e-07 ***
eta -6.07920 0.44721 -13.5935 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
-2 log L: -23.98631
exp(coef(fit_mle)[-1])
eta_alpha eta_gamma eta
0.05372198 0.03178996 0.00229000
summary(fit_mer)
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: y ~ (1 | sire) + (1 | dam:sire)
Data: d
AIC BIC logLik deviance df.resid
-16 -12 12 -24 16
Scaled residuals:
Min 1Q Median 3Q Max
-1.21052 -0.59450 0.02314 0.61984 1.10386
Random effects:
Groups Name Variance Std.Dev.
dam:sire (Intercept) 0.03179 0.17830
sire (Intercept) 0.05372 0.23178
Residual 0.00229 0.04785
Number of obs: 20, groups: dam:sire, 10; sire, 5
Fixed effects:
Estimate Std. Error t value
(Intercept) 1.3200 0.1185 11.14
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/Mixed%20Models/two_factor_RE.R
Mixed Model via ML
Introduction
The following is based on the Wood text (Generalized Additive Models, 1st ed.) on additive models, chapter 6 in particular. It assumes familiarity with standard regression from a matrix perspective and at least passing familiarity with mixed models. The full document this chapter is based on can be found here, and contains more detail and exposition. See also the comparison of additive and mixed models here.
For this we’ll use the sleepstudy data from the lme4 package. The data has reaction times for 18 individuals over 10 days each (see the help file for the sleepstudy object for more details).
Data Setup
library(tidyverse)
data(sleepstudy, package = 'lme4')
X = model.matrix(~Days, sleepstudy)
Z = model.matrix(~factor(sleepstudy$Subject) - 1)

colnames(Z) = paste0('Subject_', unique(sleepstudy$Subject))  # for cleaner presentation later
rownames(Z) = paste0('Subject_', sleepstudy$Subject)

y = sleepstudy$Reaction
Function
The following is based on the code in Wood (6.2.2), with a couple of modifications for consistent nomenclature and presentation. We use optim and a minimizing function, in this case the negative log likelihood, to estimate the parameters of interest, collectively \(\theta\) in the code below. The (square roots of the) variances will be estimated on the log scale. Wood simply extracts the 'fixed effects' for the intercept and Days effects using lm (6.2.3), and we'll do the same.
mixed_ll <- function(y, X, Z, theta) {
  tau   = exp(theta[1])
  sigma = exp(theta[2])
  n = length(y)

  # evaluate covariance matrix for y
  e = tcrossprod(Z)*tau^2 + diag(n)*sigma^2
  L = chol(e)  # L'L = e

  # transform dependent linear model to independent
  y  = backsolve(L, y, transpose = TRUE)
  X  = backsolve(L, X, transpose = TRUE)
  b  = coef(lm(y ~ X - 1))
  LP = X %*% b

  ll = -n/2*log(2*pi) - sum(log(diag(L))) - crossprod(y - LP)/2

  -ll
}
Here is an alternative function using a multivariate normal approach that doesn't use the transformation to independence, and which might provide additional perspective. In this case, y is just one large multivariate normal draw with mean vector mu and N x N covariance matrix e.
mixed_ll_mv <- function(y, X, Z, theta) {
  tau   = exp(theta[1])
  sigma = exp(theta[2])
  n = length(y)

  # evaluate covariance matrix for y
  e  = tcrossprod(Z)*tau^2 + diag(n)*sigma^2
  b  = coef(lm.fit(X, y))
  mu = X %*% b

  ll = mvtnorm::dmvnorm(y, mu, e, log = TRUE)

  -ll
}
Estimation
Now we’re ready to use the optim function for estimation. A slight change to tolerance is included to get closer estimates to lme4, which we will compare shortly.
param_init = c(0, 0)
names(param_init) = c('tau', 'sigma')

fit = optim(
  fn  = mixed_ll,
  X   = X,
  y   = y,
  Z   = Z,
  par = param_init,
  control = list(reltol = 1e-10)
)

fit_mv = optim(
  fn  = mixed_ll_mv,
  X   = X,
  y   = y,
  Z   = Z,
  par = param_init,
  control = list(reltol = 1e-10)
)
Comparison
| func        | tau    | sigma  | negLogLik | X(Intercept) | XDays  |
|-------------|--------|--------|-----------|--------------|--------|
| mixed_ll    | 36.016 | 30.899 | 897.039   | 251.405      | 10.467 |
| mixed_ll_mv | 36.016 | 30.899 | 897.039   | 251.405      | 10.467 |
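The comparison table above is assembled from the optim output; the following added sketch (not part of the original code) shows one way to reproduce it, back-transforming the parameters and recovering the fixed effects at the optimum. It assumes the fit, fit_mv, X, Z, and y objects created earlier.

# collect estimates from an optim result (hypothetical helper)
extract_estimates <- function(opt) {
  tau   = exp(opt$par[1])
  sigma = exp(opt$par[2])
  n = length(y)

  # rebuild the implied covariance and the whitened model at the optimum
  V = tcrossprod(Z)*tau^2 + diag(n)*sigma^2
  L = chol(V)
  y = backsolve(L, y, transpose = TRUE)
  X = backsolve(L, X, transpose = TRUE)
  b = coef(lm(y ~ X - 1))

  c(tau = unname(tau), sigma = unname(sigma), negLogLik = opt$value, b)
}

rbind(mixed_ll = extract_estimates(fit), mixed_ll_mv = extract_estimates(fit_mv))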
As we can see, both formulations produce identical results. We can now compare those results to the lme4 output for the same model, and see that we’re getting what we should.
library(lme4)
fit_mer = lmer(Reaction ~ Days + (1 | Subject), sleepstudy, REML = FALSE)
fit_mer
Linear mixed model fit by maximum likelihood ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject)
Data: sleepstudy
AIC BIC logLik deviance df.resid
1802.0786 1814.8505 -897.0393 1794.0786 176
Random effects:
Groups Name Std.Dev.
Subject (Intercept) 36.01
Residual 30.90
Number of obs: 180, groups: Subject, 18
Fixed Effects:
(Intercept) Days
251.41 10.47
We can also predict the random effects (Wood, 6.2.4), sometimes called BLUPs (Best Linear Unbiased Predictions), and after doing so, again compare the results to the lme4 estimates.
tau     = exp(fit$par)[1]
tausq   = tau^2
sigma   = exp(fit$par)[2]
sigmasq = sigma^2

Sigma = tcrossprod(Z)*tausq/sigmasq + diag(length(y))

ranef_est = tausq * t(Z) %*% solve(Sigma) %*% resid(lm(y ~ X - 1)) / sigmasq
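The table below compares these hand-computed values to the lme4 estimates; an added sketch (not part of the original code) of how such a comparison might be put together with lme4's ranef:

# compare hand-computed BLUPs with lme4's conditional modes; this assumes the
# subject ordering in Z matches the factor levels, as it does for these data
data.frame(
  ranef_est = round(ranef_est[, 1], 2),
  lme4      = round(ranef(fit_mer)$Subject[[1]], 2)
)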
|             | ranef_est | lme4   |
|-------------|-----------|--------|
| Subject_308 | 40.64     | 40.64  |
| Subject_309 | -77.57    | -77.57 |
| Subject_310 | -62.88    | -62.88 |
| Subject_330 | 4.39      | 4.39   |
| Subject_331 | 10.18     | 10.18  |
| Subject_332 | 8.19      | 8.19   |
| Subject_333 | 16.44     | 16.44  |
| Subject_334 | -2.99     | -2.99  |
| Subject_335 | -45.12    | -45.12 |
| Subject_337 | 71.92     | 71.92  |
| Subject_349 | -21.12    | -21.12 |
| Subject_350 | 14.06     | 14.06  |
| Subject_351 | -7.83     | -7.83  |
| Subject_352 | 36.25     | 36.25  |
| Subject_369 | 7.01      | 7.01   |
| Subject_370 | -6.34     | -6.34  |
| Subject_371 | -3.28     | -3.28  |
| Subject_372 | 18.05     | 18.05  |
Issues with ML estimation
Situations arise in which maximum likelihood produces notably biased estimates of the variance components in mixed models (e.g., small N, many fixed effects), so it is typically not used; standard software usually defaults to restricted maximum likelihood (REML). However, our purpose here has been served.
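As a point of reference, here is an added snippet (not in the original) fitting the same sleepstudy model by REML, which is lme4's default when REML = FALSE is omitted; the variance estimates will differ slightly from the ML results above.

# REML is the default in lmer, so simply drop REML = FALSE
fit_mer_reml = lmer(Reaction ~ Days + (1 | Subject), sleepstudy)
VarCorr(fit_mer_reml)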
Link with penalized regression
A link exists between mixed models and a penalized likelihood approach. For a penalized approach with the standard linear model, the objective function we want to minimize can be expressed as follows:
\[ \lVert y- X\beta \rVert^2 + \beta^\intercal\beta \]
The added component to the sum of the squared residuals is the penalty. By adding the sum of the squared coefficients, we end up keeping them from getting too big, and this helps to avoid overfitting. Another interesting aspect of this approach is that it is comparable to using a specific prior on the coefficients in a Bayesian framework.
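To make the objective concrete, here is a small added sketch (not from the source) of the penalized least squares criterion as an R function, with an explicit penalty weight lambda; the expression above corresponds to lambda = 1.

# penalized least squares: squared residuals plus a ridge penalty on the coefficients
# (lambda is a hypothetical tuning weight, not part of the original expression)
penalized_obj <- function(beta, X, y, lambda = 1) {
  crossprod(y - X %*% beta) + lambda * crossprod(beta)
}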
We can now see mixed models as a penalized technique. If we knew \(\sigma\) and \(\psi_\theta\), then the predicted random effects \(g\) and estimates for the fixed effects \(\beta\) are those that minimize the following objective function:
\[ \frac{1}{\sigma^2}\lVert y - X\beta - Zg \rVert^2 + g^\intercal\psi_\theta^{-1}g \]
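A corresponding sketch (again an addition, not from the source) of this mixed model objective, treating \(\sigma^2\) and \(\psi_\theta\) as known:

# mixed model penalized objective for given residual variance sigma2 and
# random-effect covariance psi (both assumed known)
mixed_penalized_obj <- function(beta, g, X, Z, y, sigma2, psi) {
  crossprod(y - X %*% beta - Z %*% g) / sigma2 + t(g) %*% solve(psi) %*% g
}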
Source
Main doc found at https://m-clark.github.io/docs/mixedModels/mixedModelML.html