Newton and IRLS
Here we demonstrate Newton's method and iteratively reweighted least squares (IRLS) with a logistic regression model.
For the following, I had Murphy's PML text open and more or less followed the algorithms in chapter 8. Note that the Newton's method implementation here does not use a line search to find a more optimal stepsize at a given iteration.
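As an aside, a simple backtracking line search is easy to bolt on. The sketch below is my own illustrative addition (a toy one-parameter objective, not the logistic model): it halves the step until the objective actually decreases.

```r
# Minimal backtracking line-search sketch for a single Newton step
# (illustrative only; the functions below do not use it).
f <- function(b) (b - 2)^4    # toy convex objective, minimum at b = 2
g <- function(b) 4 * (b - 2)^3       # first derivative
h <- function(b) 12 * (b - 2)^2      # second derivative

b    = 0
step = 1

direction = -g(b) / h(b)             # full Newton direction

# halve the step until the objective decreases
while (f(b + step * direction) >= f(b) && step > 1e-8) {
  step = step / 2
}

b_new = b + step * direction
```

Here the full Newton step already decreases the objective, so no halving occurs; on flatter or badly scaled problems the loop shrinks the step instead of relying on a fixed stepsize argument.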
Data Setup
Predict graduate school admission based on gre, gpa, and school rank (rank 1 being the most prestigious). See the corresponding demo here. The only difference is that I treat rank as numeric rather than categorical. We will be comparing results to the base R glm function, so I will use it to create the model data.
library(tidyverse)

admit = haven::read_dta('https://stats.idre.ucla.edu/stat/stata/dae/binary.dta')

fit_glm = glm(admit ~ gre + gpa + rank, data = admit, family = binomial)

# summary(fit_glm)

X = model.matrix(fit_glm)
y = fit_glm$y
Functions
Newton’s Method
newton <- function(
  X,
  y,
  tol = 1e-12,
  iter = 500,
  stepsize = .5
) {
  # Args:
  # X: model matrix
  # y: target
  # tol: tolerance
  # iter: maximum number of iterations
  # stepsize: (0, 1)

  # initialize
  int  = log(mean(y) / (1 - mean(y)))  # intercept
  beta = c(int, rep(0, ncol(X) - 1))
  currtol = 1
  it = 0
  ll = 0

  while (currtol > tol && it < iter) {
    it = it + 1
    ll_old = ll

    mu = plogis(X %*% beta)[, 1]
    g  = crossprod(X, mu - y)          # gradient
    S  = diag(mu * (1 - mu))
    H  = t(X) %*% S %*% X              # hessian
    beta = beta - stepsize * solve(H) %*% g

    ll = sum(dbinom(y, prob = mu, size = 1, log = TRUE))
    currtol = abs(ll - ll_old)
  }

  list(
    beta   = beta,
    iter   = it,
    tol    = currtol,
    loglik = ll
  )
}
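As a quick sanity check on the gradient used inside the function (this check is my addition, run on simulated data rather than the admissions data), the analytic gradient crossprod(X, mu - y) can be compared against central finite differences of the negative log-likelihood:

```r
set.seed(123)

# simulate a small logistic regression problem
X = cbind(1, matrix(rnorm(100 * 2), 100, 2))
beta_true = c(-0.5, 1, -1)
y = rbinom(100, size = 1, prob = plogis(X %*% beta_true))

# negative log-likelihood and its analytic gradient
nll  = function(beta) -sum(dbinom(y, prob = plogis(X %*% beta)[, 1], size = 1, log = TRUE))
grad = function(beta) crossprod(X, plogis(X %*% beta)[, 1] - y)[, 1]

beta0 = c(0, 0, 0)
eps   = 1e-6

# central finite differences, one coordinate at a time
fd = sapply(seq_along(beta0), function(j) {
  e = replace(numeric(3), j, eps)
  (nll(beta0 + e) - nll(beta0 - e)) / (2 * eps)
})

max(abs(fd - grad(beta0)))  # should be tiny
```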
IRLS
Note that glm actually uses IRLS under the hood, so the results here should match it almost exactly.
irls <- function(X, y, tol = 1e-12, iter = 500) {
  # initialize
  int  = log(mean(y) / (1 - mean(y)))  # intercept
  beta = c(int, rep(0, ncol(X) - 1))
  currtol = 1
  it = 0
  ll = 0

  while (currtol > tol && it < iter) {
    it = it + 1
    ll_old = ll

    eta = X %*% beta
    mu  = plogis(eta)[, 1]
    s   = mu * (1 - mu)
    S   = diag(s)
    z   = eta + (y - mu) / s
    beta = solve(t(X) %*% S %*% X) %*% (t(X) %*% (S %*% z))

    ll = sum(
      dbinom(
        y,
        prob = plogis(X %*% beta),
        size = 1,
        log  = TRUE
      )
    )

    currtol = abs(ll - ll_old)
  }

  list(
    beta    = beta,
    iter    = it,
    tol     = currtol,
    loglik  = ll,
    weights = plogis(X %*% beta) * (1 - plogis(X %*% beta))
  )
}
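Incidentally, with stepsize = 1 a Newton update and an IRLS update for logistic regression coincide: beta - H^{-1} g = (X'SX)^{-1} X'S z with z = eta + S^{-1}(y - mu). A quick numerical check of this identity (my addition, on simulated data):

```r
set.seed(42)

X = cbind(1, rnorm(50))
y = rbinom(50, 1, plogis(X %*% c(-1, 2)))

beta = c(0, 0)   # common starting point

# one full Newton step (stepsize = 1)
mu = plogis(X %*% beta)[, 1]
g  = crossprod(X, mu - y)
S  = diag(mu * (1 - mu))
H  = t(X) %*% S %*% X
beta_newton = beta - solve(H) %*% g

# one IRLS step on the working response z
eta = X %*% beta
z   = eta + (y - mu) / (mu * (1 - mu))
beta_irls = solve(t(X) %*% S %*% X) %*% (t(X) %*% (S %*% z))

all.equal(c(beta_newton), c(beta_irls))  # TRUE
```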
Estimation
fit_newton = newton(
  X = X,
  y = y,
  stepsize = .9,
  tol = 1e-8  # tol set to 1e-8 as in glm default
)

fit_newton
$beta
[,1]
(Intercept) -3.449548577
gre 0.002293959
gpa 0.777013649
rank -0.560031371
$iter
[1] 8
$tol
[1] 2.581544e-10
$loglik
[1] -229.7209
# fit_glm
tol set to 1e-8 as in the glm default.
irls_result = irls(X = X, y = y, tol = 1e-8)
str(irls_result)
List of 5
$ beta : num [1:4, 1] -3.44955 0.00229 0.77701 -0.56003
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:4] "(Intercept)" "gre" "gpa" "rank"
.. ..$ : NULL
$ iter : num 4
$ tol : num 6e-09
$ loglik : num -230
$ weights: num [1:400, 1] 0.1536 0.2168 0.2026 0.1268 0.0884 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:400] "1" "2" "3" "4" ...
.. ..$ : NULL
# fit_glm
Comparison
Compare all results.
|             | beta1 | beta2 | beta3 | beta4 | iter | tol | loglik   |
|-------------|-------|-------|-------|-------|------|-----|----------|
| newton      | -3.45 | 0.002 | 0.777 | -0.56 | 8    | 0   | -229.721 |
| irls        | -3.45 | 0.002 | 0.777 | -0.56 | 4    | 0   | -229.721 |
| glm_default | -3.45 | 0.002 | 0.777 | -0.56 | 4    | NA  | -229.721 |
Compare weights between the irls and glm results.
head(cbind(irls_result$weights, fit_glm$weights))
[,1] [,2]
1 0.15362250 0.15362250
2 0.21679615 0.21679615
3 0.20255723 0.20255724
4 0.12676333 0.12676334
5 0.08835918 0.08835918
6 0.23528108 0.23528108
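Standard errors fall out of the same machinery: the inverse Hessian (X'SX)^{-1}, evaluated at the converged weights, is the estimated covariance of the coefficients. A self-contained check on simulated data (my addition; it uses glm's fitted values rather than the functions above):

```r
set.seed(1)

X = cbind(1, rnorm(200), rnorm(200))
y = rbinom(200, 1, plogis(X %*% c(-0.5, 1, -1)))

fit = glm(y ~ X - 1, family = binomial)

# weights at the converged fit: mu * (1 - mu), as in the irls output
w = fit$fitted.values * (1 - fit$fitted.values)

# inverse Hessian = estimated covariance matrix of the coefficients
V  = solve(t(X) %*% diag(w) %*% X)
se = sqrt(diag(V))

cbind(se, summary(fit)$coefficients[, "Std. Error"])  # columns should match
```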
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/newton_irls.R