L2 (ridge) regularization
Compare to the lasso chapter. A more conceptual depiction of the both can be found in the penalized ML chapter.
Data Setup
library(tidyverse)
set.seed(8675309)
= 500
N = 10
p = scale(matrix(rnorm(N * p), ncol = p))
X = c(.5, -.5, .25, -.25, .125, -.125, rep(0, 4))
b = scale(X %*% b + rnorm(N, sd = .5)) y
Function
<- function(w, X, y, lambda = .1) {
ridge # X: model matrix;
# y: target;
# lambda: penalty parameter;
# w: the weights/coefficients
crossprod(y - X %*% w) + lambda * length(y) * crossprod(w)
}
Estimation
Note, if lambda = 0
, i.e. no penalty, the result is the same as what you would get from the base R lm.fit.
= optim(
fit_ridge rep(0, ncol(X)),
ridge,X = X,
y = y,
lambda = .1,
method = 'BFGS'
)
Analytical result.
= solve(crossprod(X) + diag(length(y)*.1, ncol(X))) %*% crossprod(X, y) fit_ridge2
An alternative approach using ‘augmented’ data (note sigma
is ignored as it equals 1, but otherwise
X/sigma and y/sigma).
= rbind(X, diag(sqrt(length(y)*.1), ncol(X)))
X2 = c(y, rep(0, ncol(X)))
y2
tail(X2)
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[505,] 0 0 0 0 7.071068 0.000000 0.000000 0.000000 0.000000 0.000000
[506,] 0 0 0 0 0.000000 7.071068 0.000000 0.000000 0.000000 0.000000
[507,] 0 0 0 0 0.000000 0.000000 7.071068 0.000000 0.000000 0.000000
[508,] 0 0 0 0 0.000000 0.000000 0.000000 7.071068 0.000000 0.000000
[509,] 0 0 0 0 0.000000 0.000000 0.000000 0.000000 7.071068 0.000000
[510,] 0 0 0 0 0.000000 0.000000 0.000000 0.000000 0.000000 7.071068
tail(y2)
[1] 0 0 0 0 0 0
= solve(crossprod(X2)) %*% crossprod(X2, y2) fit_ridge3
The glmnet approach is by default a mixture
of ridge and lasso penalties, setting alpha = 1
reduces to lasso (alpha = 0
would be ridge). We set the lambda to a couple values while only wanting the one
set to the same lambda value as above (s
).
library(glmnet)
= coef(
fit_glmnet glmnet(
X,
y,alpha = 0,
lambda = c(10, 1, .1),
thresh = 1e-12,
intercept = F
), s = .1
)
Comparison
We can now compare the coefficients of all our results.
lm | ridge | ridge2 | ridge3 | glmnet | truth | |
---|---|---|---|---|---|---|
X1 | 0.535 | 0.485 | 0.485 | 0.485 | 0.485 | 0.500 |
X2 | -0.530 | -0.481 | -0.481 | -0.481 | -0.481 | -0.500 |
X3 | 0.234 | 0.209 | 0.209 | 0.209 | 0.209 | 0.250 |
X4 | -0.294 | -0.269 | -0.269 | -0.269 | -0.269 | -0.250 |
X5 | 0.126 | 0.115 | 0.115 | 0.115 | 0.115 | 0.125 |
X6 | -0.159 | -0.146 | -0.146 | -0.146 | -0.146 | -0.125 |
X7 | -0.017 | -0.022 | -0.022 | -0.022 | -0.022 | 0.000 |
X8 | 0.010 | 0.007 | 0.007 | 0.007 | 0.007 | 0.000 |
X9 | -0.005 | 0.001 | 0.001 | 0.001 | 0.001 | 0.000 |
X10 | 0.011 | 0.011 | 0.011 | 0.011 | 0.011 | 0.000 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/ridge.R