L2 (ridge) regularization

Compare to the lasso chapter. A more conceptual depiction of the both can be found in the penalized ML chapter.

Data Setup

library(tidyverse)

set.seed(8675309)

N = 500
p = 10
X = scale(matrix(rnorm(N * p), ncol = p))
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, 4))
y = scale(X %*% b + rnorm(N, sd = .5))

Function

ridge <- function(w, X, y, lambda = .1) {
  # X: model matrix; 
  # y: target; 
  # lambda: penalty parameter; 
  # w: the weights/coefficients
  
  crossprod(y - X %*% w) + lambda * length(y) * crossprod(w)
}

Estimation

Note, if lambda = 0, i.e. no penalty, the result is the same as what you would get from the base R lm.fit.

fit_ridge = optim(
  rep(0, ncol(X)),
  ridge,
  X = X,
  y = y,
  lambda = .1,
  method = 'BFGS'
)

Analytical result.

fit_ridge2 = solve(crossprod(X) + diag(length(y)*.1, ncol(X))) %*% crossprod(X, y)

An alternative approach using ‘augmented’ data (note sigma is ignored as it equals 1, but otherwise X/sigma and y/sigma).

X2 = rbind(X, diag(sqrt(length(y)*.1), ncol(X)))
y2 = c(y, rep(0, ncol(X)))

tail(X2)

       [,1] [,2] [,3] [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
[505,]    0    0    0    0 7.071068 0.000000 0.000000 0.000000 0.000000 0.000000
[506,]    0    0    0    0 0.000000 7.071068 0.000000 0.000000 0.000000 0.000000
[507,]    0    0    0    0 0.000000 0.000000 7.071068 0.000000 0.000000 0.000000
[508,]    0    0    0    0 0.000000 0.000000 0.000000 7.071068 0.000000 0.000000
[509,]    0    0    0    0 0.000000 0.000000 0.000000 0.000000 7.071068 0.000000
[510,]    0    0    0    0 0.000000 0.000000 0.000000 0.000000 0.000000 7.071068

tail(y2)

[1] 0 0 0 0 0 0

fit_ridge3 = solve(crossprod(X2)) %*% crossprod(X2, y2)

The glmnet approach is by default a mixture of ridge and lasso penalties, setting alpha = 1 reduces to lasso (alpha = 0 would be ridge). We set the lambda to a couple values while only wanting the one set to the same lambda value as above (s).

library(glmnet)

fit_glmnet = coef(
  glmnet(
    X,
    y,
    alpha = 0,
    lambda = c(10, 1, .1),
    thresh = 1e-12,
    intercept = F
  ), 
  s = .1
)

Comparison

We can now compare the coefficients of all our results.

	lm	ridge	ridge2	ridge3	glmnet	truth
X1	0.535	0.485	0.485	0.485	0.485	0.500
X2	-0.530	-0.481	-0.481	-0.481	-0.481	-0.500
X3	0.234	0.209	0.209	0.209	0.209	0.250
X4	-0.294	-0.269	-0.269	-0.269	-0.269	-0.250
X5	0.126	0.115	0.115	0.115	0.115	0.125
X6	-0.159	-0.146	-0.146	-0.146	-0.146	-0.125
X7	-0.017	-0.022	-0.022	-0.022	-0.022	0.000
X8	0.010	0.007	0.007	0.007	0.007	0.000
X9	-0.005	0.001	0.001	0.001	0.001	0.000
X10	0.011	0.011	0.011	0.011	0.011	0.000

Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/ridge.R