L2 (ridge) regularization

Compare to the lasso chapter. A more conceptual depiction of the both can be found in the penalized ML chapter.

Data Setup

library(tidyverse)

set.seed(8675309)

N = 500
p = 10
X = scale(matrix(rnorm(N * p), ncol = p))
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, 4))
y = scale(X %*% b + rnorm(N, sd = .5))

Function

ridge <- function(w, X, y, lambda = .1) {
  # X: model matrix; 
  # y: target; 
  # lambda: penalty parameter; 
  # w: the weights/coefficients
  
  crossprod(y - X %*% w) + lambda * length(y) * crossprod(w)
}

Estimation

Note, if lambda = 0, i.e. no penalty, the result is the same as what you would get from the base R lm.fit.

fit_ridge = optim(
  rep(0, ncol(X)),
  ridge,
  X = X,
  y = y,
  lambda = .1,
  method = 'BFGS'
)

Analytical result.

fit_ridge2 = solve(crossprod(X) + diag(length(y)*.1, ncol(X))) %*% crossprod(X, y)

An alternative approach using ‘augmented’ data (note sigma is ignored as it equals 1, but otherwise X/sigma and y/sigma).

X2 = rbind(X, diag(sqrt(length(y)*.1), ncol(X)))
y2 = c(y, rep(0, ncol(X)))

tail(X2)
       [,1] [,2] [,3] [,4]     [,5]     [,6]     [,7]     [,8]     [,9]    [,10]
[505,]    0    0    0    0 7.071068 0.000000 0.000000 0.000000 0.000000 0.000000
[506,]    0    0    0    0 0.000000 7.071068 0.000000 0.000000 0.000000 0.000000
[507,]    0    0    0    0 0.000000 0.000000 7.071068 0.000000 0.000000 0.000000
[508,]    0    0    0    0 0.000000 0.000000 0.000000 7.071068 0.000000 0.000000
[509,]    0    0    0    0 0.000000 0.000000 0.000000 0.000000 7.071068 0.000000
[510,]    0    0    0    0 0.000000 0.000000 0.000000 0.000000 0.000000 7.071068
tail(y2)
[1] 0 0 0 0 0 0
fit_ridge3 = solve(crossprod(X2)) %*% crossprod(X2, y2)

The glmnet approach is by default a mixture of ridge and lasso penalties, setting alpha = 1 reduces to lasso (alpha = 0 would be ridge). We set the lambda to a couple values while only wanting the one set to the same lambda value as above (s).

library(glmnet)

fit_glmnet = coef(
  glmnet(
    X,
    y,
    alpha = 0,
    lambda = c(10, 1, .1),
    thresh = 1e-12,
    intercept = F
  ), 
  s = .1
)

Comparison

We can now compare the coefficients of all our results.

lm ridge ridge2 ridge3 glmnet truth
X1 0.535 0.485 0.485 0.485 0.485 0.500
X2 -0.530 -0.481 -0.481 -0.481 -0.481 -0.500
X3 0.234 0.209 0.209 0.209 0.209 0.250
X4 -0.294 -0.269 -0.269 -0.269 -0.269 -0.250
X5 0.126 0.115 0.115 0.115 0.115 0.125
X6 -0.159 -0.146 -0.146 -0.146 -0.146 -0.125
X7 -0.017 -0.022 -0.022 -0.022 -0.022 0.000
X8 0.010 0.007 0.007 0.007 0.007 0.000
X9 -0.005 0.001 0.001 0.001 0.001 0.000
X10 0.011 0.011 0.011 0.011 0.011 0.000