L1 (lasso) regularization
See Tibshirani (1996) for the original paper, or Murphy PML (2012) for a nice overview (watch for typos in depictions). A more conceptual depiction of the lasso can be found in the penalized ML chapter.
Data Setup
```r
library(tidyverse)

set.seed(8675309)

N = 500
p = 10
X = scale(matrix(rnorm(N * p), ncol = p))               # model matrix
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, p - 6))   # true coefficients, mostly zero
y = scale(X %*% b + rnorm(N, sd = .5))                  # target

lambda = .1                                             # penalty parameter
```
Function
Coordinate descent function.
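For orientation, in the soft-thresholding case each coordinate update takes the standard form below (a sketch of what the code implements; $S$ is the soft-thresholding operator, and the $N\lambda$ scaling matches the `length(y) * lambda` used in the code):

$$
w_j \leftarrow S\!\left(\frac{c_j}{a_j},\, \frac{N\lambda}{a_j}\right),
\qquad
c_j = \mathbf{x}_j^\top \Big(y - \sum_{k \neq j} \mathbf{x}_k w_k\Big),
\qquad
a_j = \mathbf{x}_j^\top \mathbf{x}_j,
$$

where

$$
S(z, \gamma) = \operatorname{sign}(z)\,\max(|z| - \gamma,\, 0).
$$

Hard thresholding instead leaves the current estimates alone and simply zeroes any $w_j$ whose $c_j$ falls within $(-N\lambda,\, N\lambda)$.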
```r
lasso <- function(
  X,                 # model matrix
  y,                 # target
  lambda = .1,       # penalty parameter
  soft   = TRUE,     # soft vs. hard thresholding
  tol    = 1e-6,     # tolerance
  iter   = 100,      # number of max iterations
  verbose = TRUE     # print out iteration number
) {
  # soft thresholding function
  soft_thresh <- function(a, b) {
    out = rep(0, length(a))
    out[a >  b] = a[a >  b] - b
    out[a < -b] = a[a < -b] + b
    out
  }

  # initialize with the ridge solution
  w = solve(crossprod(X) + diag(lambda, ncol(X))) %*% crossprod(X, y)

  tol_curr = 1
  J  = ncol(X)
  a  = rep(0, J)
  c_ = rep(0, J)
  i  = 1

  while (tol < tol_curr && i < iter) {
    w_old = w
    a  = colSums(X^2)
    l  = length(y) * lambda   # for consistency with glmnet approach
    c_ = sapply(1:J, function(j) sum(X[, j] * (y - X[, -j] %*% w_old[-j])))

    if (soft) {
      for (j in 1:J) {
        w[j] = soft_thresh(c_[j] / a[j], l / a[j])
      }
    }
    else {
      w = w_old
      w[c_ < l & c_ > -l] = 0
    }

    tol_curr = crossprod(w - w_old)
    i = i + 1

    if (verbose && i %% 10 == 0) message(i)
  }

  w
}
```
Estimation
Note that if `lambda = 0`, i.e. no penalty, the result is the same as what you would get from base R's `lm.fit`.
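As a quick check of that claim (a sketch, using the function and data created above):

```r
# with no penalty, coordinate descent should recover the OLS solution
fit_ols  = coef(lm.fit(X, y))
fit_lam0 = lasso(X, y, lambda = 0, tol = 1e-12, verbose = FALSE)

max(abs(fit_lam0 - fit_ols))  # near 0, up to convergence tolerance
```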
```r
fit_soft = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = TRUE
)
```
```r
fit_hard = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = FALSE
)
```
The glmnet approach is by default a mixture of ridge and lasso penalties (the elastic net); setting `alpha = 1` reduces it to the lasso (`alpha = 0` would be ridge). We supply a few lambda values, but only want the coefficients for the one set to the same lambda value as above, extracted via the `s` argument.
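For reference, for a Gaussian response glmnet minimizes (as documented for the package)

$$
\frac{1}{2N}\sum_{i=1}^N \left(y_i - \beta_0 - x_i^\top\beta\right)^2
+ \lambda\left[\frac{1-\alpha}{2}\lVert\beta\rVert_2^2 + \alpha\lVert\beta\rVert_1\right],
$$

so $\alpha = 1$ leaves only the $L_1$ penalty, and the $1/(2N)$ scaling of the loss is why the function above multiplies $\lambda$ by $N$.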
```r
library(glmnet)

fit_glmnet = coef(
  glmnet(
    X,
    y,
    alpha  = 1,
    lambda = c(10, 1, lambda),
    thresh = 1e-12,
    intercept = FALSE
  ),
  s = lambda
)
```
The lassoshooting package puts the penalty on the scale of the unscaled squared-error loss, so its lambda is multiplied by $N$ here, matching the scaling used inside our own function.

```r
library(lassoshooting)

fit_ls = lassoshooting(
  X = X,
  y = y,
  lambda = length(y) * lambda,
  thr    = 1e-12
)
```
Comparison
We can now compare the coefficients of all our results.
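One way to assemble the table that follows (a sketch; it assumes the objects created above, that lassoshooting returns its estimates in a `coefficients` element, and that the glmnet coefficient matrix includes an intercept row we drop):

```r
round(
  data.frame(
    lm         = coef(lm.fit(X, y)),          # unpenalized baseline
    lasso_soft = fit_soft[, 1],               # our soft-threshold estimates
    lasso_hard = fit_hard[, 1],               # our hard-threshold estimates
    lspack     = fit_ls$coefficients,         # assumed element name
    glmnet     = as.vector(fit_glmnet)[-1],   # drop the (zero) intercept row
    truth      = b
  ),
  3
)
```

Soft thresholding should essentially reproduce the glmnet and lassoshooting results, while hard thresholding keeps the (nearly) unpenalized estimates for the coefficients it retains and zeroes out the rest.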
|     | lm     | lasso_soft | lasso_hard | lspack | glmnet | truth  |
|-----|--------|------------|------------|--------|--------|--------|
| X1  |  0.535 |  0.435     |  0.535     |  0.435 |  0.436 |  0.500 |
| X2  | -0.530 | -0.429     | -0.530     | -0.429 | -0.429 | -0.500 |
| X3  |  0.234 |  0.124     |  0.234     |  0.124 |  0.124 |  0.250 |
| X4  | -0.294 | -0.207     | -0.294     | -0.207 | -0.208 | -0.250 |
| X5  |  0.126 |  0.020     |  0.126     |  0.020 |  0.020 |  0.125 |
| X6  | -0.159 | -0.055     | -0.159     | -0.055 | -0.055 | -0.125 |
| X7  | -0.017 |  0.000     |  0.000     |  0.000 |  0.000 |  0.000 |
| X8  |  0.010 |  0.000     |  0.000     |  0.000 |  0.000 |  0.000 |
| X9  | -0.005 |  0.000     |  0.000     |  0.000 |  0.000 |  0.000 |
| X10 |  0.011 |  0.000     |  0.000     |  0.000 |  0.000 |  0.000 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/lasso.R