L1 (lasso) regularization
See Tibshirani (1996) for the original paper, or Murphy PML (2012) for a nice overview (watch for typos in depictions). A more conceptual depiction of the lasso can be found in the penalized ML chapter.
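As a reminder of what is being estimated, the lasso solves a penalized least squares problem. On the scale glmnet uses, which the code below follows, the objective is

$$
\min_{w}\; \frac{1}{2N}\lVert y - Xw\rVert_2^2 \;+\; \lambda \sum_{j=1}^{p} \lvert w_j \rvert
$$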
Data Setup
library(tidyverse)
set.seed(8675309)
N = 500   # number of observations
p = 10    # number of predictors

X = scale(matrix(rnorm(N * p), ncol = p))               # standardized predictors
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, p - 6))   # true coefficients; the last four are zero
y = scale(X %*% b + rnorm(N, sd = .5))                  # standardized target
lambda = .1

Function
Coordinate descent function.
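For each coefficient in turn, the lasso coordinate descent update has a closed form. With $a_j = \sum_i x_{ij}^2$ and the partial residual correlation $c_j = \sum_i x_{ij}\big(y_i - \sum_{k \ne j} x_{ik} w_k\big)$, the soft-thresholded update is

$$
w_j = S\!\left(\frac{c_j}{a_j},\, \frac{N\lambda}{a_j}\right),
\qquad
S(z, \gamma) = \operatorname{sign}(z)\,\max(\lvert z\rvert - \gamma,\, 0)
$$

The $N\lambda$ reflects the glmnet scaling noted in the code below. Hard thresholding instead zeroes any $w_j$ with $\lvert c_j \rvert < N\lambda$ and leaves the rest unchanged.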
lasso <- function(
  X,               # model matrix
  y,               # target
  lambda = .1,     # penalty parameter
  soft = TRUE,     # soft vs. hard thresholding
  tol = 1e-6,      # tolerance
  iter = 100,      # number of max iterations
  verbose = TRUE   # print out iteration number
) {
  # soft thresholding function
  soft_thresh <- function(a, b) {
    out = rep(0, length(a))
    out[a >  b] = a[a >  b] - b
    out[a < -b] = a[a < -b] + b
    out
  }

  # initialize coefficients at the ridge solution
  w = solve(crossprod(X) + diag(lambda, ncol(X))) %*% crossprod(X, y)

  tol_curr = 1
  J = ncol(X)
  a  = rep(0, J)
  c_ = rep(0, J)
  i  = 1

  while (tol < tol_curr && i < iter) {
    w_old = w
    a = colSums(X^2)
    l = length(y) * lambda   # for consistency with glmnet approach

    # partial residual correlations, holding all other coefficients fixed
    c_ = sapply(1:J, function(j) sum(X[, j] * (y - X[, -j] %*% w_old[-j])))

    if (soft) {
      for (j in 1:J) {
        w[j] = soft_thresh(c_[j] / a[j], l / a[j])
      }
    }
    else {
      w = w_old
      w[c_ < l & c_ > -l] = 0
    }

    tol_curr = crossprod(w - w_old)
    i = i + 1

    if (verbose && i %% 10 == 0) message(i)
  }

  w
}

Estimation
Note that if lambda = 0, i.e. no penalty, the result is the same as what you would get from base R's lm.fit.
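As a quick sanity check of that claim, a sketch using the lasso() function above:

# with no penalty, coordinate descent reproduces the least squares solution
fit_ols = lasso(X, y, lambda = 0, tol = 1e-12, verbose = FALSE)

cbind(lasso = fit_ols[, 1], lm = coef(lm.fit(X, y)))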
fit_soft = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = TRUE
)
fit_hard = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = FALSE
)

The glmnet approach is by default a mixture of ridge and lasso penalties; setting alpha = 1 reduces to the lasso (alpha = 0 would be ridge). We supply a few lambda values, then extract only the result for the lambda value used above (via the s argument).
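For reference, the elastic net penalty glmnet uses is

$$
\lambda \left[ \frac{1-\alpha}{2}\lVert w\rVert_2^2 + \alpha\lVert w\rVert_1 \right]
$$

so alpha = 1 leaves only the L1 (lasso) term.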
library(glmnet)
fit_glmnet = coef(
  glmnet(
    X,
    y,
    alpha     = 1,
    lambda    = c(10, 1, lambda),
    thresh    = 1e-12,
    intercept = FALSE
  ),
  s = lambda
)
library(lassoshooting)

fit_ls = lassoshooting(
  X      = X,
  y      = y,
  lambda = length(y) * lambda,   # same scaling as in our function, for consistency with glmnet
  thr    = 1e-12
)

Comparison
We can now compare the coefficients of all our results.
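A sketch of how the comparison might be assembled, assuming the fits above and that lassoshooting returns its estimates in the coefficients element:

fit_lm = coef(lm.fit(X, y))   # unpenalized least squares for reference

round(
  data.frame(
    lm         = fit_lm,
    lasso_soft = fit_soft[, 1],
    lasso_hard = fit_hard[, 1],
    lspack     = fit_ls$coefficients,
    glmnet     = fit_glmnet[-1, 1],   # drop the (suppressed) intercept
    truth      = b
  ),
  3
)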
| | lm | lasso_soft | lasso_hard | lspack | glmnet | truth |
|---|---|---|---|---|---|---|
| X1 | 0.535 | 0.435 | 0.535 | 0.435 | 0.436 | 0.500 |
| X2 | -0.530 | -0.429 | -0.530 | -0.429 | -0.429 | -0.500 |
| X3 | 0.234 | 0.124 | 0.234 | 0.124 | 0.124 | 0.250 |
| X4 | -0.294 | -0.207 | -0.294 | -0.207 | -0.208 | -0.250 |
| X5 | 0.126 | 0.020 | 0.126 | 0.020 | 0.020 | 0.125 |
| X6 | -0.159 | -0.055 | -0.159 | -0.055 | -0.055 | -0.125 |
| X7 | -0.017 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X8 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X9 | -0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X10 | 0.011 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/lasso.R