L1 (lasso) regularization
See Tibshirani (1996) for the original paper, or Murphy PML (2012) for a nice overview (watch for typos in depictions). A more conceptual depiction of the lasso can be found in the penalized ML chapter.
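As a reminder of what is being estimated, the lasso solves a penalized least squares problem. On the scale glmnet uses, which the code below follows, the objective is

$$
\min_{w}\; \frac{1}{2N}\lVert y - Xw\rVert_2^2 \;+\; \lambda \sum_{j=1}^{p} \lvert w_j \rvert
$$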
Data Setup
library(tidyverse)
set.seed(8675309)
N = 500   # number of observations
p = 10    # number of predictors

X = scale(matrix(rnorm(N * p), ncol = p))               # standardized predictors
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, p - 6))   # true coefficients; the last four are zero
y = scale(X %*% b + rnorm(N, sd = .5))                  # standardized target
lambda = .1

Function
Coordinate descent function.
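For each coefficient in turn, the lasso coordinate descent update has a closed form. With $a_j = \sum_i x_{ij}^2$ and the partial residual correlation $c_j = \sum_i x_{ij}\big(y_i - \sum_{k \ne j} x_{ik} w_k\big)$, the soft-thresholded update is

$$
w_j = S\!\left(\frac{c_j}{a_j},\, \frac{N\lambda}{a_j}\right),
\qquad
S(z, \gamma) = \operatorname{sign}(z)\,\max(\lvert z\rvert - \gamma,\, 0)
$$

The $N\lambda$ reflects the glmnet scaling noted in the code below. Hard thresholding instead zeroes any $w_j$ with $\lvert c_j \rvert < N\lambda$ and leaves the rest unchanged.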
lasso <- function(
  X,               # model matrix
  y,               # target
  lambda = .1,     # penalty parameter
  soft = TRUE,     # soft vs. hard thresholding
  tol = 1e-6,      # tolerance
  iter = 100,      # number of max iterations
  verbose = TRUE   # print out iteration number
) {
  # soft thresholding function
  soft_thresh <- function(a, b) {
    out = rep(0, length(a))
    out[a >  b] = a[a >  b] - b
    out[a < -b] = a[a < -b] + b
    out
  }

  # initialize coefficients at the ridge solution
  w = solve(crossprod(X) + diag(lambda, ncol(X))) %*% crossprod(X, y)

  tol_curr = 1
  J = ncol(X)
  a  = rep(0, J)
  c_ = rep(0, J)
  i  = 1

  while (tol < tol_curr && i < iter) {
    w_old = w
    a = colSums(X^2)
    l = length(y) * lambda   # for consistency with glmnet approach

    # partial residual correlations, holding all other coefficients fixed
    c_ = sapply(1:J, function(j) sum(X[, j] * (y - X[, -j] %*% w_old[-j])))

    if (soft) {
      for (j in 1:J) {
        w[j] = soft_thresh(c_[j] / a[j], l / a[j])
      }
    }
    else {
      w = w_old
      w[c_ < l & c_ > -l] = 0
    }

    tol_curr = crossprod(w - w_old)
    i = i + 1

    if (verbose && i %% 10 == 0) message(i)
  }

  w
}

Estimation
Note that if lambda = 0, i.e. no penalty, the result is the same as what you would get from base R's lm.fit.
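As a quick sanity check of that claim, a sketch using the lasso() function above:

# with no penalty, coordinate descent reproduces the least squares solution
fit_ols = lasso(X, y, lambda = 0, tol = 1e-12, verbose = FALSE)

cbind(lasso = fit_ols[, 1], lm = coef(lm.fit(X, y)))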
fit_soft = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = TRUE
)
fit_hard = lasso(
  X,
  y,
  lambda = lambda,
  tol    = 1e-12,
  soft   = FALSE
)

The glmnet approach is by default a mixture of ridge and lasso penalties; setting alpha = 1 reduces to the lasso (alpha = 0 would be ridge). We supply a few lambda values, then extract only the result for the lambda value used above (via the s argument).
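For reference, the elastic net penalty glmnet uses is

$$
\lambda \left[ \frac{1-\alpha}{2}\lVert w\rVert_2^2 + \alpha\lVert w\rVert_1 \right]
$$

so alpha = 1 leaves only the L1 (lasso) term.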
library(glmnet)
fit_glmnet = coef(
  glmnet(
    X,
    y,
    alpha     = 1,
    lambda    = c(10, 1, lambda),
    thresh    = 1e-12,
    intercept = FALSE
  ),
  s = lambda
)
library(lassoshooting)

fit_ls = lassoshooting(
  X      = X,
  y      = y,
  lambda = length(y) * lambda,   # same scaling as in our function, for consistency with glmnet
  thr    = 1e-12
)

Comparison
We can now compare the coefficients of all our results.
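A sketch of how the comparison might be assembled, assuming the fits above and that lassoshooting returns its estimates in the coefficients element:

fit_lm = coef(lm.fit(X, y))   # unpenalized least squares for reference

round(
  data.frame(
    lm         = fit_lm,
    lasso_soft = fit_soft[, 1],
    lasso_hard = fit_hard[, 1],
    lspack     = fit_ls$coefficients,
    glmnet     = fit_glmnet[-1, 1],   # drop the (suppressed) intercept
    truth      = b
  ),
  3
)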
| | lm | lasso_soft | lasso_hard | lspack | glmnet | truth |
|---|---|---|---|---|---|---|
| X1 | 0.535 | 0.435 | 0.535 | 0.435 | 0.436 | 0.500 |
| X2 | -0.530 | -0.429 | -0.530 | -0.429 | -0.429 | -0.500 |
| X3 | 0.234 | 0.124 | 0.234 | 0.124 | 0.124 | 0.250 |
| X4 | -0.294 | -0.207 | -0.294 | -0.207 | -0.208 | -0.250 |
| X5 | 0.126 | 0.020 | 0.126 | 0.020 | 0.020 | 0.125 |
| X6 | -0.159 | -0.055 | -0.159 | -0.055 | -0.055 | -0.125 |
| X7 | -0.017 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X8 | 0.010 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X9 | -0.005 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| X10 | 0.011 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/lasso.R