Reproducing Kernel Hilbert Space Regression
This R code is based on Reproducing Kernel Hilbert Spaces for Penalized Regression: A Tutorial, Nosedal-Sanchez et al. (2010), specifically the code in its supplemental section. The original code had several issues with regard to general R programming practices, and it appears to have been replaced in publication at some point, as was most of the corresponding supplemental text. I can no longer locate the original, so I now follow the published code. The original data I was following has also been replaced by the longley and mcycle data sets.
To start, we will use an example for ridge regression, followed by a spline example.
Data Setup
library(tidyverse)
data(longley)   # available in base R

y = longley[, 1]
X = as.matrix(longley[, 2:7])
X = apply(X, 2, scales::rescale, to = c(0, 1))
Functions
Function to find the inverse of a matrix
We can use base::solve, but this function avoids a computationally singular result.
inverse <- function(X, eps = 1e-12) {
  eig.X  = eigen(X, symmetric = TRUE)
  P      = eig.X[[2]]
  lambda = eig.X[[1]]

  ind = lambda > eps

  lambda[ind]  = 1 / lambda[ind]
  lambda[!ind] = 0

  P %*% diag(lambda) %*% t(P)
}
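As a quick sanity check (my addition, not part of the original), the result should match base::solve for a well-conditioned symmetric matrix:

```r
# sanity check on a well-conditioned symmetric matrix (illustrative only)
A_test = crossprod(matrix(rnorm(50), 10, 5))
max(abs(inverse(A_test) - solve(A_test)))   # should be effectively zero
```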
Reproducing Kernel
rk <- function(s, t) {
  init_len = length(s)
  rk = 0

  for (i in 1:init_len)
    rk = s[i]*t[i] + rk

  rk
}
Gram matrix
For the first example involving ridge regression, the gram function just produces tcrossprod(X). I generalize it in case a different kernel is desired, and add that as an additional argument. This will avoid having to redo the function later.
gram <- function(X, rkfunc = rk) {
  apply(X, 1, function(Row)
    apply(X, 1, function(tRow) rkfunc(Row, tRow))
  )
}
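Since the default kernel is just a dot product, one quick check (my addition) is that the result matches tcrossprod(X):

```r
# with the linear kernel, the Gram matrix is simply XX'
max(abs(gram(X) - tcrossprod(X)))   # should be effectively zero
```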
Ridge regression
ridge <- function(X, y, lambda) {
  Gramm = gram(X)                  # Gram matrix (n x n)

  n = length(y)
  Q = cbind(1, Gramm)              # design matrix
  S = rbind(0, cbind(0, Gramm))

  M     = crossprod(Q) + lambda*S
  M_inv = inverse(M)               # inverse of M

  gamma_hat = crossprod(M_inv, crossprod(Q, y))
  f_hat     = Q %*% gamma_hat

  A    = Q %*% M_inv %*% t(Q)
  tr_A = sum(diag(A))              # trace of hat matrix

  rss = crossprod(y - f_hat)       # residual sum of squares
  gcv = n*rss / (n - tr_A)^2       # obtain GCV score

  list(
    f_hat     = f_hat,
    gamma_hat = gamma_hat,
    beta_hat  = c(gamma_hat[1], crossprod(gamma_hat[-1], X)),
    gcv       = gcv
  )
}
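For reference, the GCV score computed in the function above is the usual criterion, with \(A_\lambda = Q(Q^\top Q + \lambda S)^{-1}Q^\top\) the hat matrix:

\[
\mathrm{GCV}(\lambda) = \frac{n\sum_{i=1}^n \left(y_i - \hat f_i\right)^2}{\left(n - \mathrm{tr}(A_\lambda)\right)^2}
\]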
Estimation
A simple direct search for the GCV optimal smoothing parameter can be made as follows:
lambda = 10^seq(-6, 0, by = .1)

gcv_search = map(lambda, function(lam) ridge(X, y, lam))

V = map_dbl(gcv_search, function(x) x$gcv)

ridge_coefs = map_df(
  gcv_search,
  function(x)
    data.frame(
      value = x$beta_hat[-1],
      coef  = colnames(X)
    ),
  .id = 'iter'
) %>%
  mutate(lambda = lambda[as.integer(iter)])
Compare with Figure 3 in the article.
gcv_plot = qplot(
  lambda,
  V,
  geom = 'line',
  main = 'GCV score',
  ylab = 'GCV'
) +
  scale_x_log10()

beta_plot = ridge_coefs %>%
  ggplot(aes(x = lambda, y = value, color = coef)) +
  geom_line() +
  scale_x_log10() +
  scico::scale_color_scico_d(end = .8) +
  labs(title = 'Betas Across Lambda')
Pick the best model and obtain the estimates.
fit_ridge = ridge(X, y, lambda[which.min(V)])   # fit optimal model

gamma_hat = fit_ridge$gamma_hat
beta_0    = fit_ridge$gamma_hat[1]              # intercept
beta_hat  = crossprod(gamma_hat[-1,], X)        # slope coefficients
Comparison
I add a comparison to glmnet, where setting alpha = 0 is equivalent to ridge regression.
c(beta_0, beta_hat)

[1]  82.7840043  54.1683427   5.3640251   1.3781910 -28.7948627   5.3956341  -0.6095799

fit_glmnet = glmnet::glmnet(
  X,
  y,
  alpha = 0,
  lambda = lambda,
  standardize = FALSE
)
|            | (Intercept) | GNP    | Unemployed | Armed.Forces | Population | Year   | Employed |
|------------|-------------|--------|------------|--------------|------------|--------|----------|
| fit_ridge  | 82.784      | 54.168 | 5.364      | 1.378        | -28.795    | 5.396  | -0.610   |
| fit_glmnet | 82.329      | 69.783 | 6.963      | 1.245        | -37.323    | -1.497 | -1.606   |
Example: Cubic Spline
Data Setup
For this example we’ll use the MASS::mcycle data.
x = as.matrix(MASS::mcycle$times)
x = scales::rescale(x, to = c(0, 1))   # rescale predictor to [0, 1]
y = MASS::mcycle$accel
Functions
Reproducing Kernel
rk_spline <- function(s, t) {
  return(.5 * min(s, t)^2 * max(s, t) - (1/6) * min(s, t)^3)
}
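Written out, the kernel coded above is

\[
k(s, t) = \frac{1}{2}\min(s, t)^2\,\max(s, t) - \frac{1}{6}\min(s, t)^3
\]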
There is no need to redo the gram function, due to the earlier change that accepts the kernel as an argument.
Smoothing Spline
smoothing_spline <- function(X, y, lambda) {
  Gramm = gram(X, rkfunc = rk_spline)           # Gram matrix (n x n)

  n = length(y)
  J = cbind(1, X)                               # matrix with a basis for the null space of the penalty
  Q = cbind(J, Gramm)                           # design matrix
  m = ncol(J)                                   # dimension of the null space of the penalty

  S = matrix(0, n + m, n + m)                   # initialize S
  S[(m + 1):(n + m), (m + 1):(n + m)] = Gramm   # non-zero part of S

  M     = crossprod(Q) + lambda*S
  M_inv = inverse(M)                            # inverse of M

  gamma_hat = crossprod(M_inv, crossprod(Q, y))
  f_hat     = Q %*% gamma_hat

  A    = Q %*% M_inv %*% t(Q)
  tr_A = sum(diag(A))                           # trace of hat matrix

  rss = crossprod(y - f_hat)                    # residual sum of squares
  gcv = n * rss / (n - tr_A)^2                  # obtain GCV score

  list(
    f_hat     = f_hat,
    gamma_hat = gamma_hat,
    gcv       = gcv
  )
}
Estimation
We’ll now get fits for multiple lambda values.
lambda = 10^seq(-6, 0, by = .1)

gcv_search = map(lambda, function(lam) smoothing_spline(x, y, lam))

V = map_dbl(gcv_search, function(x) x$gcv)
Plot of GCV across different lambdas.
Comparison
I’ve added a comparison to an additive model using mgcv. Compare the result to Figure 2 of the Supplementary Material.
fit_rk  = smoothing_spline(x, y, lambda[which.min(V)])   # fit optimal model
fit_gam = mgcv::gam(y ~ s(x))
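A quick visual comparison of the two fits might look like the following sketch (my addition; the plotting choices are arbitrary):

```r
plot(x, y, pch = 19, col = 'gray60',
     xlab = 'times (rescaled)', ylab = 'accel', main = 'RKHS spline vs. mgcv')
lines(x, fit_rk$f_hat,    col = 'darkred', lwd = 2)   # RKHS smoothing spline
lines(x, fitted(fit_gam), col = 'navy',    lwd = 2)   # mgcv additive model
legend('topleft', c('RKHS', 'mgcv'), col = c('darkred', 'navy'), lwd = 2, bty = 'n')
```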
Source
Original code available at: https://github.com/m-clark/Miscellaneous-R-Code/tree/master/ModelFitting/RKHSReg
Current Supplemental Code
You can peruse the supplemental section that shows the R code here.
Original Supplemental Code
This is the old original code from the supplemental section for the article, which was changed at some point (I can’t remember where I found it; it may have been at one of the authors’ websites). The R code in the repo follows these examples, while this document follows the currently accessible supplementary material. I used RStudio’s default cleanup to make the code a little easier to read, and maybe added a little spacing, but otherwise it is identical to what’s in the linked paper.
A.1
###### Data ########
set.seed(3)
n  <- 20
x1 <- runif(n)
x2 <- runif(n)
X  <- matrix(c(x1, x2), ncol = 2)      # design matrix
y  <- 2 + 3 * x1 + rnorm(n, sd = 0.25)
##### function to find the inverse of a matrix ####
my.inv <- function(X, eps = 1e-12) {
  eig.X  <- eigen(X, symmetric = T)
  P      <- eig.X[[2]]
  lambda <- eig.X[[1]]
  ind    <- lambda > eps
  lambda[ind]  <- 1 / lambda[ind]
  lambda[!ind] <- 0
  ans <- P %*% diag(lambda, nrow = length(lambda)) %*% t(P)
  return(ans)
}
###### Reproducing Kernel #########
rk <- function(s, t) {
  p  <- length(s)
  rk <- 0
  for (i in 1:p) {
    rk <- s[i] * t[i] + rk
  }
  return((rk))
}
##### Gram matrix #######
get.gramm <- function(X) {
  n <- dim(X)[1]
  Gramm <- matrix(0, n, n)    # initializes Gramm array
  # i = index for rows
  # j = index for columns
  Gramm <- as.matrix(Gramm)   # Gramm matrix
  for (i in 1:n) {
    for (j in 1:n) {
      Gramm[i, j] <- rk(X[i, ], X[j, ])
    }
  }
  return(Gramm)
}
ridge.regression <- function(X, y, lambda) {
  Gramm <- get.gramm(X)                          # Gramm matrix (nxn)
  n <- dim(X)[1]                                 # n = length of y
  J <- matrix(1, n, 1)                           # vector of ones dim
  Q <- cbind(J, Gramm)                           # design matrix
  m <- 1                                         # dimension of the null space of the penalty
  S <- matrix(0, n + m, n + m)                   # initialize S
  S[(m + 1):(n + m), (m + 1):(n + m)] <- Gramm   # non-zero part of S
  M <- (t(Q) %*% Q + lambda * S)
  M.inv <- my.inv(M)                             # inverse of M
  gamma.hat <- crossprod(M.inv, crossprod(Q, y))
  f.hat <- Q %*% gamma.hat
  A <- Q %*% M.inv %*% t(Q)
  tr.A <- sum(diag(A))                           # trace of hat matrix
  rss <- t(y - f.hat) %*% (y - f.hat)            # residual sum of squares
  gcv <- n * rss / (n - tr.A) ^ 2                # obtain GCV score
  return(list(
    f.hat = f.hat,
    gamma.hat = gamma.hat,
    gcv = gcv
  ))
}
# Plot of GCV
lambda <- 1e-8
V <- rep(0, 40)
for (i in 1:40) {
  V[i] <- ridge.regression(X, y, lambda)$gcv   # obtain GCV score
  lambda <- lambda * 1.5                       # increase lambda
}

index <- (1:40)
plot(
  1.5 ^ (index - 1) * 1e-8,
  V,
  type = "l",
  main = "GCV score",
  lwd = 2,
  xlab = "lambda",
  ylab = "GCV"
)   # plot score

i <- (1:60)[V == min(V)]                                  # extract index of min(V)
opt.mod <- ridge.regression(X, y, 1.5 ^ (i - 1) * 1e-8)   # fit optimal model
### finding beta.0, beta.1 and beta.2 ##########
gamma.hat <- opt.mod$gamma.hat
beta.hat.0 <- opt.mod$gamma.hat[1]      # intercept
beta.hat <- gamma.hat[2:21, ] %*% X     # slope and noise term coefficients
#### Fitted Line Plot for Cubic Smoothing Spline ####
plot(x[,1],y,xlab="x",ylab="response",main="Cubic Smoothing Spline") ;
lines(x[,1],opt.mod$f.hat,type="l",lty=1,lwd=2,col="blue") ;
A.2 RKHS solution applied to Cubic Smoothing Spline
We consider a sample of size n = 50, \((y_1, y_2, \ldots, y_{50})\), from the model \(y_i = \sin(2\pi x_i) + \epsilon_i\), where \(\epsilon_i \sim N(0, 0.2^2)\). The following code generates x and y…
A simple direct search for the GCV optimal smoothing parameter can be made as follows: Now we have to find an optimal lambda using GCV…
Below we give a function to find the cubic smoothing spline using the RKHS framework we discussed in Section 4.3. We also provide a graph with our estimation along with the true function and data.
###### Data ########
set.seed(3)
n <- 50
x <- matrix(runif(n), nrow, ncol = 1)
x.star <- matrix(sort(x), nrow, ncol = 1)        # sorted x, used by plot
y <- sin(2 * pi * x.star) + rnorm(n, sd = 0.2)
#### Reproducing Kernel for <f,g> = int_0^1 f''(x)g''(x)dx #####
rk.1 <- function(s, t) {
  return((1 / 2) * min(s, t) ^ 2) * (max(s, t) + (1 / 6) * (min(s, t)) ^ 3)
}
get.gramm.1 <- function(X) {
  n <- dim(X)[1]
  Gramm <- matrix(0, n, n)    # initializes Gramm array
  # i = index for rows
  # j = index for columns
  Gramm <- as.matrix(Gramm)   # Gramm matrix
  for (i in 1:n) {
    for (j in 1:n) {
      Gramm[i, j] <- rk.1(X[i, ], X[j, ])
    }
  }
  return(Gramm)
}
smoothing.spline <- function(X, y, lambda) {
  Gramm <- get.gramm.1(X)                        # Gramm matrix (nxn)
  n <- dim(X)[1]                                 # n = length of y
  J <- matrix(1, n, 1)                           # vector of ones dim
  T <- cbind(J, X)                               # matrix with a basis for the null space of the penalty
  Q <- cbind(T, Gramm)                           # design matrix
  m <- dim(T)[2]                                 # dimension of the null space of the penalty
  S <- matrix(0, n + m, n + m)                   # initialize S
  S[(m + 1):(n + m), (m + 1):(n + m)] <- Gramm   # non-zero part of S
  M <- (t(Q) %*% Q + lambda * S)
  M.inv <- my.inv(M)                             # inverse of M
  gamma.hat <- crossprod(M.inv, crossprod(Q, y))
  f.hat <- Q %*% gamma.hat
  A <- Q %*% M.inv %*% t(Q)
  tr.A <- sum(diag(A))                           # trace of hat matrix
  rss <- t(y - f.hat) %*% (y - f.hat)            # residual sum of squares
  gcv <- n * rss / (n - tr.A) ^ 2                # obtain GCV score
  return(list(
    f.hat = f.hat,
    gamma.hat = gamma.hat,
    gcv = gcv
  ))
}
### Now we have to find an optimal lambda using GCV...
### Plot of GCV
lambda <- 1e-8
V <- rep(0, 60)
for (i in 1:60) {
  V[i] <- smoothing.spline(x.star, y, lambda)$gcv   # obtain GCV score
  lambda <- lambda * 1.5                            # increase lambda
}

plot(1:60,
     V,
     type = "l",
     main = "GCV score",
     xlab = "i")   # plot score

i <- (1:60)[V == min(V)]                                      # extract index of min(V)
fit_rk <- smoothing.spline(x.star, y, 1.5 ^ (i - 1) * 1e-8)   # fit optimal model
# Graph (Cubic Spline)
plot(
  x.star,
  fit_rk$f.hat,
  type = "l",
  lty = 2,
  lwd = 2,
  col = "blue",
  xlab = "x",
  ylim = c(-2.5, 1.5),
  xlim = c(-0.1, 1.1),
  ylab = "response",
  main = "Cubic Spline"
)   # predictions

lines(x.star, sin(2 * pi * x.star), lty = 1, lwd = 2)   # true

legend(
  -0.1,
  -1.5,
  c("predictions", "true"),
  lty = c(2, 1),
  bty = "n",
  lwd = c(2, 2),
  col = c("blue", "black")
)

points(x.star, y)