Cubic Spline Model
See Wood (2017) Generalized Additive Models or my document for an introduction to generalized additive models.
Data Setup
The data regards engine wear index versus engine capacity for 19 Volvo car engines used. The idea is that a larger car engine will wear out less quickly than a smaller one (from Wood GAM 2e chapter 4).
library(tidyverse)
data(engine, package = 'gamair')
size = engine$size
wear = engine$wear
x = size - min(size)
x = x / max(x)
d = data.frame(wear, x)Functions
Cubic spline function, rk refers to reproducing kernel. If I recall correctly, the function code is actually based on the first edition of Wood’s text.
rk <- function(x, z) {
((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12)/4 -
((abs(x - z) - 0.5)^4 - (abs(x - z) - 0.5)^2 / 2 + 7/240) / 24
}Generate the model matrix.
splX <- function(x, knots) {
q = length(knots) + 2 # number of parameters
n = length(x) # number of observations
X = matrix(1, n, q) # initialized model matrix
X[ ,2] = x # set second column to x
X[ ,3:q] = outer(x, knots, FUN = rk) # remaining to cubic spline basis
X
}
splS <- function(knots) {
q = length(knots) + 2
S = matrix(0, q, q) # initialize matrix
S[3:q, 3:q] = outer(knots, knots, FUN = rk) # fill in non-zero part
S
}Matrix square root function. Note that there are various packages with their own.
mat_sqrt <- function(S) {
d = eigen(S, symmetric = TRUE)
rS = d$vectors %*% diag(d$values^.5) %*% t(d$vectors)
rS
}Penalized fitting function.
prs_fit <- function(y, x, knots, lambda) {
q = length(knots) + 2 # dimension of basis
n = length(x) # number of observations
Xa = rbind(splX(x, knots), mat_sqrt(splS(knots))*sqrt(lambda)) # augmented model matrix
y[(n + 1):(n+q)] = 0 # augment the data vector
lm(y ~ Xa - 1) # fit and return penalized regression spline
}Example 1
We start with an unpenalized approach.
knots = 1:4/5
X = splX(x, knots) # generate model matrix
fit_lm = lm(wear ~ X - 1) # fit model
xp = 0:100/100 # x values for prediction
Xp = splX(xp, knots) # prediction matrixVisualize.
ggplot(aes(x = x, y = wear), data = data.frame(x, wear)) +
geom_point(color = "#FF5500") +
geom_line(aes(x = xp, y = Xp %*% coef(fit_lm)),
data = data.frame(xp, Xp),
color = "#00AAFF") +
labs(x = 'Scaled Engine size', y = 'Wear Index')Example 2
Now we add the lambda penalty and compare fits at different values of lambda.
knots = 1:7/8
d2 = data.frame(x = xp)
lambda = c(.1, .01, .001, .0001, .00001, .000001)
rmse = vector('numeric', length(lambda))
idx = 0
for (i in lambda) {
# fit penalized regression
fit_penalized = prs_fit(
y = wear,
x = x,
knots = knots,
lambda = i
)
# spline choosing lambda
Xp = splX(xp, knots) # matrix to map parameters to fitted values at xp
LP = Xp %*% coef(fit_penalized)
d2[, paste0('lambda = ', i)] = LP[, 1]
r = resid(fit_penalized)
idx = 1 + idx
rmse[idx] = sqrt(mean(r^2))
}Visualize. I add the root mean square error for model comparison.
d3 = d2 %>%
pivot_longer(cols = -x,
names_to = 'lambda',
values_to = 'value') %>%
mutate(lambda = fct_inorder(lambda),
rmse = round(rmse[lambda], 3))Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/cubicsplines.R