Cubic Spline Model
See Wood (2017) Generalized Additive Models or my document for an introduction to generalized additive models.
Data Setup
The data regards engine wear index versus engine capacity for 19 Volvo car engines used. The idea is that a larger car engine will wear out less quickly than a smaller one (from Wood GAM 2e chapter 4).
library(tidyverse)
data(engine, package = 'gamair')
= engine$size
size = engine$wear
wear
= size - min(size)
x = x / max(x)
x = data.frame(wear, x) d
Functions
Cubic spline function, rk
refers to reproducing kernel. If I recall correctly, the function code is actually based on the first edition of Wood’s text.
<- function(x, z) {
rk - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12)/4 -
((z abs(x - z) - 0.5)^4 - (abs(x - z) - 0.5)^2 / 2 + 7/240) / 24
(( }
Generate the model matrix.
<- function(x, knots) {
splX = length(knots) + 2 # number of parameters
q = length(x) # number of observations
n = matrix(1, n, q) # initialized model matrix
X 2] = x # set second column to x
X[ ,3:q] = outer(x, knots, FUN = rk) # remaining to cubic spline basis
X[ ,
X
}
<- function(knots) {
splS = length(knots) + 2
q = matrix(0, q, q) # initialize matrix
S 3:q, 3:q] = outer(knots, knots, FUN = rk) # fill in non-zero part
S[
S }
Matrix square root function. Note that there are various packages with their own.
<- function(S) {
mat_sqrt = eigen(S, symmetric = TRUE)
d = d$vectors %*% diag(d$values^.5) %*% t(d$vectors)
rS
rS }
Penalized fitting function.
<- function(y, x, knots, lambda) {
prs_fit = length(knots) + 2 # dimension of basis
q = length(x) # number of observations
n = rbind(splX(x, knots), mat_sqrt(splS(knots))*sqrt(lambda)) # augmented model matrix
Xa + 1):(n+q)] = 0 # augment the data vector
y[(n
lm(y ~ Xa - 1) # fit and return penalized regression spline
}
Example 1
We start with an unpenalized approach.
= 1:4/5
knots = splX(x, knots) # generate model matrix
X
= lm(wear ~ X - 1) # fit model
fit_lm
= 0:100/100 # x values for prediction
xp = splX(xp, knots) # prediction matrix Xp
Visualize.
ggplot(aes(x = x, y = wear), data = data.frame(x, wear)) +
geom_point(color = "#FF5500") +
geom_line(aes(x = xp, y = Xp %*% coef(fit_lm)),
data = data.frame(xp, Xp),
color = "#00AAFF") +
labs(x = 'Scaled Engine size', y = 'Wear Index')
Example 2
Now we add the lambda
penalty and compare fits at different values of lambda
.
= 1:7/8
knots = data.frame(x = xp)
d2 = c(.1, .01, .001, .0001, .00001, .000001)
lambda = vector('numeric', length(lambda))
rmse = 0
idx
for (i in lambda) {
# fit penalized regression
= prs_fit(
fit_penalized y = wear,
x = x,
knots = knots,
lambda = i
) # spline choosing lambda
= splX(xp, knots) # matrix to map parameters to fitted values at xp
Xp = Xp %*% coef(fit_penalized)
LP paste0('lambda = ', i)] = LP[, 1]
d2[,
= resid(fit_penalized)
r = 1 + idx
idx
= sqrt(mean(r^2))
rmse[idx] }
Visualize. I add the root mean square error for model comparison.
= d2 %>%
d3 pivot_longer(cols = -x,
names_to = 'lambda',
values_to = 'value') %>%
mutate(lambda = fct_inorder(lambda),
rmse = round(rmse[lambda], 3))
Source
Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/cubicsplines.R