# Cubic Spline Model

See Wood (2017) Generalized Additive Models or my document for an introduction to generalized additive models.

## Data Setup

The data regards engine wear index versus engine capacity for 19 Volvo car engines used. The idea is that a larger car engine will wear out less quickly than a smaller one (from Wood GAM 2e chapter 4).

library(tidyverse)

data(engine, package = 'gamair')

size = engine$size wear = engine$wear

x = size - min(size)
x = x / max(x)
d = data.frame(wear, x)

## Functions

Cubic spline function, rk refers to reproducing kernel. If I recall correctly, the function code is actually based on the first edition of Wood’s text.

rk <- function(x, z) {
((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12)/4 -
((abs(x - z) - 0.5)^4 - (abs(x - z) - 0.5)^2 / 2 + 7/240) / 24
}

Generate the model matrix.

splX <- function(x, knots) {
q = length(knots) + 2                # number of parameters
n = length(x)                        # number of observations
X = matrix(1, n, q)                  # initialized model matrix
X[ ,2]   = x                         # set second column to x
X[ ,3:q] = outer(x, knots, FUN = rk) # remaining to cubic spline basis
X
}

splS <- function(knots) {
q = length(knots) + 2
S = matrix(0, q, q)                         # initialize matrix
S[3:q, 3:q] = outer(knots, knots, FUN = rk) # fill in non-zero part
S
}

Matrix square root function. Note that there are various packages with their own.

mat_sqrt <- function(S) {
d  = eigen(S, symmetric = TRUE)
rS = d$vectors %*% diag(d$values^.5) %*% t(d\$vectors)
rS
}

Penalized fitting function.

prs_fit <- function(y, x, knots, lambda) {
q  = length(knots) + 2    # dimension of basis
n  = length(x)            # number of observations
Xa = rbind(splX(x, knots), mat_sqrt(splS(knots))*sqrt(lambda)) # augmented model matrix
y[(n + 1):(n+q)] = 0      # augment the data vector

lm(y ~ Xa - 1) # fit and return penalized regression spline
}

## Example 1

knots = 1:4/5
X = splX(x, knots)            # generate model matrix

fit_lm  = lm(wear ~ X - 1)    # fit model

xp = 0:100/100                # x values for prediction
Xp = splX(xp, knots)          # prediction matrix

Visualize.

ggplot(aes(x = x, y = wear), data = data.frame(x, wear)) +
geom_point(color = "#FF5500") +
geom_line(aes(x = xp, y = Xp %*% coef(fit_lm)),
data = data.frame(xp, Xp),
color = "#00AAFF") +
labs(x = 'Scaled Engine size', y  = 'Wear Index') ## Example 2

Now we add the lambda penalty and compare fits at different values of lambda.

knots = 1:7/8
d2 = data.frame(x = xp)
lambda = c(.1, .01, .001, .0001, .00001, .000001)
rmse   = vector('numeric', length(lambda))
idx = 0

for (i in lambda) {
# fit penalized regression
fit_penalized = prs_fit(
y = wear,
x = x,
knots = knots,
lambda = i
)
# spline choosing lambda
Xp = splX(xp, knots) # matrix to map parameters to fitted values at xp
LP = Xp %*% coef(fit_penalized)
d2[, paste0('lambda = ', i)] = LP[, 1]

r = resid(fit_penalized)
idx = 1 + idx

rmse[idx] = sqrt(mean(r^2))
}

Visualize. I add the root mean square error for model comparison.

d3 = d2 %>%
pivot_longer(cols = -x,
names_to  = 'lambda',
values_to = 'value') %>%
mutate(lambda = fct_inorder(lambda),
rmse   = round(rmse[lambda], 3)) ## Source

Original code available at https://github.com/m-clark/Miscellaneous-R-Code/blob/master/ModelFitting/cubicsplines.R