Cubic Spline Model

See Wood (2017), Generalized Additive Models, or my document for an introduction to generalized additive models.

Data Setup

The data concern the engine wear index versus engine capacity for 19 Volvo car engines. The idea is that a larger car engine will wear out less quickly than a smaller one (from Wood, GAM 2e, chapter 4).

library(tidyverse)

data(engine, package = 'gamair')

size = engine$size
wear = engine$wear

x = size - min(size)  # rescale engine size
x = x / max(x)        # to the unit interval [0, 1]
d = data.frame(wear, x)
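
A quick sanity check that the rescaling worked: x should now run from 0 to 1.

head(d)
range(d$x)  # 0 to 1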

Functions

The cubic spline basis function, where rk refers to the reproducing kernel. If I recall correctly, the function code is actually based on the first edition of Wood’s text.

rk <- function(x, z) {
  ((z - 0.5)^2 - 1/12) * ((x - 0.5)^2 - 1/12)/4 -
    ((abs(x - z) - 0.5)^4 - (abs(x - z) - 0.5)^2 / 2 + 7/240) / 24
}
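
For reference, written out, the kernel the code above implements is

$$
R(x, z) = \frac{\left[(z - \tfrac{1}{2})^2 - \tfrac{1}{12}\right]\left[(x - \tfrac{1}{2})^2 - \tfrac{1}{12}\right]}{4} - \frac{(|x - z| - \tfrac{1}{2})^4 - \tfrac{1}{2}(|x - z| - \tfrac{1}{2})^2 + \tfrac{7}{240}}{24}
$$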

Generate the model matrix and the corresponding penalty matrix.

splX <- function(x, knots) {
  q = length(knots) + 2                # number of parameters
  n = length(x)                        # number of observations
  X = matrix(1, n, q)                  # initialized model matrix
  X[ ,2]   = x                         # set second column to x
  X[ ,3:q] = outer(x, knots, FUN = rk) # remaining to cubic spline basis
  X
}

splS <- function(knots) {
  q = length(knots) + 2
  S = matrix(0, q, q)                         # initialize penalty matrix
  S[3:q, 3:q] = outer(knots, knots, FUN = rk) # fill in non-zero part
  S
}
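
As a quick illustration of the dimensions involved (X_demo and S_demo are throwaway names used only here), four knots imply q = 4 + 2 = 6 basis functions:

X_demo = splX(x, knots = 1:4/5)
S_demo = splS(knots = 1:4/5)

dim(X_demo)  # n observations by 6 columns
dim(S_demo)  # 6 by 6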

Matrix square root function. Note that various packages provide their own version.

mat_sqrt <- function(S) {
  d  = eigen(S, symmetric = TRUE)
  rS = d$vectors %*% diag(d$values^.5) %*% t(d$vectors)
  rS
}
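
A minimal check that it behaves like a square root, using an arbitrary positive definite matrix rather than anything model specific:

set.seed(1234)
A = crossprod(matrix(rnorm(9), 3, 3))      # symmetric positive definite test matrix
all.equal(mat_sqrt(A) %*% mat_sqrt(A), A)  # should be TRUE (up to numerical tolerance)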

Penalized fitting function.

prs_fit <- function(y, x, knots, lambda) {
  q  = length(knots) + 2    # dimension of basis
  n  = length(x)            # number of observations
  Xa = rbind(splX(x, knots), mat_sqrt(splS(knots)) * sqrt(lambda)) # augmented model matrix
  y[(n + 1):(n+q)] = 0      # augment the data vector
  
  lm(y ~ Xa - 1) # fit and return penalized regression spline
}
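
The reason a plain lm call suffices: with B = mat_sqrt(S), so that B'B = S, the augmented least squares problem is equivalent to the penalized one,

$$
\left\lVert \begin{bmatrix} y \\ \mathbf{0} \end{bmatrix} - \begin{bmatrix} X \\ \sqrt{\lambda}\, B \end{bmatrix} \beta \right\rVert^2 = \lVert y - X\beta \rVert^2 + \lambda\, \beta^\top S \beta,
$$

so minimizing the left-hand side by ordinary least squares yields the penalized regression spline coefficients.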

Example 1

We start with an unpenalized approach.

knots = 1:4/5
X = splX(x, knots)            # generate model matrix

fit_lm  = lm(wear ~ X - 1)    # fit model

xp = 0:100/100                # x values for prediction
Xp = splX(xp, knots)          # prediction matrix
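
Before plotting, a quick peek at the estimated coefficients, one per column of the basis; the prediction step below simply multiplies Xp by these.

coef(fit_lm)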

Visualize.

ggplot(aes(x = x, y = wear), data = data.frame(x, wear)) +
  geom_point(color = "#FF5500") +
  geom_line(aes(x = xp, y = Xp %*% coef(fit_lm)),
            data = data.frame(xp, Xp),
            color = "#00AAFF") +
  labs(x = 'Scaled engine size', y = 'Wear Index')

Example 2

Now we add the lambda penalty and compare fits at different values of lambda.

knots = 1:7/8
d2 = data.frame(x = xp)
lambda = c(.1, .01, .001, .0001, .00001, .000001)
rmse   = vector('numeric', length(lambda))
idx = 0

for (i in lambda) {
  # fit penalized regression spline for the current lambda
  fit_penalized = prs_fit(
    y = wear,
    x = x,
    knots = knots,
    lambda = i
  )
  Xp = splX(xp, knots) # matrix to map parameters to fitted values at xp
  LP = Xp %*% coef(fit_penalized)
  d2[, paste0('lambda = ', i)] = LP[, 1]
  
  # note that these residuals include the q augmented (penalty) rows
  r   = resid(fit_penalized)
  idx = idx + 1
  
  rmse[idx] = sqrt(mean(r^2))
}
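
Before reshaping for the plot, it can help to line the RMSE values up with their lambda values. Since this is in-sample error (computed from the augmented fit), it mostly reflects how much smoothing each lambda imposes rather than out-of-sample performance.

data.frame(lambda, rmse)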

Visualize. I add the root mean square error for model comparison.

d3 = d2 %>%
  pivot_longer(cols = -x,
               names_to  = 'lambda',
               values_to = 'value') %>% 
  mutate(lambda = fct_inorder(lambda),
         # indexing by the factor uses its integer codes, which follow the loop order
         rmse   = round(rmse[lambda], 3))