There are many options available in mgcv beyond cubic splines, including thin plate splines, Duchon splines, P-splines, and more, as well as bases for cyclic effects, geographic effects, and beyond. See ?smooth.terms for more information.
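As a sketch of how a basis is selected, the bs argument to s specifies it; the data frame and variables below are purely illustrative placeholders.

```r
library(mgcv)

# Each basis is chosen via the bs argument to s(); a sketch assuming a
# data frame df with columns y, x, month, lat, lon.
gam(y ~ s(x, bs = 'tp'), data = df)          # thin plate (the default)
gam(y ~ s(x, bs = 'ps'), data = df)          # P-spline
gam(y ~ s(month, bs = 'cc'), data = df)      # cyclic cubic, e.g. for seasonality
gam(y ~ s(lat, lon, bs = 'sos'), data = df)  # spline on the sphere, for geographic data
```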
Not only can we have multiple smooth terms in a model, we can also implement interactions of smooth terms. The following shows a model in which we combine the effects of year and length of the movie via the te function, along with the corresponding predictive plots. I use the raw movies data set, with some filters to keep the retained observations more comparable. Note that the 3D plot is interactive, so twist it however you like to see what’s going on.
gam_model_2d = gam(rating ~ te(year, length), data=movies_filtered)
We now get the sense that the highest expected ratings come with longer films, perhaps even more so for more recent ones. Starting around the 60s, shorter films are associated with worse ratings.
There is also a way to do this interaction in a manner similar to an ANOVA decomposition, with both main effects and an interaction. See the ti function.
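As a sketch of that decomposition, using the same movies data as above, the main effects and interaction are specified as separate ti terms:

```r
library(mgcv)

# ANOVA-decomposition style: separate main effect smooths plus a pure
# interaction smooth; a sketch assuming the movies_filtered data from above.
gam_model_ti = gam(rating ~ ti(year) + ti(length) + ti(year, length),
                   data = movies_filtered)
```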
We can apply a smooth for a continuous covariate across levels of some grouping factor, which extends our interactions to categorical variables. In the following, we’ll let year have a different effect for Action movies vs. other movies.
gam_model_by = gam(rating ~ s(year, by=Action) + Action + budget_2016 + votes + length, data=movies_yearly_action)
In this case, Action movies are generally rated worse, and suffered a second and sharper dip starting in the 80s¹⁰. Also, their recent rise is not quite as sharp. Another alternative to this type of grouped approach, which would be applicable if there are many categories, specifies a type of smooth that would be similar to random slopes in a mixed model (see ?factor.smooth.interaction). We now turn to the mixed model approach with GAMs.
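A sketch of that factor-smooth alternative: the 'fs' basis gives each level of a factor its own penalized curve, analogous to random slopes. The grouping variable here is hypothetical.

```r
library(mgcv)

# 'fs' smooths fit a separate penalized curve per factor level, shrunk
# toward a common trend; genre is a hypothetical grouping factor.
gam_model_fs = gam(rating ~ s(year, genre, bs = 'fs'),
                   data = movies_filtered)
```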
Consider a matrix \(Z\) that contains the basis functions, with associated coefficients \(\gamma\). We can then write our model in matrix form as follows:
\[ y = X\beta + Z\gamma + \epsilon\]
If you’ve spent much time with mixed models, that probably looks familiar. Not only does it look like a mixed model, we can exploit the equivalence to produce the same results. The mgcv package in fact has many tools specifically for mixed models.
For this example I will use the sleep study data that comes with the lme4 package. It has reaction times for some tests where 18 subjects were restricted to 3 hours of sleep each night for 10 days. We fit a model with (uncorrelated) random intercepts and random slopes for Days.
library(lme4)
library(mgcv)

mod_lme4 = lmer(Reaction ~ Days + (1 | Subject) + (0 + Days | Subject),
                data = sleepstudy)

mod_gamm = gam(Reaction ~ Days + s(Subject, bs = 're') + s(Days, Subject, bs = 're'),
               data = sleepstudy, method = 'REML')

VarCorr(mod_lme4)
gam.vcomp(mod_gamm)
Groups    Name        Std.Dev.
Subject   (Intercept) 25.0513
Subject.1 Days         5.9882
Residual              25.5653
Standard deviations and 0.95 confidence intervals:

                  std.dev     lower     upper
s(Subject)      25.051370 16.085365 39.015038
s(Days,Subject)  5.988153  4.025248  8.908263
scale           25.565254 22.791745 28.676269

Rank: 3/3
The results are essentially identical. The mgcv package can also use lme4 and nlme under the hood via other functions (the gamm4 package and the gamm function, respectively). Extending GAMs into the mixed model world opens the door to some very powerful modeling possibilities.
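As a sketch, the same sleep study model could be passed through gamm4, which hands the mixed model portion to lme4 and takes the random effects as a separate lme4-style formula:

```r
library(gamm4)

# Smooth terms go in the main formula, random effects in lme4 syntax;
# a sketch of the sleep study model from above.
mod_gamm4 = gamm4(Reaction ~ Days,
                  random = ~ (1 | Subject) + (0 + Days | Subject),
                  data = sleepstudy)

summary(mod_gamm4$mer)  # the underlying lme4 model object
```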
Consider a data set with latitude and longitude coordinates among other covariates used to model some target variable. A spatial regression analysis uses an approach that accounts for spatial covariance among the observation points (e.g. kriging). Such an approach is a special case of a Gaussian process, which, as we note later, GAMs are as well. As such, we can add spatial models to the sorts of models covered by GAMs too. The following gives a sense of the syntax; the mgcv package even has a Gaussian process smooth for exactly this purpose.
gam(y ~ s(lat, lon, bs='gp')) # Matern by default
What about the discrete case, where the spatial random effect is based on geographical regions? This involves a penalty based on the adjacency matrix of the regions: if there are \(g\) regions, the adjacency matrix is a \(g \times g\) indicator matrix with a non-zero value where region \(i\) is connected to region \(j\), and 0 otherwise. In addition, an approach similar to that for a random effect is used to incorporate observations belonging to specific regions. These are sometimes referred to as geoadditive models.
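As a tiny illustration of such an adjacency matrix, consider three hypothetical regions A, B, C lying in a row, where only neighbors share a border:

```r
# Adjacency matrix for three regions in a line: A borders B, and B
# borders both A and C; the matrix is symmetric with a zero diagonal.
regions = c('A', 'B', 'C')
adj = matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0),
             nrow = 3, dimnames = list(regions, regions))
adj
```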
You’ll be shocked to know that mgcv has a smooth construct for this situation as well, bs='mrf', where mrf stands for Markov random field, an undirected graph. Plotting the smooth term is akin to displaying the spatial random effect. The following shows example code; see the help for mrf for more detail.
gam(crime ~ s(region, bs = "mrf", xt = list(polys = polygonList)), data = df, method = "REML")
My God! It’s full of STARs!
The incorporation of additive, mixed, and spatial regression models all within one modeling framework gets us to STARs, or structured additive regression models. This is an extremely general and powerful modeling tool that can take you very, very far¹¹.
Connections & Generalizations
Adaptive Basis Function Models
GAMs can be seen as belonging to a family of what are called adaptive basis function models. These include random forests, neural nets, and boosting approaches. GAMs might be seen as a midpoint lying between highly interpretable standard linear models and those more black box methods.
Beyond GLM family distributions
The mgcv package provides a plethora of distributions that might be useful in modeling, including:
- negative binomial
- ordered categorical
- zero-inflated Poisson (ZIP)
- Cox proportional hazards
- gaulss, a Gaussian model for both the mean (location) and standard deviation (scale)
- gevlss, the same for the generalized extreme value distribution
- ziplss, for a hurdle approach to zero inflation
- multivariate normal
- multinomial logistic
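A sketch of how these families are specified; the two-part formula for gaulss, with a linear predictor for the scale as well as the mean, is worth noting. The data frame and smooths are illustrative placeholders.

```r
library(mgcv)

# Alternative response distributions are passed via the family argument;
# df, y, and x are illustrative placeholders.
gam(y ~ s(x), family = nb(), data = df)         # negative binomial
gam(y ~ s(x), family = ocat(R = 4), data = df)  # ordered categorical, 4 levels

# Location-scale model: one formula for the mean, one for the scale.
gam(list(y ~ s(x), ~ s(x)), family = gaulss(), data = df)
```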
Extending Random Effects
In addition to distributional extensions that can be incorporated alongside random effects, providing modeling capabilities not generally offered in mixed model packages, mgcv also allows for correlated residual structures as in nlme, but for the generalized case. This is especially of interest in the longitudinal setting, where one expects residuals for observations closer in time to be more correlated. In addition, the distributions just noted can also be used in the random effects setting, e.g. a mixed model with a beta distributed response.
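A sketch of a correlated-residual specification via gamm, which accepts nlme correlation structures; the data frame and variable names are placeholders.

```r
library(mgcv)
library(nlme)

# AR(1) residual correlation within subject over time; df, y, time,
# and subject are illustrative placeholders.
mod_ar1 = gamm(y ~ s(time),
               correlation = corAR1(form = ~ time | subject),
               data = df)
```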
Where the Gaussian distribution is a distribution over vectors, defined by a mean vector and covariance matrix, a Gaussian process is a distribution over functions. A function \(f\) is distributed as a Gaussian process, \(f \sim \mathcal{GP}(m, k)\), defined by a mean function \(m\) and covariance function \(k\)¹².
It turns out that GAMs with a thin plate, tensor product, or cubic spline smooth are maximum a posteriori (MAP) estimates of Gaussian processes with specific covariance functions and a zero mean function. In that sense, one familiar with additive models might segue nicely to Gaussian processes. The mgcv package, …wait for it…, also allows one to use a spline form of Gaussian process, via bs='gp' as we saw earlier.
Bayesian GAMs are also possible within R through a variety of packages. The brms package can take your GAM to the Bayesian setting, even using the same syntax, and is particularly useful for adding the mixed model component as well as alternative distributions. It is based on the Stan probabilistic programming language.
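A sketch with brms, reusing the movies data from earlier; brms accepts mgcv-style s() and t2() smooth terms in its formulas, with priors left at package defaults here.

```r
library(brms)

# Bayesian GAM fit via Stan; brms supports s() and t2() smooths.
# movies_filtered is the data frame used earlier; this is a sketch.
mod_brm = brm(rating ~ s(year) + s(length),
              data = movies_filtered)
```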
This suggests that the IMDB ratings might be a bit problematic. 1979-1989 was practically the golden age for Action movies, with the likes of Rambo, Alien, Terminator, Mad Max, Die Hard, Indiana Jones, RoboCop, Bloodsport, The Killer, The Untouchables, Police Story, Escape from New York, Predator, Lethal Weapon, etc. Surely there couldn’t have been so many bad movies as to outweigh those? Oh, wait…↩
In case you aren’t familiar with the phrase ‘it’s full of stars’, it comes from the book 2001: A Space Odyssey: see link. The film adaptation is considered one of the greatest films of all time, because it is. The quote is not in that movie, but in the film sequel 2010.↩
They have a close tie to reproducing kernel Hilbert space methods, and generalize commonly used models in spatial modeling (kriging). In addition, a GP can be interpreted as a standard multilayer perceptron neural net with a single hidden layer consisting of an infinite number of nodes.↩