R packages

The following is a non-exhaustive list of R packages which contain GAM functionality. Each is linked to the CRAN page for the package. Note also that several build upon the mgcv package used for this document.

brms Allows for Bayesian GAMs via the Stan modeling language (very new implementation).

CausalGAM This package implements various estimators for average treatment effects.

COZIGAM Constrained and Unconstrained Zero-Inflated Generalized Additive Models.

CoxBoost This package provides routines for fitting Cox models. See also cph in rms package for nonlinear approaches in the survival context.

gam Functions for fitting and working with generalized additive models.

GAMBoost: This package provides routines for fitting generalized linear and and generalized additive models by likelihood based boosting.

gamboostLSS: Boosting models for fitting generalized additive models for location, shape and scale (gamLSS models).

GAMens: This package implements the GAMbag, GAMrsm and GAMens ensemble classifiers for binary classification.

gamlss: Generalized additive models for location, shape, and scale.

gamm4: Fit generalized additive mixed models via a version of mgcv’s gamm function.

gammSlice: Bayesian fitting and inference for generalized additive mixed models.

GMMBoost: Likelihood-based Boosting for Generalized mixed models.

gss: A comprehensive package for structural multivariate function estimation using smoothing splines.

mboost: Model-Based Boosting.

mgcv: Routines for GAMs and other generalized ridge regression with multiple smoothing parameter selection by GCV, REML or UBRE/AIC. Also GAMMs.

VGAM: Vector generalized linear and additive models, and associated models.

A comparison to mixed models

We noted previously that there were ties between generalized additive and mixed models. Aside from the identical matrix representation noted in the technical section, one of the key ideas is that the penalty parameter for the smooth coefficients reflects the ratio of the residual variance to the variance components for the random effects (see Fahrmeier et al., 2013, p. 483). Conversely, we can recover the variance components by dividing the scale by the penalty parameter.

To demonstrate this, we can set things up by running what will amount to equivalent models in both mgcv and lme4 using the sleepstudy data set that comes from the latter.See ?sleepstudy for details. I’ll run a model with random intercepts and slopes, and for this comparison the two random effects will not be correlated. We will use the standard smoothing approach in mgcv, just with the basis specification for random effects - bs='re'. In addition, we’ll use restricted maximum likelihood as is the typical default in mixed models.

mixed_model = lmer(Reaction ~ Days + (1|Subject) + (0+Days|Subject), 
ga_model = gam(Reaction ~  Days + s(Subject, bs='re') + s(Days, Subject, bs='re'), 
               data=sleepstudy, method = 'REML')

We can see they agree on the fixed effects, but our output for the GAM is in the usual, albeit, uninterpretable form. So, we’ll have to translate the smooth terms from the GAM to variance components as in the mixed model.

Linear mixed model fit by REML ['lmerMod']
Formula: Reaction ~ Days + (1 | Subject) + (0 + Days | Subject)
   Data: sleepstudy

REML criterion at convergence: 1743.7

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9626 -0.4625  0.0204  0.4653  5.1860 

Random effects:
 Groups    Name        Variance Std.Dev.
 Subject   (Intercept) 627.57   25.051  
 Subject.1 Days         35.86    5.988  
 Residual              653.58   25.565  
Number of obs: 180, groups:  Subject, 18

Fixed effects:
            Estimate Std. Error t value
(Intercept)  251.405      6.885  36.513
Days          10.467      1.560   6.712

Correlation of Fixed Effects:
Days -0.184

Family: gaussian 
Link function: identity 

Reaction ~ Days + s(Subject, bs = "re") + s(Days, Subject, bs = "re")

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  251.405      6.885  36.513  < 2e-16 ***
Days          10.467      1.560   6.712 3.67e-10 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
                  edf Ref.df      F  p-value    
s(Subject)      12.94     17  89.29 4.56e-07 ***
s(Days,Subject) 14.41     17 104.56 1.82e-12 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.794   Deviance explained = 82.7%
-REML = 871.83  Scale est. = 653.58    n = 180

Conceptually, we can demonstrate the relationship with the following code that divides the scale by the penalty parameters, one for each of the smooth terms. However, there has been some rescaling behind the scenes regarding the Days effect, so we have to rescale it to get what we need.

rescaled_results = c(ga_model$reml.scale/ga_model$sp[1],
lmer_vcov = VarCorr(mixed_model) %>% data.frame()
gam_vcov = data.frame(var=rescaled_results, gam.vcomp(ga_model))
grp var1 vcov sdcor
Subject (Intercept) 627.57 25.05
Subject.1 Days 35.86 5.99
Residual NA 653.58 25.57
  var lower upper
s(Subject) 627.57 25.05 16.09 39.02
s(Days,Subject) 35.86 5.99 4.03 8.91
NA 25.57 22.79 28.68

Think about it this way. Essentially what is happening behind the scenes is that effect interactions with the grouping variable are added to the model matrix (e.g. ~ ... + Days:Subject - 1)38. The coefficients pertaining to the interaction terms are then penalized in the typical GAM estimation process. A smaller estimated penalty parameter suggests more variability in the random effects. A larger penalty means more shrinkage of the random intercepts and slopes toward the population level (fixed) effects.

Going further, we can think of smooth terms as adding random effects to the linear component39. A large enough penalty and the result is simply the linear part of the model. In this example here, that would be akin to relatively little random effect variance.

Time and Space

One of the things to know about GAMs is just how flexible they are. Along with all that we have mentioned, they can also be applied to situations where one is interested in temporal trends or the effects of spatial aspects of the data. The penalized regression approach used by GAMs can easily extend such situations, and the mgcv package in particular has a lot of options here.


A natural setting for GAMs is where there are observations over time. Perhaps we want to examine the trend over time. The SLiM would posit a linear trend, but we often would doubt that is the case. How would we do this with a GAM? We can incorporate a covariate representing the time component and add it as a smooth term. There will be some additional issues though as we will see.

Here I use the data and example at Gavin Simpon’s nifty blog, though with my own edits, updated data, and different model. Simpson (2018) has offered a recent article regarding this topic as well. The data regards global temperature anomalies, which for some people assume does not exist, but for our purposes is actually in front of our face.

## Global temperatures
## gtemp = read.table("", 

## Drop the even rows
gtemp = gtemp %>% drop_na()

Fitting a straight line to this would be disastrous, so let’s do a GAM.

hot_gam = gam(Annual ~ s(Year), data=gtemp)

Family: gaussian 
Link function: identity 

Annual ~ s(Year)

Parametric coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.090710   0.007653  -11.85   <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
          edf Ref.df     F p-value    
s(Year) 7.899  8.683 158.4  <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.891   Deviance explained = 89.6%
GCV = 0.010449  Scale est. = 0.0098984  n = 169

We can see that the trend is generally increasing, and has been more or less since the beginning of the 20th century. We have a remaining issue though. In general, a time series is autocorrelated, i.e. correlated with itself over time. We can see this in the following plot.


What the plot shows is the correlation of the values with themselves at different lags, or time spacings. Lag 0 is it’s correlation with itself, so the value is 1.0. It’s correlation with itself at the previous time point, i.e. lag = 1, is 0.91, it’s correlation with itself at two time points ago is slightly less, 0.85, and the decreasing trend continues slowly. The dotted lines indicate a 95% confidence interval around zero, meaning that the autocorrelation is still significant 25 years apart.

With our model, the issue remains in that there is still autocorrelation among the residuals, at least at lag 1.

The practical implications of autocorrelated residuals is that this positive correlation would result in variance estimates that are too low. However, we can take this into account with a slight tweaking of our model to incorporate such autocorrelation. For our purposes, we’ll switch to the gamm function. It adds additional functionality for generalized additive mixed models, though we can just use it to incorporate autocorrelation of the residuals. In running this, two sets of output are provided, one in our familiar gam model object, and the other as a lme object from the nlme package.

hot_gam_ar = gamm(Annual ~ s(Year), data=gtemp, correlation=corAR1(form = ~Year))
# summary(hot_gam)

Family: gaussian 
Link function: identity 

Annual ~ s(Year)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.09114    0.01154  -7.897 4.18e-13 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
         edf Ref.df     F p-value    
s(Year) 6.79   6.79 89.31  <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =  0.889   
  Scale est. = 0.010424  n = 169
Linear mixed-effects model fit by maximum likelihood
 Data: strip.offset(mf) 
        AIC       BIC   logLik
  -282.6788 -267.0293 146.3394

Random effects:
 Formula: ~Xr - 1 | g
 Structure: pdIdnot
             Xr1      Xr2      Xr3      Xr4      Xr5      Xr6      Xr7      Xr8  Residual
StdDev: 1.227362 1.227362 1.227362 1.227362 1.227362 1.227362 1.227362 1.227362 0.1020979

Correlation Structure: AR(1)
 Formula: ~Year | g 
 Parameter estimate(s):
Fixed effects: y ~ X - 1 
                  Value Std.Error  DF   t-value p-value
X(Intercept) -0.0911422 0.0115764 167 -7.873104  0.0000
Xs(Year)Fx1   0.4181527 0.1699447 167  2.460522  0.0149
Xs(Year)Fx1 0     

Standardized Within-Group Residuals:
        Min          Q1         Med          Q3         Max 
-2.19174564 -0.70736150  0.03003736  0.71267301  3.23213330 

Number of Observations: 169
Number of Groups: 1 

In the gam output, we see some slight differences from the original model, but not much (and we wouldn’t expect it). From the lme output we can see the estimated autocorrelation value denoted as Phi40. Let’s see what it does for our fit.

We can in fact see that we were a bit optimistic in the previous fit (non-colored band). Our new fit is wider at every point41. Thus, in using a GAM for time-series data, we have the same issues we’d have with standard regression settings, and we can deal with them in much the same way to get a better sense of the uncertainty in our estimates.


Consider a data set with latitude and longitude coordinates along with other covariates used to model some target variable. A spatial regression analysis uses an approach to account for spatial covariance among the observation points. A common technique used is a special case of Gaussian process which, as we noted, certain types of GAMs can be seen as such also. In addition, some types of spatial models can be seen similar to random effects models, much like GAMs. Such connections mean that we can add spatial models to the sorts of models covered by GAMs too.

When dealing with space, we may have spatial locations of a continuous sort, such as with latitude and longitude, or in a discrete sense, such as regions. In what follows we’ll examine both cases.

Continuous Spatial Setting

This example is inspired by the post by Peter Ellis, which you can find here. Our example will use census data from New Zealand and focus on median income. It uses the nzcensus package42 which includes median income, latitude, longitude and several dozen other variables. The latitude and longitude are actually centroids of the area unit, so this technically could also be used as a discrete example based on the unit.

Let’s take an initial peek.

nz_census <- AreaUnits2013 %>% 
  filter(WGS84Longitude > 0 & ! %>% 
  rename(lon = WGS84Longitude,
         lat = WGS84Latitude,
         Income = MedianIncome2013) %>% 
map_data('nz') %>% 
  filter(grepl(region, pattern='North|South')) %>%
  group_by(group) %>% 
  plot_geo(x = ~nz_census$lon, y = ~nz_census$lat) %>% 
              # size=I(5),  # plotlys special bug
              text=~ labs,
              data=as_tibble(nz_census)) %>%
  config(displayModeBar=F) %>% 
  layout(title = '', 
         geo = g1) %>% 

So we can go ahead and run a model predicting median income solely by geography. We’ll use a Gaussian process basis, and allowing latitude and longitude to interact (bumping up the default wiggliness possible to allow for a little more nuance). What the GAM will allow us to do is smooth our predictions beyond the points we have in the data to get a more complete picture of income distribution across the whole area.

The m= argument allows one to specify different types of covariance functions.

nz_gam = gam(Income ~ s(lon, lat, bs='gp', k=100, m=2), data=nz_census)

Family: gaussian 
Link function: identity 

Income ~ s(lon, lat, bs = "gp", k = 100, m = 2)

Parametric coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  29497.8      148.1   199.2   <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Approximate significance of smooth terms:
             edf Ref.df     F p-value    
s(lon,lat) 76.38   90.1 7.445  <2e-16 ***
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R-sq.(adj) =   0.27   Deviance explained = 30.1%
GCV = 4.0878e+07  Scale est. = 3.9105e+07  n = 1784
nz_map_data %>% 
  mutate(Size=.3) %>% 
  plot_geo(x=~lon, y=~lat, mode='markers', marker=list(size=~Size)) %>% 
  layout(title = 'Expected Income',
         geo = g1) %>%
  config(displayModeBar=FALSE) %>% 
              color =~Income, 
              # size=~Size,
              colors=scico::scico(100, palette = 'tokyo'), inherit = F
              )# %>%