Preface

This talk/workshop provides a brief introduction to generalized additive models (GAMs), or even more generally, or structured additive regression models (STARs). They offer a natural extension to common generalized linear models, while having the flexibility to incorporate a wide range of modeling situations.

Audience

Specifically, I have in mind applied researchers/students with a burgeoning interest in using models they probably weren’t exposed to in their typical statistical training. More generally, anyone that finds it useful.

Prerequisites

Prerequisites include a firm understanding of linear regression and exposure to generalized linear models. Some experience with using R for statistical modeling is required to follow along with the code and do the exercises. The workshop will potentially see a wide range of skill sets and modeling experience among attendees, but aims to be accessible to all. As such, the practical is favored over the technical, but some details are provided as an appendix, with additional references for those interested in going deeper down the rabbit hole.

Goal

To provide to those attending an overview of generalized additive models and their extensions. In addition, I hope to provide enough familiarity with what such models offer, via conceptual examples and exercises, that those attending will feel able to consider them in future modeling endeavors.

Outline

Why
  • demo where lm fails
  • traditional ways of dealing with problems that arise
Transition: standard polynomial
  • where it works
  • where it isn’t applicable/fails
Introducing GAMs
  • basic info
  • getting started
Note Issues
  • things to get used to
Extensions & Ties to other
  • Additional choices for smooth terms
  • Random effects
  • Spatial
  • GP
  • Boosting/RF
Technicals
  • by-hand example
  • penalized regression

Workshop Setup

The goal of the following is to create an RStudio project (i.e. a folder that represents the working directory moving forward) with a folder inside called ‘data’ that has the data for the demonstrations and exercises. For those not used to RStudio projects, read on, and use them for everything you do from now on.

RStudio

Click File/New Project/New Directory and give it a name, e.g. GAM_Workshop. BE MINDFUL ABOUT WHERE YOU ARE CREATING IT. You don’t have to do anything else with RStudio at this point, but leave it open.

Data

The data used in the document demonstrations and exercises is available at https://m-clark.github.io/workshops/stars/data.7z. It is a zipped archive containing several data files and .RData objects. You must:

  1. Download it. BE INTENTIONAL ABOUT WHERE YOU DOWNLOAD IT.
  2. Unzip it. Note that this is a different step than number 1, and failure to do this will render the stuff inside inaccessible. BE PURPOSEFUL ABOUT WHERE YOU UNZIP IT.
  3. Copy/Move it. This now usable folder needs to go into your project folder that you just created. This can actually be done as part of step 2.

When you are finished, you will now have an R project folder with a data folder inside it which contains the example data sets. To demonstrate this, in RStudio you should be able to open a script and run load('data/movies_yearly.RData').

Other

If you’re interested in the underlying code, visuals, markdown etc. these notes have a GitHub repository at https://github.com/m-clark/stars.