Intro

This document is the basis for multiple workshops whose common goal is to provide tools, tips, and packages that make data processing, programming, modeling, visualization, and presentation in R easier. It is aimed at those who have had some exposure to R for applied data analysis, but it should also be useful to someone coming to R from another programming language. It is not an introduction to R. The primary goal is to build awareness of tools that make data exploration, modeling, and visualization easier, and to explain some of the reasoning behind those tools so that you can use them more effectively. It is meant to fill some of the gaps that applied R users commonly have.

Outline

Part 1: Information Processing

Understanding Basic R Approaches to Gathering and Processing Data

  • Overview of Data Structures
  • Input/Output
  • Indexing

Getting Acquainted with Other Approaches to Data Processing

  • Pipes, and how to use them
  • tidyverse
  • data.table
  • Misc.

Part 2: Programming Basics

Using R more fully

  • Dealing with objects
  • Iterative programming
  • Writing functions

Going further

  • Code style
  • Vectorization
  • Regular expressions

Part 3: Modeling

Model Exploration

  • Key concepts
  • Understanding and fitting models
  • Overview of extensions

Model Criticism

  • Model Assessment
  • Model Comparison

Machine Learning

  • Concepts
  • Demonstration of techniques

Part 4: Visualization

Thinking Visually

  • Visualizing Information
  • Color
  • Contrast
  • and more…

ggplot2

  • Aesthetics
  • Layers
  • Themes
  • and more…

Adding Interactivity

  • Package demos
  • Shiny

Part 5: Presentation

Possible future addition. See this for now.

Preparation

To follow along with the examples, clone or download the related section repositories. Any one of them contains an R project and the associated data, so the code from any section should run.

(In progress while the document is being revamped and extended.)

Other

Color coding in text:

  • emphasis
  • package
  • function
  • object/class
  • link

Some key packages used in the following demonstrations and exercises:

tidyverse (several packages), data.table, tidymodels

Python

The related Python notebooks can be found here.

R

Many other packages are used for data or for minor demonstrations, so feel free to install them as we come across them. Here are a few:

ggplot2movies, nycflights13, DT, highcharter, magrittr, maps, mgcv (a recommended package that already ships with standard R installations), plotly, quantmod, readr, visNetwork, emmeans, ggeffects
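If you would rather check your setup ahead of time, the following is a minimal sketch, not part of the original materials, that compares the packages named above against what is already installed. The object names (`core_pkgs`, `extra_pkgs`, `to_install`) are illustrative only, and the actual installation line is commented out so that running the code just reports what is missing.

```r
# A minimal sketch: find which packages named in this section are not yet installed.
# Object names here are illustrative, not from the workshop repos.
core_pkgs  <- c("tidyverse", "data.table", "tidymodels")
extra_pkgs <- c("ggplot2movies", "nycflights13", "DT", "highcharter",
                "magrittr", "maps", "mgcv", "plotly", "quantmod", "readr",
                "visNetwork", "emmeans", "ggeffects")

# installed.packages() returns a matrix whose row names are package names
installed  <- rownames(installed.packages())

# Packages from either list that are not installed yet
to_install <- setdiff(c(core_pkgs, extra_pkgs), installed)

# Prints character(0) if everything is already installed
print(to_install)

# Uncomment to install the missing packages from CRAN
# if (length(to_install) > 0) install.packages(to_install)
```

Since mgcv ships with R, it will normally not appear in `to_install`; `setdiff()` simply skips anything already present.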