Intro

This document is the basis for multiple workshops whose common goal is to provide tools, tips, and packages that make data processing, programming, modeling, visualization, and presentation in R easier. It is oriented toward those who have had some exposure to R for applied data analysis, but should also be useful to someone coming to R from another programming language. It is not an introduction to R. The primary goal is to instill awareness of tools that will make your data exploration, modeling, and visualization easier, and to convey some of the why behind those tools so that you can apply them more effectively. It is meant to fill in some of the gaps that applied users of R typically have.

Outline

Part 1: Data Processing

Understanding Base R Approaches to Data Processing

  • Overview of Data Structures
  • Input/Output

Getting Acquainted with Other Approaches to Data Processing

  • Pipes, and how to use them
  • tidyverse
  • data.table
  • Misc.

Part 2: Programming Basics

Using R more fully

  • Dealing with objects
  • Iterative programming
  • Writing functions

Going further

  • Vectorization
  • Regular expressions

Part 3: Modeling

Model Exploration

Model Criticism

Machine Learning

Part 4: Visualization

Thinking Visually

  • Visualizing Information
  • Color
  • Contrast
  • and more…

ggplot2

  • Aesthetics
  • Layers
  • Themes
  • and more…

Adding Interactivity

  • Package demos
  • Shiny

Part 5: Presentation

Possible future addition. See this for now.

Preparation

To follow along with the examples, clone or download the related section repos. Each repo contains an R project and the associated data, so downloading any one of them should be enough to run the code from any section.

(in progress as document is being revamped and extended)

Other

Color coding in text:

  • emphasis
  • package
  • function
  • object/class
  • link

Some key packages used in the following demonstrations and exercises:

tidyverse (several packages), data.table, ggplot2movies

Python

Python notebooks for the data processing and visualization sections may be found here.

R

Many other packages are also used, so feel free to install them as we come across them. Here are a few.

nycflights13, DT, highcharter, magrittr, maps, mgcv (not part of base R, but ships with standard R installations as a recommended package), plotly, quantmod, readr, visNetwork
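
If you would rather install everything up front than as you go, the following is a minimal sketch of one way to do it. It uses only the package names listed above. The object names (workshop_pkgs, installed, to_install) are illustrative and not part of the workshop materials, and the choice to install only in an interactive session is an assumption made so that sourcing the code from a script does not trigger downloads.

```r
# Packages named in this section (key packages plus the others listed).
workshop_pkgs <- c(
  "tidyverse", "data.table", "ggplot2movies",
  "nycflights13", "DT", "highcharter", "magrittr", "maps", "mgcv",
  "plotly", "quantmod", "readr", "visNetwork"
)

# Names of packages already available in the current library paths.
installed <- rownames(installed.packages())

# Keep only the packages that still need installing.
to_install <- setdiff(workshop_pkgs, installed)
print(to_install)

# Assumption: install only when run interactively (e.g. at the console),
# so that sourcing this file non-interactively has no side effects.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

Some of these overlap (for example, readr is installed along with tidyverse), so the list of packages still to install may well be shorter than the full list.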