• Practical Data Science
  • Introduction
    • Intended Audience
    • Programming Language
    • Additional Practice
    • Outline
      • Part 1: Information Processing
      • Part 2: Programming Basics
      • Part 3: Modeling
      • Part 4: Visualization
      • Part 5: Presentation
    • Workshops
    • Other
      • Python notebooks
      • Other R packages
      • History
      • Current Efforts
  • Part I: Information Processing
  • Data Structures
    • Vectors
      • Character strings
      • Factors
      • Logicals
      • Numeric and integer
      • Dates
    • Matrices
      • Creating a matrix
    • Lists
    • Data Frames
      • Creating a data frame
    • Data Structure Exercises
      • Exercise 1
      • Exercise 2
      • Thinking Exercises
    • Python Data Structures Notebook
  • Input/Output
    • Better & Faster Approaches
    • R-specific Data
      • R Datasets
    • Other Types of Data
    • On the Horizon
    • Big Data
    • I/O Exercises
      • Exercise 1
      • Thinking Exercises
    • Python I/O Notebook
  • Indexing
    • Slicing Vectors
    • Slicing Matrices/data.frames
    • Label-based Indexing
    • Position-based Indexing
    • Mixed Indexing
    • Non-contiguous
    • Boolean
    • List/data.frame Extraction
    • Indexing Exercises
      • Exercise 1
      • Exercise 2
      • Exercise 3
    • Python Indexing Notebook
  • Pipes
    • Using Variables as They are Created
    • Pipes for Visualization (more later)
    • The Dot
    • Flexibility
    • Pipes Summary
  • Tidyverse
    • What is the Tidyverse?
    • What is Tidy?
    • dplyr
      • An example
    • Running Example
    • Selecting Columns
      • Helper functions
    • Filtering Rows
    • Generating New Data
    • Grouping and Summarizing Data
    • Renaming Columns
    • Merging Data
    • Pivoting axes
    • More Tidyverse
    • Personal Opinion
    • Tidyverse Exercises
      • Exercise 0
      • Exercise 1
      • Exercise 2
      • Exercise 3
      • Exercise 4
    • Python Pandas Notebook
  • data.table
    • data.table Basics
    • Grouped Operations
    • Faster!
      • Joins
      • Group by
      • String matching
      • Reading files
      • More speed
    • Pipe with data.table
    • data.table Summary
    • Faster dplyr Alternatives
    • data.table Exercises
      • Exercise 0
      • Exercise 1
      • Exercise 2
  • Part II: Programming
  • Programming Basics
    • R Objects
      • Object Inspection & Exploration
      • Methods
      • S4 classes
      • Others
      • Inspecting Functions
    • Documentation
    • Objects Exercises
  • Iterative Programming
    • For Loops
      • A slight speed gain
      • While alternative
      • Loops summary
    • Implicit Loops
      • apply family
      • Apply functions
      • purrr
    • Looping with Lists
    • Iterative Programming Exercises
      • Exercise 1
      • Exercise 2
      • Exercise 3
  • Writing Functions
    • A Starting Point
    • DRY
    • Conditionals
    • Anonymous functions
    • Writing Functions Exercises
      • Exercise 1
      • Exercise 1b
      • Exercise 2
  • More Programming
    • Code Style
      • Why does your code exist?
      • Assignment
      • Code length
      • Spacing
      • Naming things
      • Other
    • Vectorization
      • Boolean indexing
      • Vectorized operations
    • Regular Expressions
      • Typical uses
    • Code Style Exercises
      • Exercise 1
      • Exercise 2
    • Vectorization Exercises
      • Exercise 1
      • Exercise 2
    • Regex Exercises
      • Exercise 1
  • Part III: Modeling
  • Model Exploration
    • Model Taxonomy
    • Linear models
    • Estimation
      • Minimizing and maximizing
      • Optimization
    • Fitting Models
      • Using matrices
    • Summarizing Models
    • Variable Transformations
      • Numeric variables
      • Categorical variables
      • Scales, indices, and dimension reduction
      • Don’t discretize
    • Variable Importance
    • Extracting Output
      • Package support
    • Visualization
    • Extensions to the Standard Linear Model
      • Different types of targets
      • Correlated data
      • Other extensions
    • Model Exploration Summary
    • Model Exploration Exercises
      • Exercise 1
      • Exercise 2
      • Exercise 3
    • Python Model Exploration Notebook
  • Model Criticism
    • Model Fit
      • Standard linear model
      • Beyond OLS
      • Classification
    • Model Assumptions
    • Predictive Performance
    • Model Comparison
      • Example: Additional covariates
      • Example: Interactions
      • Example: Additive models
    • Model Averaging
    • Model Criticism Summary
    • Model Criticism Exercises
      • Exercise 0
      • Exercise 1
      • Exercise 2
    • Python Model Criticism Notebook
  • Machine Learning
    • Concepts
      • Loss
      • Bias-variance tradeoff
      • Regularization
      • Cross-validation
      • Optimization
      • Tuning parameters
    • Techniques
      • Regularized regression
      • Random forests
      • Neural networks
    • Interpreting the Black Box
    • Machine Learning Summary
    • Machine Learning Exercises
      • Exercise 1
      • Exercise 2
    • Python Machine Learning Notebook
  • Part IV: Visualization
  • ggplot2
    • Layers
    • Piping
    • Aesthetics
    • Geoms
    • Examples
    • Stats
    • Scales
    • Facets
    • Multiple plots
    • Fine control
    • Themes
    • Extensions
    • ggplot2 Summary
    • ggplot2 Exercises
      • Exercise 0
      • Exercise 1
      • Exercise 2
    • Python Plotnine Notebook
  • Interactive Visualization
    • Packages
    • Piping for Visualization
    • htmlwidgets
    • Plotly
      • Modes
      • ggplotly
    • Highcharter
    • Graph networks
      • visNetwork
      • sigmajs
      • Plotly
    • leaflet
    • DT
    • Shiny
      • Dash
    • Interactive and Visual Data Exploration
    • Interactive Visualization Exercises
      • Exercise 0
      • Exercise 1
      • Exercise 2
      • Exercise 3
    • Python Interactive Visualization Notebook
  • Thinking Visually
    • Information
      • Your audience isn’t dumb
      • Clarity is key
      • Avoid clutter
      • Color isn’t optional
      • Think interactively
    • Color
      • Viridis
      • Scientific colors
      • RColorBrewer
    • Contrast
    • Scaling Size
    • Transparency
    • Accessibility
    • File Types
    • Summary of Thinking Visually
    • A casual list of things to avoid
      • Pie
      • Histograms
      • Using 3D without adding any communicative value
      • Using too many colors
      • Using valenced colors when data isn’t applicable
      • Showing maps that just display population
      • Biplots
    • Thinking Visually Exercises
      • Exercise 1
      • Exercise 2
      • Thinking exercises
  • Part V: Presentation
  • Building Better Data-Driven Products
    • Rep* Analysis
      • Example
      • Repeatable
      • Reproducible
      • Replicable
      • Summary of rep* analysis
    • Literate Programming
    • R Markdown
    • Version Control
    • Dynamic Data Analysis & Report Generation
    • Using Modern Tools
  • Getting Started
    • What is Markdown?
    • Documents
      • Standard HTML
      • R notebooks
      • Distill
      • Bookdown
    • Presentations
    • Apps, Sites & Dashboards
    • Templates
    • How to Begin
  • Standard Documents
    • R Markdown files
    • Text
    • Code
      • Chunks
      • In-line
      • Labels
      • Running code
    • Multiple Documents
      • Knitting multiple documents into one
      • Parameterized reports
    • Collaboration
    • Using Python for Documents
  • Customization & Configuration
    • Output Options
      • Themes etc.
    • YAML
    • HTML & CSS
      • HTML
      • CSS
      • Custom classes
    • Personal Templates
    • The Rabbit Hole Goes Deep
    • R Markdown Exercises
      • Exercise 1
      • Exercise 2
      • Exercise 3
      • Exercise 4
      • Exercise 5
  • Wrap-up
  • Summary
  • Appendix
    • R Markdown
      • Footnotes
      • Citations and references
      • Multiple documents
      • Web standards
  • References

Practical Data Science

Doing more with your data

Michael Clark
https://m-clark.github.io/

2020-10-12