Exploratory Data Analysis

Exploring how to explore data.

Author: Michael Clark
Published: July 10, 2020

Keywords

R, exploratory data analysis, EDA, automated, arsenal, DataExplorer, SmartEDA, summarytools, dataMaid, janitor, visdat, descriptive statistics, summary, visualization

Introduction

R offers many tools to help you dive in and explore your data. However, in consulting I still see a lot of people using base R’s table and summary functions, followed by a lot of work to get the results into a more presentable format. My own frustrations led me to create a package (tidyext) for personal use in this area. While that suits me fine, there are tools that go much further with little effort. Recently, Staniak & Biecek wrote an article in the R Journal exploring several such packages, so I thought I’d try them out for myself and take others along for the ride.
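
To make the base R starting point concrete, here is a minimal sketch of that workflow. The built-in mtcars data and the choice of variables are assumptions made purely for illustration; the point is only how much manual reshaping sits between table() and something presentable.

```r
# Built-in dataset, chosen here only for illustration
data(mtcars)

# Numeric summary: min, quartiles, mean, and max for each selected column
num_summary <- summary(mtcars[, c("mpg", "hp", "wt")])
print(num_summary)

# Cross-tabulation of cylinders by transmission type
cyl_by_am <- table(cyl = mtcars$cyl, am = mtcars$am)
print(cyl_by_am)

# The extra work: reshape the table into a data frame and add row percentages
cyl_by_am_df <- as.data.frame(cyl_by_am, stringsAsFactors = FALSE)
cyl_by_am_df$row_pct <- round(
  100 * cyl_by_am_df$Freq / ave(cyl_by_am_df$Freq, cyl_by_am_df$cyl, FUN = sum),
  1
)
print(cyl_by_am_df)
```

Even this small example needs a conversion step and a hand-rolled percentage calculation, which is the kind of effort the packages below aim to remove.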

Because this will be a workshop/demo, I’ve created a separate repo and document to make the material easier to find: https://m-clark.github.io/exploratory-data-analysis-tools/

The packages demoed are:

  • arsenal
  • DataExplorer
  • dataMaid
  • gtsummary
  • janitor (not covered in the Staniak & Biecek article)
  • SmartEDA
  • summarytools
  • visdat
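
As a rough orientation, the sketch below calls one entry-point function from most of these packages on the built-in mtcars data. The dataset, the particular functions chosen, and the run_if_installed helper are assumptions made here for illustration and are not taken from the demo document; each package offers far more than the single call shown.

```r
# Hypothetical helper: run fn() only when pkg is installed, otherwise return NULL
run_if_installed <- function(pkg, fn) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    message("Skipping ", pkg, ": not installed")
    return(NULL)
  }
  fn()
}

# One illustrative call per package; results are stored rather than printed
results <- list(
  # Grouped descriptive table (am as the grouping variable)
  arsenal = run_if_installed("arsenal", function() {
    summary(arsenal::tableby(am ~ mpg + hp, data = mtcars), text = TRUE)
  }),
  # Basic facts about the data: rows, columns, missing values, and so on
  DataExplorer = run_if_installed("DataExplorer", function() {
    DataExplorer::introduce(mtcars)
  }),
  # Publication-style summary table, split by transmission type
  gtsummary = run_if_installed("gtsummary", function() {
    gtsummary::tbl_summary(mtcars, by = am)
  }),
  # Tidy cross-tabulation of cylinders by transmission type
  janitor = run_if_installed("janitor", function() {
    janitor::tabyl(mtcars, cyl, am)
  }),
  # Overview of the data structure
  SmartEDA = run_if_installed("SmartEDA", function() {
    SmartEDA::ExpData(data = mtcars, type = 1)
  }),
  # Per-variable summary of the whole data frame
  summarytools = run_if_installed("summarytools", function() {
    summarytools::dfSummary(mtcars)
  }),
  # Plot of column types and missingness (returns a ggplot object)
  visdat = run_if_installed("visdat", function() {
    visdat::vis_dat(mtcars)
  })
)

# dataMaid::makeDataReport(mtcars) and DataExplorer::create_report(mtcars)
# write a report file to disk, so they are left out of this sketch.

# Which packages actually produced a result on this machine
print(names(Filter(Negate(is.null), results)))
```

Guarding each call with requireNamespace() keeps the script runnable even when only some of the packages are installed.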

[Image: Tukey]

Citation

BibTeX citation:
@online{clark2020,
  author = {Clark, Michael},
  title = {Exploratory {Data} {Analysis}},
  date = {2020-07-10},
  url = {https://m-clark.github.io/posts/2020-07-10-eda/},
  langid = {en}
}
For attribution, please cite this work as:
Clark, Michael. 2020. “Exploratory Data Analysis.” July 10, 2020. https://m-clark.github.io/posts/2020-07-10-eda/.