Models Demystified

A Practical Guide from Linear Regression to Deep Learning

Authors

Michael Clark & Seth Berry

Preface

Hello and welcome! This book is your companion to exploring the realm of modeling in data science. It is designed to provide you with something useful whether you’re a beginner looking to learn some fundamentals or an experienced practitioner seeking a fresh perspective. Our goal is to equip you with a better understanding of how models work and how to use them, covering both basic and more advanced techniques, from linear regression to deep learning. We’ll also show how different models relate to one another, to better empower you to apply them successfully in your own data-driven projects. We aim to provide an overview of how to use both machine learning and traditional statistical modeling in a practical fashion, with a balanced emphasis on interpretability and predictive power. Join us on this exciting journey as we explore the world of models in data science!

What Will You Get Out of This Book?

We’re hoping for a couple things for you as you read through this book. In particular, if you’re starting your journey into data science, we hope you’ll leave with:

A firm understanding of modeling basics from a practical perspective
A toolset of models and related ideas that you can instantly apply for competent modeling
A balanced treatment of statistical and machine learning approaches

If you’re already familiar with modeling, we hope you’ll leave with:

Additional context for the models you already know
Some introduction to models you don’t know
Additional understanding of how to choose the right model for the job and what to focus on

For anyone reading this book, we especially hope you get a sense of the commonalities between different models and a good sense of how they work. If you happen to be reading this book in print, you can find the book in web form at https://m-clark.github.io/book-of-models. There you’ll also find all the code, figures, and other content that you can interact with more easily, as well as the most up-to-date content, fixes, etc. The web version will be updated with some regularity and have additional content as well.

Brief Prerequisites

You’ll definitely want to have some familiarity with R or Python (both are used for examples), and some very basic knowledge of statistics will be helpful. We’ll try to explain things as we go, but we won’t be able to cover everything. If you’re looking for a good introduction, we recommend R for Data Science for R, or Python for Data Analysis for Python. Beyond that, we’ll try to provide the context you need so that you can be comfortable trying things out.

Data and Code

All the data and code used in this book are available on the book’s GitHub repository. See the data descriptions in the data section for more information on each of the datasets used. Notebooks with chapter code are also available there (if applicable). If you’d like to contribute, please see the contributing page. Thanks for reading!

About the Authors

Michael Clark is a Senior Machine Learning Scientist for OneSix. Prior to industry he honed his chops in academia, earning a PhD in Experimental Psychology before turning to data science full-time as a consultant. His models have been used in production across a variety of industries, and can be seen in dozens of publications across several academic disciplines. He has a passion for helping others learn difficult stuff, and he has taught a variety of data science courses and workshops for people of all skill levels in many different contexts.

He also maintains a blog covering many aspects of statistical and machine learning modeling, and has several posts and long-form documents on a variety of data science topics there. He lives in Ann Arbor, Michigan with his wife and his dog, where they all enjoy long walks around the neighborhood. During the course of writing this book, he became a father to Juni, and he is now learning the joys of sleep deprivation.

Seth Berry is the Academic Co-Director of the Master of Science in Business Analytics (MSBA) and Associate Teaching Professor at the University of Notre Dame for the IT, Analytics, and Operations Department. He likewise has a PhD in Applied Experimental Psychology and has been teaching and consulting in data science for over a decade. He is an excellent instructor of several data science courses at the undergraduate and graduate levels.

He lives in the South Bend area of Indiana with his wife and three kids, and he spends his free time lifting more weights than he should, playing guitar, and chopping wood.