Models Demystified

A Practical Guide from t-tests to Deep Learning

Authors

Michael Clark & Seth Berry

Preface

Hello and welcome! This book is your companion to exploring the realm of modeling in data science. It is designed to provide you with something useful whether you’re a beginner looking to learn some fundamentals or an experienced practitioner seeking a fresh perspective. Our goal is to equip you with a better understanding of how models work and how to use them, covering both basic and more advanced techniques, from linear regression to deep learning. We’ll also show how different models relate to one another, to better empower you to apply them successfully in your own data-driven projects. We aim to provide an overview of how to use both machine learning and traditional statistical modeling in a practical fashion, with a balanced emphasis on interpretability and predictive power. Join us on this exciting journey as we explore the world of models in data science!

This is still a work in progress, with more to come and plenty still to clean up. We hope to have the print version out from CRC Press by the end of 2024. We welcome any feedback in the meantime as it develops, so please feel free to create an issue. For contributions, please see the contributing page. Thanks for reading!

What Will You Get Out of This Book?

We’re hoping for a couple of things for you as you read through this book. In particular, if you’re starting your journey into data science, we hope you’ll leave with:

  • A firm understanding of modeling basics from a practical perspective
  • A toolset of models and related ideas that you can instantly apply for competent modeling

If you’re already familiar with modeling, we hope you’ll leave with:

  • Additional context for the models you already know
  • Some introduction to models you don’t know
  • Additional understanding of how to choose the right model for the job and what to focus on

For anyone reading this book, we especially hope you get a sense of the commonalities between different models and a good sense of how they work.

Brief Prerequisites

You’ll definitely want to have some familiarity with R or Python (both are used for examples), and some very basic knowledge of statistics will be helpful. We’ll try to explain things as we go, but we won’t be able to cover everything. If you’re looking for a good introduction, we recommend R for Data Science for R and Python for Data Analysis for Python. Beyond that, we’ll try to provide the context you need so that you can be comfortable trying things out.

Also, if you happen to be reading this book in print, you can find the book in web form at https://m-clark.github.io/book-of-models. There you’ll find all the code, figures, and other content that you can interact with more easily, as well as the most up-to-date content, fixes, etc. The web version will be updated with some regularity and have additional content as well.

Data & Code

All the data and code used in this book are available on the book’s GitHub repository. See the data descriptions in the data section for more information on each of the datasets used. Notebooks with chapter code are also available there, where applicable.

About the Authors

Michael is a senior machine learning scientist for Strong Analytics¹. Prior to industry, he honed his chops in academia, earning a PhD in Experimental Psychology before turning to data science full-time as a consultant. His models have been used in production across a variety of industries, and can be seen in dozens of publications across several academic disciplines. He has a passion for helping others learn difficult stuff, and has taught a variety of data science courses and workshops for people of all skill levels in many different contexts.

He also maintains a blog covering many aspects of statistical and machine learning modeling, and has several posts and long-form documents on a variety of data science topics there. He lives in Ann Arbor, Michigan, with his wife and his dog, where they all enjoy long walks around the neighborhood. During the course of writing this book, he became a father to Juni, and is now learning the joys of sleep deprivation.


Seth is the Academic Co-Director of the Master of Science in Business Analytics (MSBA) and an Associate Teaching Professor in the IT, Analytics, and Operations Department at the University of Notre Dame. He likewise has a PhD in Applied Experimental Psychology and has been teaching and consulting in data science for over a decade. He is an excellent instructor, and teaches several data science courses at the undergraduate and graduate levels.

Seth lives in the South Bend area of Indiana with his wife and three kids, and spends his free time lifting more weights than he should, playing guitar, and chopping wood.


  1. By the time you’re reading this, Strong’s merger with OneSix should be complete (2025).