Models Demystified

A Practical Guide from t-tests to Deep Learning

Author

Michael Clark & Seth Berry

Preface

Hi there and welcome to our guide to using models in data science! This resource is designed to provide you something useful whether you’re a beginner looking to learn some fundamentals, or an experienced practitioner seeking a fresh perspective. Our goal is to equip you with a better understanding of how models work and how to use them, including both basic and more advanced techniques, where we touch on everything from linear regression to deep learning. We’ll also show how different models relate to one another to better empower you to successfully apply them in your own data-driven projects. This book focuses on the core things you need to know to use models effectively, with a focus on application while also presenting a deeper dive for those that prefer it, and we aim to provide an overview on how to use both machine learning and traditional statistical modeling in a practical fashion. Join us on this exciting journey as we explore the world of models!

This is very much a work in progress, with more to come and plenty of things to clean up still. This should be out on CRC press in 2024. We welcome any feedback in the meantime as it develops, so please feel free to create an issue. For contributions, please see the contributing page for more information if available, other. Thanks for reading!

What Will You Get Out of This Book?

We’re hoping for a couple things for you as you read through this book.

If you’re starting your journey into data science, we hope you’ll leave with:

A firm understanding modeling basics from a practical perspective
A tool set that you can instantly apply for competent modeling

If you’re already familiar with modeling, we hope you’ll leave with:

A deeper understanding of the models you already know
A broader understanding of the models you don’t know
A better understanding of how to choose the right model for the job

Brief Prerequisites

You’ll definitely want to have some familiarity with R or Python, and some very basic knowledge of statistics will be helpful. We’ll try to explain things as we go, but we won’t be able to cover everything. If you’re looking for a good introduction to R, we recommend R for Data Science or the Python for Data Analysis book for Python. Beyond that, we’ll try to provide the context you need, be comfortable trying things out.

About the Authors

Michael is a senior machine learning scientist for Strong Analytics. Prior to industry he honed his chops in academia, earning a PhD in Experimental Psychology before turning to data science full-time as a consultant. His models have been used in production across a variety of industries, and can be seen in dozens of publications across several disciplines. He has a passion for helping others learn difficult stuff, and has taught a variety of data science courses and workshops for people of all skill levels in many different contexts.

He also maintains a blog, and has several posts and long-form documents on a variety of data science topics there. He lives in Ann Arbor Michigan with his wife and his dog, where they all enjoy long walks around the neighborhood.

Seth is the Academic Co-Director of the Master of Science in Business Analytics (MSBA) and Associate Teaching Professor at the University of Notre Dame for the IT, Analytics, and Operations Department. He likewise has a PhD in Applied Experimental Psychology and has been teaching and consulting in data science for over a decade. He is an excellent instructor, and teaches several data science courses at the undergraduate and graduate level. He lives in South Bend, Indiana with his wife and three kids, and spends his free time lifting more weights than he should, playing guitar, and chopping wood.