Appendix D — References
These references tend to be more functional than academic, and so will hopefully be more practically useful to you. If you prefer academic resources, some of those are included as well, and you can follow the references within many of these works for deeper or more formal treatments, or simply search Google Scholar for any of the topics covered.
3Blue1Brown. 2024. “How Large Language Models Work, a Visual Intro to Transformers, Chapter 5, Deep Learning.” https://www.youtube.com/watch?v=wjZofJX0v4M.
Albon, Chris. 2024. “Machine Learning Notes.” https://chrisalbon.com/Home.
Amazon. 2024. “What Is Data Augmentation?” Amazon Web Services, Inc. https://aws.amazon.com/what-is/data-augmentation/.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Barrett, Malcolm, Lucy D’Agostino McGowan, and Travis Gerke. 2024. Causal Inference in R. https://www.r-causal.org/.
Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds. 2024. Applied Machine Learning Using mlr3 in R. https://mlr3book.mlr-org.com/.
Boykis, Vicki. 2023. “What Are Embeddings?” http://vickiboykis.com/what_are_embeddings/index.html.
Brownlee, Jason. 2019. “A Gentle Introduction to Imbalanced Classification.” MachineLearningMastery.com. https://machinelearningmastery.com/what-is-imbalanced-classification/.
Brownlee, Jason. 2021. “Gradient Descent With AdaGrad From Scratch.” MachineLearningMastery.com. https://machinelearningmastery.com/gradient-descent-with-adagrad-from-scratch/.
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101. https://doi.org/10.1177/2515245918823199.
Cawley, Gavin C., and Nicola L. C. Talbot. 2010. “On Over-Fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation.” The Journal of Machine Learning Research 11 (August): 2079–2107.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. http://arxiv.org/abs/2403.02467.
Clark, Michael. 2018b. “Thinking about Latent Variables.” https://m-clark.github.io/docs/FA_notes.html.
Clark, Michael. 2021b. “This Is Definitely Not All You Need,” July. https://m-clark.github.io/posts/2021-07-15-dl-for-tabular/.
Clark, Michael. 2022a. “Deep Learning for Tabular Data,” May. https://m-clark.github.io/posts/2022-04-01-more-dl-for-tabular/.
Clark, Michael. 2022b. Generalized Additive Models. https://m-clark.github.io/generalized-additive-models/.
Clark, Michael. 2025. “Imbalanced Outcomes.” https://m-clark.github.io/posts/2025-04-07-class-imbalance/.
Cohen, Jacob. 2009. Statistical Power Analysis for the Behavioral Sciences. 2nd ed., reprint. New York, NY: Psychology Press.
Databricks. 2019. “What Is AdaGrad?” https://www.databricks.com/glossary/adagrad.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511802843.
Dobson, Annette J., and Adrian G. Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315182780.
Dunn, Peter K., and Gordon K. Smyth. 2018. Generalized Linear Models With Examples in R. Springer.
Efron, Bradley, and R. J. Tibshirani. 1994. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9780429246593.
Elor, Yotam, and Hadar Averbuch-Elor. 2022. “To SMOTE, or Not to SMOTE?” arXiv. https://doi.org/10.48550/arXiv.2201.08528.
Facure Alves, Matheus. 2022. “Causal Inference for the Brave and True.” https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Fahrmeir, Ludwig, Thomas Kneib, Stefan Lang, and Brian D. Marx. 2021. Regression: Models, Methods and Applications. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-662-63882-8.
Faraway, Julian. 2014. Linear Models with R. Routledge & CRC Press. https://www.routledge.com/Linear-Models-with-R/Faraway/p/book/9781439887332.
Faraway, Julian J. 2016. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315382722.
Fox, John. 2015. Applied Regression Analysis and Generalized Linear Models. SAGE Publications.
Gelman, Andrew. 2013. “What Are the Key Assumptions of Linear Regression?” Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2013/08/04/19470/.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Gelman, Andrew, Jennifer Hill, Ben Goodrich, Jonah Gabry, Daniel Simpson, and Aki Vehtari. 2025. “Advanced Regression and Multilevel Models.” http://www.stat.columbia.edu/~gelman/armm/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. 1st ed. Cambridge University Press. https://doi.org/10.1017/9781139161879.
Google. 2023. “Machine Learning.” Google for Developers. https://developers.google.com/machine-learning.
Google. 2024. “MLOps: Continuous Delivery and Automation Pipelines in Machine Learning.” Cloud Architecture Center, Google Cloud. https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
Gorishniy, Yury, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. 2023. “TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023.” arXiv. https://doi.org/10.48550/arXiv.2307.14338.
Greene, William. 2017. Econometric Analysis. 8th ed. https://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm.
Gruber, Cornelia, Patrick Oliver Schenk, Malte Schierholz, Frauke Kreuter, and Göran Kauermann. 2023. “Sources of Uncertainty in Machine Learning – A Statisticians’ View.” arXiv. https://doi.org/10.48550/arXiv.2305.16703.
Hardin, James W., and Joseph M. Hilbe. 2018. Generalized Linear Models and Extensions. Stata Press.
Harrell, Frank E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer Series in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-19425-7.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. https://hastie.su.domains/ElemStatLearn/.
Hernán, Miguel A. 2018. “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data.” American Journal of Public Health 108 (5): 616–19. https://doi.org/10.2105/AJPH.2018.304337.
Howard, Jeremy. 2024. “Practical Deep Learning for Coders.” https://course.fast.ai/.
Hyndman, Rob, and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. https://otexts.com/fpp3/.
Ivanova, Anna A, Shashank Srikant, Yotaro Sueoka, Hope H Kean, Riva Dhamala, Una-May O’Reilly, Marina U Bers, and Evelina Fedorenko. 2020. “Comprehension of Computer Code Relies Primarily on Domain-General Executive Brain Regions.” Edited by Andrea E Martin, Timothy E Behrens, William Matchin, and Ina Bornkessel-Schlesewsky. eLife 9 (December): e58906. https://doi.org/10.7554/eLife.58906.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning. Vol. 103. Springer Texts in Statistics. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.
Jiang, Lili. 2020. “A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam).” Medium. https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c.
Koenker, Roger. 2005. Quantile Regression. Vol. 38. Cambridge University Press. https://books.google.com/books?hl=en&lr=&id=WjOdAgAAQBAJ&oi=fnd&pg=PT12&dq=koenker+quantile+regression&ots=CQFHSt5o-W&sig=G1TpKPHo-BRdJ8qWcBrIBI2FQAs.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65. https://doi.org/10.1073/pnas.1804597116.
LeCun, Yann, and Ishan Misra. 2021. “Self-Supervised Learning: The Dark Matter of Intelligence.” https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/.
Leech, Gavin, Juan J. Vazquez, Misha Yagudin, Niclas Kupper, and Laurence Aitchison. 2024. “Questionable Practices in Machine Learning.” arXiv. https://doi.org/10.48550/arXiv.2407.12220.
LeNail, Alexander. 2024. “LeNet.” https://alexlenail.me/NN-SVG/LeNet.html.
Masis, Serg. 2023. Interpretable Machine Learning with Python. 2nd ed. Packt. https://www.packtpub.com/product/interpretable-machine-learning-with-python-second-edition/9781803235424.
McCullagh, P. 2019. Generalized Linear Models. 2nd ed. New York: Routledge. https://doi.org/10.1201/9780203753736.
McCulloch, Warren S., and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5 (4): 115–33. https://doi.org/10.1007/BF02478259.
McElreath, Richard. 2020. Statistical Rethinking. Routledge & CRC Press. https://www.routledge.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-STAN/McElreath/p/book/9780367139919.
Molnar, Christoph. 2023. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/.
Molnar, Christoph. 2024. Introduction To Conformal Prediction With Python. https://christophmolnar.com/books/conformal-prediction/.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/.
Murphy, Kevin P. 2023. Probabilistic Machine Learning. MIT Press. https://mitpress.mit.edu/9780262046824/probabilistic-machine-learning/.
Neal, Radford M. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks, edited by Radford M. Neal, 29–53. New York, NY: Springer. https://doi.org/10.1007/978-1-4612-0745-0_2.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society, Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.
Niculescu-Mizil, Alexandru, and Rich Caruana. 2005. “Predicting Good Probabilities with Supervised Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 625–32. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102430.
Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12 (85): 2825–30. http://jmlr.org/papers/v12/pedregosa11a.html.
Power, Alethea, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. 2022. “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.” arXiv. https://doi.org/10.48550/arXiv.2201.02177.
Raschka, Sebastian. 2014. “About Feature Scaling and Normalization.” https://sebastianraschka.com/Articles/2014_about_feature_scaling.html.
Raschka, Sebastian. 2022a. “Losses Learned.” https://sebastianraschka.com/blog/2022/losses-learned-part1.html.
Raschka, Sebastian. 2022b. Machine Learning with PyTorch and Scikit-Learn. https://sebastianraschka.com/books/machine-learning-with-pytorch-and-scikit-learn/.
Raschka, Sebastian. 2023a. Build a Large Language Model (From Scratch). https://www.manning.com/books/build-a-large-language-model-from-scratch.
Raschka, Sebastian. 2023b. Machine Learning Q and AI. https://nostarch.com/machine-learning-q-and-ai.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Roback, Paul, and Julie Legler. 2021. Beyond Multiple Linear Regression. https://bookdown.org/roback/bookdown-BeyondMLR/.
Robins, J. M., M. A. Hernán, and B. Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology (Cambridge, Mass.) 11 (5): 550–60. https://doi.org/10.1097/00001648-200009000-00011.
Rocca, Baptiste. 2019. “Handling Imbalanced Datasets in Machine Learning.” Medium. https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28.
Rovine, Michael J, and Douglas R Anderson. 2004. “Peirce and Bowditch.” The American Statistician 58 (3): 232–36. https://doi.org/10.1198/000313004X964.
Schmidhuber, Juergen. 2022. “Annotated History of Modern AI and Deep Learning.” arXiv. https://doi.org/10.48550/arXiv.2212.11279.
Shalizi, Cosma. 2015. “F-Tests, R2, and Other Distractions.” https://www.stat.cmu.edu/~cshalizi/mreg/15/.
StatQuest with Josh Starmer. 2019a. “Gradient Descent, Step-by-Step.” https://www.youtube.com/watch?v=sDv4f4s2SB8.
StatQuest with Josh Starmer. 2019b. “Stochastic Gradient Descent, Clearly Explained!!!” https://www.youtube.com/watch?v=vMh0zPT0tLI.
StatQuest with Josh Starmer. 2021. “Bootstrapping Main Ideas!!!” https://www.youtube.com/watch?v=Xz0x-8-cgaQ.
UCLA Advanced Research Computing. 2023. “FAQ: What Are Pseudo R-Squareds?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.
VanderWeele, Tyler J. 2012. “Invited Commentary: Structural Equation Models and Epidemiologic Analysis.” American Journal of Epidemiology 176 (7): 608. https://doi.org/10.1093/aje/kws213.
Vig, Jesse. 2019. “Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention.” Medium. https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1.
Weed, Ethan, and Danielle Navarro. 2021. Learning Statistics with Python. https://ethanweed.github.io/pythonbook/landingpage.html.
Wikipedia. 2023. “Relationships Among Probability Distributions.” Wikipedia. https://en.wikipedia.org/wiki/Relationships_among_probability_distributions.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. Boca Raton: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315370279.
Wooldridge, Jeffrey M. 2012. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: Cengage Learning.
Ye, Han-Jia, Huai-Hong Yin, and De-Chuan Zhan. 2024. “Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later.” arXiv. https://doi.org/10.48550/arXiv.2407.03257.
Zhang, Aston, Zack Lipton, Mu Li, and Alex Smola. 2023. “Dive into Deep Learning.” https://d2l.ai/index.html.