Appendix D — References
These references tend to be more functional than academic, and so will hopefully be more practically useful to you. If you prefer academic resources, some of those are included as well, and you can follow the references within many of these works for deeper or more formal treatments, or simply search Google Scholar for any of the topics covered.
3Blue1Brown. 2024. “How Large Language Models Work, a Visual Intro to Transformers, Chapter 5, Deep Learning.” https://www.youtube.com/watch?v=wjZofJX0v4M.
Albon, Chris. 2024. “Machine Learning Notes.” https://chrisalbon.com/Home.
Amazon. 2024. “What Is Data Augmentation?” Amazon Web Services, Inc. https://aws.amazon.com/what-is/data-augmentation/.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Barrett, Malcolm, Lucy D’Agostino McGowan, and Travis Gerke. 2024. Causal Inference in R. https://www.r-causal.org/.
Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds. 2024. Applied Machine Learning Using mlr3 in R. https://mlr3book.mlr-org.com/.
Boykis, Vicki. 2023. “What Are Embeddings?” http://vickiboykis.com/what_are_embeddings/index.html.
Brownlee, Jason. 2019. “A Gentle Introduction to Imbalanced Classification.” MachineLearningMastery.com. https://machinelearningmastery.com/what-is-imbalanced-classification/.
Brownlee, Jason. 2021. “Gradient Descent With AdaGrad From Scratch.” MachineLearningMastery.com. https://machinelearningmastery.com/gradient-descent-with-adagrad-from-scratch/.
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101. https://doi.org/10.1177/2515245918823199.
Cawley, Gavin C., and Nicola L. C. Talbot. 2010. “On Over-Fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation.” The Journal of Machine Learning Research 11 (August): 2079–2107.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. http://arxiv.org/abs/2403.02467.
Clark, Michael. 2018b. “Thinking about Latent Variables.” https://m-clark.github.io/docs/FA_notes.html.
Clark, Michael. 2021b. “This Is Definitely Not All You Need,” July. https://m-clark.github.io/posts/2021-07-15-dl-for-tabular/.
Clark, Michael. 2022a. “Deep Learning for Tabular Data,” May. https://m-clark.github.io/posts/2022-04-01-more-dl-for-tabular/.
Clark, Michael. 2022b. Generalized Additive Models. https://m-clark.github.io/generalized-additive-models/.
Clark, Michael. 2025. “Imbalanced Outcomes.” https://m-clark.github.io/posts/2025-04-07-class-imbalance/.
Cohen, Jacob. 2009. Statistical Power Analysis for the Behavioral Sciences. 2nd ed., reprint. New York, NY: Psychology Press.
Databricks. 2019. “What Is AdaGrad?” https://www.databricks.com/glossary/adagrad.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511802843.
Dobson, Annette J., and Adrian G. Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315182780.
Dunn, Peter K., and Gordon K. Smyth. 2018. Generalized Linear Models With Examples in R. Springer.
Efron, Bradley, and R. J. Tibshirani. 1994. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9780429246593.
Elor, Yotam, and Hadar Averbuch-Elor. 2022. “To SMOTE, or Not to SMOTE?” arXiv. https://doi.org/10.48550/arXiv.2201.08528.
Facure Alves, Matheus. 2022. “Causal Inference for the Brave and True.” https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Fahrmeir, Ludwig, Thomas Kneib, Stefan Lang, and Brian D. Marx. 2021. Regression: Models, Methods and Applications. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-662-63882-8.
Faraway, Julian. 2014. Linear Models with R. Routledge & CRC Press. https://www.routledge.com/Linear-Models-with-R/Faraway/p/book/9781439887332.
Faraway, Julian J. 2016. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. New York: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315382722.
Fox, John. 2015. Applied Regression Analysis and Generalized Linear Models. SAGE Publications.
Gelman, Andrew. 2013. “What Are the Key Assumptions of Linear Regression?” Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2013/08/04/19470/.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Gelman, Andrew, Jennifer Hill, Ben Goodrich, Jonah Gabry, Daniel Simpson, and Aki Vehtari. 2025. “Advanced Regression and Multilevel Models.” http://www.stat.columbia.edu/~gelman/armm/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. 1st ed. Cambridge University Press. https://doi.org/10.1017/9781139161879.
Google. 2023. “Machine Learning.” Google for Developers. https://developers.google.com/machine-learning.
Google. 2024. “MLOps: Continuous Delivery and Automation Pipelines in Machine Learning.” Cloud Architecture Center, Google Cloud. https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
Gorishniy, Yury, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. 2023. “TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023.” arXiv. https://doi.org/10.48550/arXiv.2307.14338.
Greene, William. 2017. Econometric Analysis. 8th ed. https://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm.
Gruber, Cornelia, Patrick Oliver Schenk, Malte Schierholz, Frauke Kreuter, and Göran Kauermann. 2023. “Sources of Uncertainty in Machine Learning – A Statisticians’ View.” arXiv. https://doi.org/10.48550/arXiv.2305.16703.
Hardin, James W., and Joseph M. Hilbe. 2018. Generalized Linear Models and Extensions. Stata Press.
Harrell, Frank E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer Series in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-19425-7.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. https://hastie.su.domains/ElemStatLearn/.
Hernán, Miguel A. 2018. “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data.” American Journal of Public Health 108 (5): 616–19. https://doi.org/10.2105/AJPH.2018.304337.
Howard, Jeremy. 2024. “Practical Deep Learning for Coders.” https://course.fast.ai/.
Hyndman, Rob, and George Athanasopoulos. 2021. Forecasting: Principles and Practice. 3rd ed. https://otexts.com/fpp3/.
Ivanova, Anna A, Shashank Srikant, Yotaro Sueoka, Hope H Kean, Riva Dhamala, Una-May O’Reilly, Marina U Bers, and Evelina Fedorenko. 2020. “Comprehension of Computer Code Relies Primarily on Domain-General Executive Brain Regions.” Edited by Andrea E Martin, Timothy E Behrens, William Matchin, and Ina Bornkessel-Schlesewsky. eLife 9 (December): e58906. https://doi.org/10.7554/eLife.58906.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning. Vol. 103. Springer Texts in Statistics. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.
Jiang, Lili. 2020. “A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam).” Medium. https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c.
Koenker, Roger. 2005. Quantile Regression. Vol. 38. Cambridge University Press. https://books.google.com/books?hl=en&lr=&id=WjOdAgAAQBAJ&oi=fnd&pg=PT12&dq=koenker+quantile+regression&ots=CQFHSt5o-W&sig=G1TpKPHo-BRdJ8qWcBrIBI2FQAs.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65. https://doi.org/10.1073/pnas.1804597116.
LeCun, Yann, and Ishan Misra. 2021. “Self-Supervised Learning: The Dark Matter of Intelligence.” https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/.
Leech, Gavin, Juan J. Vazquez, Misha Yagudin, Niclas Kupper, and Laurence Aitchison. 2024. “Questionable Practices in Machine Learning.” arXiv. https://doi.org/10.48550/arXiv.2407.12220.
LeNail, Alexander. 2024. “LeNet.” https://alexlenail.me/NN-SVG/LeNet.html.
Masis, Serg. 2023. Interpretable Machine Learning with Python. 2nd ed. Packt. https://www.packtpub.com/product/interpretable-machine-learning-with-python-second-edition/9781803235424.
McCullagh, P. 2019. Generalized Linear Models. 2nd ed. New York: Routledge. https://doi.org/10.1201/9780203753736.
McCulloch, Warren S., and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5 (4): 115–33. https://doi.org/10.1007/BF02478259.
McElreath, Richard. 2020. Statistical Rethinking. Routledge & CRC Press. https://www.routledge.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-STAN/McElreath/p/book/9780367139919.
Molnar, Christoph. 2023. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/.
Molnar, Christoph. 2024. Introduction To Conformal Prediction With Python. https://christophmolnar.com/books/conformal-prediction/.
Murphy, Kevin P. 2012. Machine Learning: A Probabilistic Perspective. MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/.
Murphy, Kevin P. 2023. Probabilistic Machine Learning. MIT Press. https://mitpress.mit.edu/9780262046824/probabilistic-machine-learning/.
Neal, Radford M. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks, edited by Radford M. Neal, 29–53. New York, NY: Springer. https://doi.org/10.1007/978-1-4612-0745-0_2.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society, Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.
Niculescu-Mizil, Alexandru, and Rich Caruana. 2005. “Predicting Good Probabilities with Supervised Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 625–32. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102430.
Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12 (85): 2825–30. http://jmlr.org/papers/v12/pedregosa11a.html.
Power, Alethea, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. 2022. “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.” arXiv. https://doi.org/10.48550/arXiv.2201.02177.
Raschka, Sebastian. 2014. “About Feature Scaling and Normalization.” https://sebastianraschka.com/Articles/2014_about_feature_scaling.html.
Raschka, Sebastian. 2022a. “Losses Learned.” https://sebastianraschka.com/blog/2022/losses-learned-part1.html.
Raschka, Sebastian. 2022b. Machine Learning with PyTorch and Scikit-Learn. https://sebastianraschka.com/books/machine-learning-with-pytorch-and-scikit-learn/.
Raschka, Sebastian. 2023a. Build a Large Language Model (From Scratch). https://www.manning.com/books/build-a-large-language-model-from-scratch.
Raschka, Sebastian. 2023b. Machine Learning Q and AI. https://nostarch.com/machine-learning-q-and-ai.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Roback, Paul, and Julie Legler. 2021. Beyond Multiple Linear Regression. https://bookdown.org/roback/bookdown-BeyondMLR/.
Robins, J. M., M. A. Hernán, and B. Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology (Cambridge, Mass.) 11 (5): 550–60. https://doi.org/10.1097/00001648-200009000-00011.
Rocca, Baptiste. 2019. “Handling Imbalanced Datasets in Machine Learning.” Medium. https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28.
Rovine, Michael J, and Douglas R Anderson. 2004. “Peirce and Bowditch.” The American Statistician 58 (3): 232–36. https://doi.org/10.1198/000313004X964.
Schmidhuber, Juergen. 2022. “Annotated History of Modern AI and Deep Learning.” arXiv. https://doi.org/10.48550/arXiv.2212.11279.
Shalizi, Cosma. 2015. “F-Tests, R2, and Other Distractions.” https://www.stat.cmu.edu/~cshalizi/mreg/15/.
StatQuest with Josh Starmer. 2019a. “Gradient Descent, Step-by-Step.” https://www.youtube.com/watch?v=sDv4f4s2SB8.
StatQuest with Josh Starmer. 2019b. “Stochastic Gradient Descent, Clearly Explained!!!” https://www.youtube.com/watch?v=vMh0zPT0tLI.
StatQuest with Josh Starmer. 2021. “Bootstrapping Main Ideas!!!” https://www.youtube.com/watch?v=Xz0x-8-cgaQ.
UCLA Advanced Research Computing. 2023. “FAQ: What Are Pseudo R-Squareds?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.
VanderWeele, Tyler J. 2012. “Invited Commentary: Structural Equation Models and Epidemiologic Analysis.” American Journal of Epidemiology 176 (7): 608. https://doi.org/10.1093/aje/kws213.
Vig, Jesse. 2019. “Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention.” Medium. https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1.
Weed, Ethan, and Danielle Navarro. 2021. Learning Statistics with Python. https://ethanweed.github.io/pythonbook/landingpage.html.
Wikipedia. 2023. “Relationships Among Probability Distributions.” Wikipedia. https://en.wikipedia.org/wiki/Relationships_among_probability_distributions.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. Boca Raton: Chapman & Hall/CRC Press. https://doi.org/10.1201/9781315370279.
Wooldridge, Jeffrey M. 2012. Introductory Econometrics: A Modern Approach. 5th ed. Mason, OH: Cengage Learning.
Ye, Han-Jia, Huai-Hong Yin, and De-Chuan Zhan. 2024. “Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later.” arXiv. https://doi.org/10.48550/arXiv.2407.03257.
Zhang, Aston, Zack Lipton, Mu Li, and Alex Smola. 2023. “Dive into Deep Learning.” https://d2l.ai/index.html.