Appendix E — References

These references tend to be more functional than academic, and we hope you’ll find them practically useful as well. If you prefer more academic resources, you’ll find some of those here too. You can also follow the references within many of these works for deeper or more formal dives, or search Google Scholar for any of the topics covered.

3Blue1Brown. 2024. “How Large Language Models Work, a Visual Intro to Transformers | Chapter 5, Deep Learning.” https://www.youtube.com/watch?v=wjZofJX0v4M.
Agresti, Alan. 2015. Foundations of Linear and Generalized Linear Models. John Wiley & Sons.
Albon, Chris. 2024a. Machine Learning Flashcards. https://machinelearningflashcards.com/.
———. 2024b. “Machine Learning Notes.” https://chrisalbon.com/Home.
Amazon. 2024. “What Is Data Augmentation? - Data Augmentation Techniques Explained - AWS.” Amazon Web Services, Inc. https://aws.amazon.com/what-is/data-augmentation/.
Andrej Karpathy. 2024. “Let’s Build the GPT Tokenizer.” https://www.youtube.com/watch?v=zduSFxRajkE.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Arel-Bundock, Vincent. 2024. “Marginal Effects Zoo.” https://marginaleffects.com/.
Bai, Yu, Song Mei, Huan Wang, and Caiming Xiong. 2021. “Understanding the Under-Coverage Bias in Uncertainty Estimation.” arXiv. https://doi.org/10.48550/arXiv.2106.05515.
Barrett, Malcolm, Lucy D’Agostino McGowan, and Travis Gerke. 2024. Causal Inference in R. https://www.r-causal.org/.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019. “Reconciling Modern Machine Learning Practice and the Bias-Variance Trade-Off.” Proceedings of the National Academy of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.
Bergmann, Dave. 2023. “What Is Self-Supervised Learning?” https://www.ibm.com/topics/self-supervised-learning.
Biecek, Przemyslaw, and Tomasz Burzykowski. 2020. Explanatory Model Analysis. https://ema.drwhy.ai/.
Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds. 2024. Applied Machine Learning Using mlr3 in R. https://mlr3book.mlr-org.com/.
Bishop, Christopher M. 2006. Pattern Recognition and Machine Learning. Information Science and Statistics. New York: Springer.
Boehmke, Bradley, and Brandon Greenwell. 2020. Hands-On Machine Learning with R. https://bradleyboehmke.github.io/HOML/.
Boykis, Vicki. 2023. “What Are Embeddings?” http://vickiboykis.com/what_are_embeddings/index.html.
Breiman, Leo. 2001. “Statistical Modeling: The Two Cultures (with Comments and a Rejoinder by the Author).” Statistical Science 16 (3): 199–231. https://doi.org/10.1214/ss/1009213726.
Brownlee, Jason. 2016. “Gentle Introduction to the Bias-Variance Trade-Off in Machine Learning.” MachineLearningMastery.com. https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/.
———. 2019. “A Gentle Introduction to Imbalanced Classification.” MachineLearningMastery.com. https://machinelearningmastery.com/what-is-imbalanced-classification/.
———. 2020. “How to Calibrate Probabilities for Imbalanced Classification.” MachineLearningMastery.com. https://machinelearningmastery.com/probability-calibration-for-imbalanced-classification/.
———. 2021. “Gradient Descent With AdaGrad From Scratch.” MachineLearningMastery.com. https://machinelearningmastery.com/gradient-descent-with-adagrad-from-scratch/.
Burges, Chris, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole Hamilton, and Greg Hullender. 2005. “Learning to Rank Using Gradient Descent.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 89–96. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102363.
Burges, Christopher J. C. 2016. “From RankNet to LambdaRank to LambdaMART: An Overview.”
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal Regression Models in Psychology: A Tutorial.” Advances in Methods and Practices in Psychological Science 2 (1): 77–101. https://doi.org/10.1177/2515245918823199.
Bycroft, Brendan. 2023. “LLM Visualization.” https://bbycroft.net/llm.
Carpenter, Bob. 2023. “Prior Choice Recommendations.” GitHub. https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations.
causalml. 2023. “CausalML.” https://causalml.readthedocs.io/en/latest/index.html.
Cawley, Gavin C., and Nicola L. C. Talbot. 2010. “On Over-Fitting in Model Selection and Subsequent Selection Bias in Performance Evaluation.” The Journal of Machine Learning Research 11 (August): 2079–2107.
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002. “SMOTE: Synthetic Minority Over-Sampling Technique.” Journal of Artificial Intelligence Research 16 (June): 321–57. https://doi.org/10.1613/jair.953.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler, and Vasilis Syrgkanis. 2024. “Applied Causal Inference Powered by ML and AI.” arXiv. http://arxiv.org/abs/2403.02467.
Clark, Michael. 2018a. Graphical & Latent Variable Modeling. https://m-clark.github.io/sem/.
———. 2018b. “Thinking about Latent Variables.” https://m-clark.github.io/docs/FA_notes.html.
———. 2020. Practical Data Science. https://m-clark.github.io/data-processing-and-visualization/.
———. 2021a. Model Estimation by Example. https://m-clark.github.io/models-by-example/.
———. 2021b. “This Is Definitely Not All You Need,” July. https://m-clark.github.io/posts/2021-07-15-dl-for-tabular/.
———. 2022a. Bayesian Basics. https://m-clark.github.io/bayesian-basics/.
———. 2022b. Generalized Additive Models. https://m-clark.github.io/generalized-additive-models/.
———. 2022c. “Deep Learning for Tabular Data,” May. https://m-clark.github.io/posts/2022-04-01-more-dl-for-tabular/.
———. 2023. Mixed Models with R. https://m-clark.github.io/mixed-models-with-R/.
Cohen, Jacob. 2009. Statistical Power Analysis for the Behavioral Sciences. 2nd ed., reprint. New York, NY: Psychology Press.
Cross Validated. 2011. “Answer to ‘The Connection Between Bayesian Statistics and Generative Modeling’.” Cross Validated. https://stats.stackexchange.com/a/7473.
———. 2016. “Why Are Neural Networks Becoming Deeper, but Not Wider?” Forum post. Cross Validated. https://stats.stackexchange.com/q/222883.
———. 2020. “Answer to ‘Why Some Algorithms Produce Calibrated Probabilities’.” Cross Validated. https://stats.stackexchange.com/a/452533.
———. 2021. “Answer to ‘Why Do We Do Matching for Causal Inference Vs Regressing on Confounders?’.” Cross Validated. https://stats.stackexchange.com/a/544958.
Cunningham, Scott. 2023. Causal Inference: The Mixtape. https://mixtape.scunning.com/.
Dahabreh, Issa J., and Kirsten Bibbins-Domingo. 2024. “Causal Inference About the Effects of Interventions From Observational Studies in Medical Journals.” JAMA, May. https://doi.org/10.1001/jama.2024.7741.
Databricks. 2019. “What Is AdaGrad?” Databricks. https://www.databricks.com/glossary/adagrad.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511802843.
Dobson, Annette J., and Adrian G. Barnett. 2018. An Introduction to Generalized Linear Models. 4th ed. New York: Chapman & Hall/CRC. https://doi.org/10.1201/9781315182780.
Dunn, Peter K., and Gordon K. Smyth. 2018. Generalized Linear Models With Examples in R. Springer.
Dunn, Robin, Larry Wasserman, and Aaditya Ramdas. 2020. Distribution-Free Prediction Sets with Random Effects.
Efron, Bradley, and R. J. Tibshirani. 1994. An Introduction to the Bootstrap. New York: Chapman & Hall/CRC. https://doi.org/10.1201/9780429246593.
Elor, Yotam, and Hadar Averbuch-Elor. 2022. “To SMOTE, or Not to SMOTE?” arXiv. https://doi.org/10.48550/arXiv.2201.08528.
Hugging Face. 2024. “Byte-Pair Encoding Tokenization - Hugging Face NLP Course.” https://huggingface.co/learn/nlp-course/en/chapter6/5.
Facure Alves, Matheus. 2022. “Causal Inference for the Brave and True.” https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Fahrmeir, Ludwig, Thomas Kneib, Stefan Lang, and Brian D. Marx. 2021. Regression: Models, Methods and Applications. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-662-63882-8.
Faraway, Julian. 2014. “Linear Models with R.” Routledge & CRC Press. https://www.routledge.com/Linear-Models-with-R/Faraway/p/book/9781439887332.
Faraway, Julian J. 2016. Extending the Linear Model with R: Generalized Linear, Mixed Effects and Nonparametric Regression Models. 2nd ed. New York: Chapman & Hall/CRC. https://doi.org/10.1201/9781315382722.
Ferrari, Silvia, and Francisco Cribari-Neto. 2004. “Beta Regression for Modelling Rates and Proportions.” Journal of Applied Statistics 31 (7): 799–815. https://doi.org/10.1080/0266476042000214501.
Fleuret, François. 2023. The Little Book of Deep Learning. https://fleuret.org/francois/lbdl.html.
Fortuner, Brendan. 2023. “Machine Learning Glossary.” https://ml-cheatsheet.readthedocs.io/en/latest/index.html.
Fox, John. 2015. Applied Regression Analysis and Generalized Linear Models. SAGE Publications.
Gelman, Andrew. 2013. “What Are the Key Assumptions of Linear Regression?” Statistical Modeling, Causal Inference, and Social Science. https://statmodeling.stat.columbia.edu/2013/08/04/19470/.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki Vehtari, and Donald B. Rubin. 2013. Bayesian Data Analysis. 3rd ed. CRC Press.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.
Gelman, Andrew, Jennifer Hill, Ben Goodrich, Jonah Gabry, Daniel Simpson, and Aki Vehtari. 2024. “Advanced Regression and Multilevel Models.” http://www.stat.columbia.edu/~gelman/armm/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and Other Stories. 1st ed. Cambridge University Press. https://doi.org/10.1017/9781139161879.
Gelman, Andrew, and Eric Loken. 2013. “The Garden of Forking Paths: Why Multiple Comparisons Can Be a Problem, Even When There Is No ‘Fishing Expedition’ or ‘p-Hacking’ and the Research Hypothesis Was Posited Ahead of Time.”
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. https://www.deeplearningbook.org/.
Google. 2023a. “Imbalanced Data | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data.
———. 2023b. “Introduction | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/decision-forests.
———. 2023c. “Machine Learning | Google for Developers.” https://developers.google.com/machine-learning.
———. 2024a. “Classification: ROC Curve and AUC | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
———. 2024b. “MLOps: Continuous Delivery and Automation Pipelines in Machine Learning | Cloud Architecture Center.” Google Cloud. https://cloud.google.com/architecture/mlops-continuous-delivery-and-automation-pipelines-in-machine-learning.
———. 2024c. “Reducing Loss: Gradient Descent | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent.
———. 2024d. “What Is Unsupervised Learning?” https://cloud.google.com/discover/what-is-unsupervised-learning.
Gorishniy, Yury, Ivan Rubachev, Nikolay Kartashev, Daniil Shlenskii, Akim Kotelnikov, and Artem Babenko. 2023. “TabR: Tabular Deep Learning Meets Nearest Neighbors in 2023.” arXiv. https://doi.org/10.48550/arXiv.2307.14338.
Greene, William. 2017. Econometric Analysis. 8th ed. https://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm.
Wickham, Hadley, Mine Çetinkaya-Rundel, and Garrett Grolemund. 2023. R for Data Science. 2nd ed. https://r4ds.hadley.nz/.
Hardin, James W., and Joseph M. Hilbe. 2018. Generalized Linear Models and Extensions. Stata Press.
Harrell, Frank E. 2015. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis. 2nd ed. Springer Series in Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-19425-7.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. https://hastie.su.domains/ElemStatLearn/.
Heiss, Andrew. 2022. “Marginalia: A Guide to Figuring Out What the Heck Marginal Effects, Marginal Slopes, Average Marginal Effects, Marginal Effects at the Mean, and All These Other Marginal Things Are.” Andrew Heiss. https://www.andrewheiss.com/blog/2022/05/20/marginalia/#what-are-marginal-effects.
Hernán, Miguel A. 2018. “The C-Word: Scientific Euphemisms Do Not Improve Causal Inference From Observational Data.” American Journal of Public Health 108 (5): 616–19. https://doi.org/10.2105/AJPH.2018.304337.
Hernán, Miguel A., and James M. Robins. 2012. Causal Inference: What If. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.
Howard, Jeremy. 2024. “Practical Deep Learning for Coders.” https://course.fast.ai/.
Hvitfeldt, Emil. 2024. Feature Engineering A-Z. https://feaz-book.com/.
Hyndman, Rob, and George Athanasopoulos. 2021. Forecasting: Principles and Practice (3rd Ed). https://otexts.com/fpp3/.
Ivanova, Anna A, Shashank Srikant, Yotaro Sueoka, Hope H Kean, Riva Dhamala, Una-May O’Reilly, Marina U Bers, and Evelina Fedorenko. 2020. “Comprehension of Computer Code Relies Primarily on Domain-General Executive Brain Regions.” Edited by Andrea E Martin, Timothy E Behrens, William Matchin, and Ina Bornkessel-Schlesewsky. eLife 9 (December): e58906. https://doi.org/10.7554/eLife.58906.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2021. An Introduction to Statistical Learning. Vol. 103. Springer Texts in Statistics. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.
Jiang, Lili. 2020. “A Visual Explanation of Gradient Descent Methods (Momentum, AdaGrad, RMSProp, Adam).” Medium. https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c.
Jordan, Jeremy. 2018. “Introduction to Autoencoders.” Jeremy Jordan. https://www.jeremyjordan.me/autoencoders/.
Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, et al. 2023. “Segment Anything.” In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 4015–26. https://openaccess.thecvf.com/content/ICCV2023/html/Kirillov_Segment_Anything_ICCV_2023_paper.html.
Koenker, Roger. 2000. “Galton, Edgeworth, Frisch, and Prospects for Quantile Regression in Econometrics.” Journal of Econometrics 95 (2): 347–74. https://doi.org/10.1016/S0304-4076(99)00043-3.
———. 2005. Quantile Regression. Vol. 38. Cambridge University Press. https://books.google.com/books?hl=en&lr=&id=WjOdAgAAQBAJ&oi=fnd&pg=PT12&dq=koenker+quantile+regression&ots=CQFHSt5o-W&sig=G1TpKPHo-BRdJ8qWcBrIBI2FQAs.
Kruschke, John. 2010. Doing Bayesian Data Analysis: A Tutorial Introduction with R. Academic Press.
Kuhn, Max, and Kjell Johnson. 2023. Applied Machine Learning for Tabular Data. https://aml4td.org/.
Kuhn, Max, and Julia Silge. 2023. Tidy Modeling with R. https://www.tmwr.org/.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019. “Metalearners for Estimating Heterogeneous Treatment Effects Using Machine Learning.” Proceedings of the National Academy of Sciences 116 (10): 4156–65. https://doi.org/10.1073/pnas.1804597116.
Lang, Michel, Martin Binder, Jakob Richter, Patrick Schratz, Florian Pfisterer, Stefan Coors, Quay Au, Giuseppe Casalicchio, Lars Kotthoff, and Bernd Bischl. 2019. “mlr3: A Modern Object-Oriented Machine Learning Framework in R.” Journal of Open Source Software 4 (44): 1903. https://doi.org/10.21105/joss.01903.
LeCun, Yann, and Ishan Misra. 2021. “Self-Supervised Learning: The Dark Matter of Intelligence.” https://ai.meta.com/blog/self-supervised-learning-the-dark-matter-of-intelligence/.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey Pennington, and Jascha Sohl-Dickstein. 2017. “Deep Neural Networks as Gaussian Processes.” arXiv.org. https://arxiv.org/abs/1711.00165v3.
Leech, Gavin, Juan J. Vazquez, Misha Yagudin, Niclas Kupper, and Laurence Aitchison. 2024. “Questionable Practices in Machine Learning.” arXiv. https://doi.org/10.48550/arXiv.2407.12220.
Lones, Michael A. 2024. “How to Avoid Machine Learning Pitfalls: A Guide for Academic Researchers.” arXiv. https://doi.org/10.48550/arXiv.2108.02497.
Mahr, Tristan. 2021. “Random Effects and Penalized Splines Are the Same Thing.” Higher Order Functions. https://tjmahr.github.io/random-effects-penalized-splines-same-thing/.
Masis, Serg. 2023. “Interpretable Machine Learning with Python - Second Edition.” Packt. https://www.packtpub.com/product/interpretable-machine-learning-with-python-second-edition/9781803235424.
Mayo, Deborah. 2019. “Error Statistics Philosophy.” Error Statistics Philosophy. https://errorstatistics.com/.
McCullagh, P., and J. A. Nelder. 2019. Generalized Linear Models. 2nd ed. New York: Routledge. https://doi.org/10.1201/9780203753736.
McCulloch, Warren S., and Walter Pitts. 1943. “A Logical Calculus of the Ideas Immanent in Nervous Activity.” The Bulletin of Mathematical Biophysics 5 (4): 115–33. https://doi.org/10.1007/BF02478259.
McElreath, Richard. 2020. “Statistical Rethinking: A Bayesian Course with Examples in R and STAN.” Routledge & CRC Press. https://www.routledge.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-STAN/McElreath/p/book/9780367139919.
McKinney, Wes. 2023. Python for Data Analysis. 3rd ed. https://wesmckinney.com/book/.
Microsoft. 2024. “Generative AI for Beginners.” https://microsoft.github.io/generative-ai-for-beginners/#/.
MIT OpenCourseWare. 2017. “6. Monte Carlo Simulation.” https://www.youtube.com/watch?v=OgO1gpXSUzU.
Molnar, Christoph. 2023. Interpretable Machine Learning. https://christophm.github.io/interpretable-ml-book/.
———. 2024. Introduction To Conformal Prediction With Python. https://christophmolnar.com/books/conformal-prediction/.
Monroe, Elizabeth, and Michael Clark. 2025. “Imbalanced Outcomes: Challenges and Solutions.”
Morgan, Stephen, and Christopher Winship. 2014. “Counterfactuals and Causal Inference: Methods and Principles for Social Research, 2nd Edition,” January. https://stars.library.ucf.edu/etextbooks/298.
Murphy, Kevin P. 2012. “Machine Learning: A Probabilistic Perspective.” MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/.
———. 2023. “Probabilistic Machine Learning.” MIT Press. https://mitpress.mit.edu/9780262046824/probabilistic-machine-learning/.
Navarro, Danielle. 2018. Learning Statistics with R. https://learningstatisticswithr.com.
Neal, Radford M. 1996. “Priors for Infinite Networks.” In Bayesian Learning for Neural Networks, edited by Radford M. Neal, 29–53. New York, NY: Springer. https://doi.org/10.1007/978-1-4612-0745-0_2.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” Journal of the Royal Statistical Society. Series A (General) 135 (3): 370–84. https://doi.org/10.2307/2344614.
Niculescu-Mizil, Alexandru, and Rich Caruana. 2005. “Predicting Good Probabilities with Supervised Learning.” In Proceedings of the 22nd International Conference on Machine Learning - ICML ’05, 625–32. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102430.
Pearl, Judea. 2009. “Causal Inference in Statistics: An Overview.” Statistics Surveys 3: 96–146. https://doi.org/10.1214/09-SS057.
———. 2022. “Causal Inference: History, Perspectives, Adventures, and Unification (An Interview with Judea Pearl).” https://muse.jhu.edu/pub/56/article/867087/summary.
Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011. “Scikit-Learn: Machine Learning in Python.” Journal of Machine Learning Research 12 (85): 2825–30. http://jmlr.org/papers/v12/pedregosa11a.html.
Peng, Roger D. 2022. R Programming for Data Science. https://bookdown.org/rdpeng/rprogdatascience/.
Penn State, Department of Statistics. 2018. “5.4 - A Matrix Formulation of the Multiple Regression Model | STAT 462.” https://online.stat.psu.edu/stat462/node/132/.
Pochinkov, Nicky. 2023. “LLM Basics: Embedding Spaces - Transformer Token Vectors Are Not Points in Space,” February. https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics-embedding-spaces-transformer-token-vectors-are.
Pok, Wilson. 2020. “How Uplift Modeling Works | Blogs.” https://ambiata.com/blog/2020-07-07-uplift-modeling/.
Power, Alethea, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant Misra. 2022. “Grokking: Generalization Beyond Overfitting on Small Algorithmic Datasets.” arXiv. https://doi.org/10.48550/arXiv.2201.02177.
Prince, Simon J. D. 2023. Understanding Deep Learning. MIT Press.
Quantmetry. 2024. “MAPIE - Model Agnostic Prediction Interval Estimator | MAPIE 0.8.2 Documentation.” https://mapie.readthedocs.io/en/latest/.
Raschka, Sebastian. 2014. “About Feature Scaling and Normalization.” Sebastian Raschka, PhD. https://sebastianraschka.com/Articles/2014_about_feature_scaling.html.
———. 2022a. “Losses Learned.” Sebastian Raschka, PhD. https://sebastianraschka.com/blog/2022/losses-learned-part1.html.
———. 2022b. Machine Learning with PyTorch and Scikit-Learn. https://sebastianraschka.com/books/machine-learning-with-pytorch-and-scikit-learn/.
———. 2023a. Build a Large Language Model (From Scratch). https://www.manning.com/books/build-a-large-language-model-from-scratch.
———. 2023b. Machine Learning Q and AI. https://nostarch.com/machine-learning-q-and-ai.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005. Gaussian Processes for Machine Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Ripley, Brian D. 1996. Pattern Recognition and Neural Networks. Cambridge: Cambridge University Press. https://doi.org/10.1017/CBO9780511812651.
Roback, Paul, and Julie Legler. 2021. Beyond Multiple Linear Regression. https://bookdown.org/roback/bookdown-BeyondMLR/.
Roberts, Eric. 2000. “Neural Networks - History.” https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html.
Robins, J. M., M. A. Hernán, and B. Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology (Cambridge, Mass.) 11 (5): 550–60. https://doi.org/10.1097/00001648-200009000-00011.
Rocca, Baptiste. 2019. “Handling Imbalanced Datasets in Machine Learning.” Medium. https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28.
Rovine, Michael J, and Douglas R Anderson. 2004. “Peirce and Bowditch.” The American Statistician 58 (3): 232–36. https://doi.org/10.1198/000313004X964.
Schmidhuber, Juergen. 2022. “Annotated History of Modern AI and Deep Learning.” arXiv. https://doi.org/10.48550/arXiv.2212.11279.
scikit-learn. 2023a. “1.16. Probability Calibration.” Scikit-Learn. https://scikit-learn.org/stable/modules/calibration.html.
———. 2023b. “Nested Versus Non-Nested Cross-Validation.” Scikit-Learn. https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html.
Sen, Rajat, and Yichen Zhou. 2024. “A Decoder-Only Foundation Model for Time-Series Forecasting.” http://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/.
Shalizi, Cosma. 2015. “F-Tests, R², and Other Distractions.” https://www.stat.cmu.edu/~cshalizi/mreg/15/.
Shevchenko, Maksim. 2023. “Types of Customers — Scikit-Uplift 0.3.1 Documentation.” https://www.uplift-modeling.com/en/v0.5.1/user_guide/introduction/clients.html.
Shorten, Connor, Taghi M. Khoshgoftaar, and Borko Furht. 2021. “Text Data Augmentation for Deep Learning.” Journal of Big Data 8 (1): 101. https://doi.org/10.1186/s40537-021-00492-0.
Simpson, Gavin. 2021. “Using Random Effects in GAMs with mgcv.” From the Bottom of the Heap, February. https://www.fromthebottomoftheheap.net/2021/02/02/random-effects-in-gams/.
StackExchange. 2015. “Are There Any Differences Between Tensors and Multidimensional Arrays?” Forum post. Mathematics Stack Exchange. https://math.stackexchange.com/q/1134809.
StatQuest with Josh Starmer. 2019a. “Gradient Descent, Step-by-Step.” https://www.youtube.com/watch?v=sDv4f4s2SB8.
———. 2019b. “Stochastic Gradient Descent, Clearly Explained!!!” https://www.youtube.com/watch?v=vMh0zPT0tLI.
———. 2021. “Bootstrapping Main Ideas!!!” https://www.youtube.com/watch?v=Xz0x-8-cgaQ.
Turrell, Arthur, Pietro Monticone, Zeki Akyol, and Yiben Huang. 2024. “Python for Data Science V1.0.1,” January. https://doi.org/10.5281/ZENODO.10518241.
UCLA Advanced Research Computing. 2023a. “FAQ: How Do I Interpret Odds Ratios in Logistic Regression?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/.
———. 2023b. “FAQ: What Are Pseudo R-Squareds?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2024. “Arrays in R and Python.” https://cran.r-project.org/web/packages/reticulate/vignettes/arrays.html.
VanderPlas, Jake. 2016. “Python Data Science Handbook.” https://www.oreilly.com/library/view/python-data-science/9781491912126/.
VanderWeele, Tyler J. 2012. “Invited Commentary: Structural Equation Models and Epidemiologic Analysis.” American Journal of Epidemiology 176 (7): 608. https://doi.org/10.1093/aje/kws213.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. “Attention Is All You Need.” In Advances in Neural Information Processing Systems. Vol. 30. Curran Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
Vig, Jesse. 2019. “Deconstructing BERT, Part 2: Visualizing the Inner Workings of Attention.” Medium. https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1.
Walker, Kyle E. 2023. Analyzing US Census Data. https://walker-data.com/census-r.
Weed, Ethan, and Danielle Navarro. 2021. Learning Statistics with Python. https://ethanweed.github.io/pythonbook/landingpage.html.
Welchowski, Thomas, Kelly O. Maloney, Richard Mitchell, and Matthias Schmid. 2022. “Techniques to Improve Ecological Interpretability of Black-Box Machine Learning Models.” Journal of Agricultural, Biological and Environmental Statistics 27 (1): 175–97. https://doi.org/10.1007/s13253-021-00479-7.
Wikipedia. 2023. “Relationships Among Probability Distributions.” Wikipedia. https://en.wikipedia.org/wiki/Relationships_among_probability_distributions.
———. 2024a. “Exponential Family.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=1202463189.
———. 2024b. “Gradient.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Gradient&oldid=1206147282.
———. 2024c. “Cross-Entropy.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Cross-entropy&oldid=1221840853#Cross-entropy_loss_function_and_logistic_regression.
———. 2024d. “Replication Crisis.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Replication_crisis&oldid=1222335234.
Witten, Daniela. 2020. “The Bias-Variance Trade-Off & ‘DOUBLE DESCENT’.” X (Formerly Twitter). https://x.com/daniela_witten/status/1292293102103748609.
Wood, Simon N. 2017. Generalized Additive Models: An Introduction with R. 2nd ed. Boca Raton: Chapman & Hall/CRC. https://doi.org/10.1201/9781315370279.
Wooldridge, Jeffrey M. 2012. Introductory Econometrics: A Modern Approach. 5th edition. Mason, OH: Cengage Learning.
Ye, Han-Jia, Huai-Hong Yin, and De-Chuan Zhan. 2024. “Modern Neighborhood Components Analysis: A Deep Tabular Baseline Two Decades Later.” arXiv. https://doi.org/10.48550/arXiv.2407.03257.
Yeh, Tom. 2024. “AI by Hand ✍️ | Tom Yeh | Substack.” https://aibyhand.substack.com/.
Zhang, Aston, Zack Lipton, Mu Li, and Alex Smola. 2023. “Dive into Deep Learning | Dive into Deep Learning 1.0.3 Documentation.” https://d2l.ai/index.html.