Appendix D — References
These references tend to be more functional than academic, and so will hopefully be more practically useful to you as well. If you prefer more academic treatments, you can follow the references within these works or search Google Scholar for any of the topics covered.
Agresti, Alan. 2015. Foundations of Linear and
Generalized Linear Models.
John Wiley & Sons.
Albon, Chris. 2024b. “Machine Learning
Notes.” https://chrisalbon.com/Home.
Amazon. 2024. “What Is Data Augmentation?” Amazon Web Services, Inc. https://aws.amazon.com/what-is/data-augmentation/.
Andrej Karpathy. 2024. “Let’s Build the GPT
Tokenizer.” https://www.youtube.com/watch?v=zduSFxRajkE.
Angelopoulos, Anastasios N., and Stephen Bates. 2022. “A
Gentle Introduction to Conformal
Prediction and Distribution-Free
Uncertainty Quantification.” arXiv. https://doi.org/10.48550/arXiv.2107.07511.
Arel-Bundock, Vincent. 2024. “Marginal Effects
Zoo.” https://marginaleffects.com/.
Bai, Yu, Song Mei, Huan Wang, and Caiming Xiong. 2021.
“Understanding the Under-Coverage
Bias in Uncertainty
Estimation.” arXiv. https://doi.org/10.48550/arXiv.2106.05515.
Barrett, Malcolm, Lucy D’Agostino McGowan, and Travis Gerke. 2024.
Causal Inference in R. https://www.r-causal.org/.
Belkin, Mikhail, Daniel Hsu, Siyuan Ma, and Soumik Mandal. 2019.
“Reconciling Modern Machine Learning Practice and the
Bias-Variance Trade-Off.” Proceedings of the National Academy
of Sciences 116 (32): 15849–54. https://doi.org/10.1073/pnas.1903070116.
Biecek, Przemyslaw, and Tomasz Burzykowski. 2020. Explanatory
Model Analysis. https://ema.drwhy.ai/.
Bischl, Bernd, Raphael Sonabend, Lars Kotthoff, and Michel Lang, eds.
2024. Applied Machine Learning
Using Mlr3 in R. https://mlr3book.mlr-org.com/.
Bishop, Christopher M. 2006. Pattern Recognition and Machine
Learning. Information Science and Statistics. New York: Springer.
Boehmke, Bradley, and Brandon Greenwell. 2020.
Hands-On Machine Learning
with R. https://bradleyboehmke.github.io/HOML/.
Boykis, Vicki. 2023. “What Are Embeddings?” http://vickiboykis.com/what_are_embeddings/index.html.
Brownlee, Jason. 2016. “Gentle Introduction to the
Bias-Variance
Trade-Off in Machine
Learning.” MachineLearningMastery.com. https://machinelearningmastery.com/gentle-introduction-to-the-bias-variance-trade-off-in-machine-learning/.
———. 2019. “A Gentle Introduction to
Imbalanced Classification.”
MachineLearningMastery.com. https://machinelearningmastery.com/what-is-imbalanced-classification/.
———. 2020. “How to Calibrate
Probabilities for Imbalanced
Classification.”
MachineLearningMastery.com. https://machinelearningmastery.com/probability-calibration-for-imbalanced-classification/.
———. 2021. “Gradient Descent With
AdaGrad From Scratch.”
MachineLearningMastery.com. https://machinelearningmastery.com/gradient-descent-with-adagrad-from-scratch/.
Burges, Chris, Tal Shaked, Erin Renshaw, Ari Lazier, Matt Deeds, Nicole
Hamilton, and Greg Hullender. 2005. “Learning to Rank Using
Gradient Descent.” In Proceedings of the 22nd International
Conference on Machine Learning - ICML
’05, 89–96. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102363.
Burges, Christopher J. C. 2016. “From RankNet to
LambdaRank to LambdaMART: An
Overview.”
Bürkner, Paul-Christian, and Matti Vuorre. 2019. “Ordinal
Regression Models in Psychology:
A Tutorial.” Advances in Methods
and Practices in Psychological Science 2 (1): 77–101. https://doi.org/10.1177/2515245918823199.
Bycroft, Brendan. 2023. “LLM
Visualization.” https://bbycroft.net/llm.
Carpenter, Bob. 2023. “Prior Choice
Recommendations.” GitHub. https://github.com/stan-dev/stan/wiki/Prior-Choice-Recommendations.
causalml. 2023. “CausalML.” https://causalml.readthedocs.io/en/latest/index.html.
Cawley, Gavin C., and Nicola L. C. Talbot. 2010. “On
Over-Fitting in Model Selection
and Subsequent Selection Bias in
Performance Evaluation.” The
Journal of Machine Learning Research 11 (August): 2079–2107.
Chawla, N. V., K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer. 2002.
“SMOTE: Synthetic Minority
Over-Sampling Technique.” Journal
of Artificial Intelligence Research 16 (June): 321–57. https://doi.org/10.1613/jair.953.
Chernozhukov, Victor, Christian Hansen, Nathan Kallus, Martin Spindler,
and Vasilis Syrgkanis. 2024. “Applied Causal
Inference Powered by ML and
AI.” arXiv. http://arxiv.org/abs/2403.02467.
Clark, Michael. 2018b. “Thinking about Latent
Variables.” https://m-clark.github.io/docs/FA_notes.html.
———. 2021b. “This Is Definitely Not All You Need,” July. https://m-clark.github.io/posts/2021-07-15-dl-for-tabular/.
———. 2022a. Bayesian Basics. https://m-clark.github.io/bayesian-basics/.
———. 2022c. “Deep Learning for Tabular
Data,” May. https://m-clark.github.io/posts/2022-04-01-more-dl-for-tabular/.
Cohen, Jacob. 2009. Statistical Power Analysis for the Behavioral
Sciences. 2nd ed., reprint. New York, NY: Psychology Press.
Cross Validated. 2011. “Answer to ‘The Connection Between Bayesian Statistics and Generative Modeling’.” Cross Validated. https://stats.stackexchange.com/a/7473.
———. 2016. “Why Are Neural Networks Becoming Deeper, but Not
Wider?” Forum post. Cross Validated. https://stats.stackexchange.com/q/222883.
———. 2020. “Answer to ‘Why Some Algorithms Produce Calibrated Probabilities’.” Cross Validated. https://stats.stackexchange.com/a/452533.
———. 2021. “Answer to ‘Why Do We Do Matching for Causal Inference Vs Regressing on Confounders?’.” Cross Validated. https://stats.stackexchange.com/a/544958.
Dahabreh, Issa J., and Kirsten Bibbins-Domingo. 2024. “Causal
Inference About the Effects of
Interventions From Observational
Studies in Medical
Journals.” JAMA, May. https://doi.org/10.1001/jama.2024.7741.
Databricks. 2019. “What Is AdaGrad?”
Databricks. https://www.databricks.com/glossary/adagrad.
Davison, A. C., and D. V. Hinkley. 1997. Bootstrap
Methods and Their Application. Cambridge
Series in Statistical and
Probabilistic Mathematics. Cambridge:
Cambridge University Press. https://doi.org/10.1017/CBO9780511802843.
Dobson, Annette J., and Adrian G. Barnett. 2018. An
Introduction to Generalized
Linear Models. 4th ed. New York: Chapman and
Hall/CRC. https://doi.org/10.1201/9781315182780.
Dunn, Peter K., and Gordon K. Smyth. 2018. Generalized
Linear Models With
Examples in R. Springer.
Dunn, Robin, Larry Wasserman, and Aaditya Ramdas. 2020.
“Distribution-Free Prediction Sets with Random Effects.”
Efron, Bradley, and R. J. Tibshirani. 1994. An
Introduction to the Bootstrap. New York:
Chapman and Hall/CRC. https://doi.org/10.1201/9780429246593.
Hugging Face. 2024. “Byte-Pair Encoding
Tokenization.” Hugging Face NLP
Course. https://huggingface.co/learn/nlp-course/en/chapter6/5.
Facure Alves, Matheus. 2022. Causal Inference for
the Brave and True. https://matheusfacure.github.io/python-causality-handbook/landing-page.html.
Fahrmeir, Ludwig, Thomas Kneib, Stefan Lang, and Brian D. Marx. 2021.
Regression: Models, Methods and
Applications. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-662-63882-8.
Faraway, Julian. 2014. Linear Models with
R. Routledge & CRC Press. https://www.routledge.com/Linear-Models-with-R/Faraway/p/book/9781439887332.
Faraway, Julian J. 2016. Extending the Linear
Model with R: Generalized
Linear, Mixed Effects and
Nonparametric Regression Models. 2nd ed. New York:
Chapman and Hall/CRC. https://doi.org/10.1201/9781315382722.
Ferrari, Silvia, and Francisco Cribari-Neto. 2004. “Beta
Regression for Modelling Rates
and Proportions.” Journal of Applied
Statistics 31 (7): 799–815. https://doi.org/10.1080/0266476042000214501.
Fortuner, Brendan. 2023. “Machine Learning
Glossary.” https://ml-cheatsheet.readthedocs.io/en/latest/index.html.
Fox, John. 2015. Applied Regression
Analysis and Generalized Linear
Models. SAGE Publications.
Gelman, Andrew. 2013. “What Are the Key Assumptions of Linear
Regression?” Statistical Modeling, Causal Inference, and
Social Science. https://statmodeling.stat.columbia.edu/2013/08/04/19470/.
Gelman, Andrew, John B. Carlin, Hal S. Stern, David B. Dunson, Aki
Vehtari, and Donald B. Rubin. 2013. Bayesian Data
Analysis, Third Edition. CRC
Press.
Gelman, Andrew, and Jennifer Hill. 2006. Data Analysis Using
Regression and Multilevel/Hierarchical Models. Cambridge
University Press.
Gelman, Andrew, Jennifer Hill, Ben Goodrich, Jonah Gabry, Daniel
Simpson, and Aki Vehtari. 2024. “Advanced Regression
and Multilevel Models.” http://www.stat.columbia.edu/~gelman/armm/.
Gelman, Andrew, Jennifer Hill, and Aki Vehtari. 2020. Regression and
Other Stories. 1st ed. Cambridge
University Press. https://doi.org/10.1017/9781139161879.
Gelman, Andrew, and Eric Loken. 2013. “The Garden of Forking
Paths: Why Multiple Comparisons Can Be a Problem, Even When
There Is No ‘Fishing Expedition’ or
‘p-Hacking’ and the Research Hypothesis Was Posited Ahead
of Time.”
Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep
Learning. https://www.deeplearningbook.org/.
Google. 2023a. “Imbalanced Data | Machine Learning.” Google for
Developers. https://developers.google.com/machine-learning/data-prep/construct/sampling-splitting/imbalanced-data.
———. 2023b. “Introduction | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/decision-forests.
———. 2023c. “Machine Learning.” Google for Developers. https://developers.google.com/machine-learning.
———. 2024a. “Classification: ROC Curve and AUC | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/crash-course/classification/roc-and-auc.
———. 2024b. “Reducing Loss: Gradient Descent | Machine Learning.” Google for Developers. https://developers.google.com/machine-learning/crash-course/reducing-loss/gradient-descent.
———. 2024c. “What Is Unsupervised Learning?” Google Cloud. https://cloud.google.com/discover/what-is-unsupervised-learning.
Greene, William. 2017. Econometric Analysis. 8th
ed. https://pages.stern.nyu.edu/~wgreene/Text/econometricanalysis.htm.
Hardin, James W., and Joseph M. Hilbe. 2018. Generalized
Linear Models and
Extensions. Stata Press.
Harrell, Frank E. 2015. Regression Modeling
Strategies: With Applications to
Linear Models, Logistic and
Ordinal Regression, and Survival
Analysis. 2nd ed. Springer Series in
Statistics. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-19425-7.
Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2017.
The Elements of Statistical Learning: Data
Mining, Inference, and Prediction. 2nd ed. https://hastie.su.domains/ElemStatLearn/.
Heiss, Andrew. 2022. “Marginalia: A Guide to Figuring
Out What the Heck Marginal Effects, Marginal Slopes, Average Marginal
Effects, Marginal Effects at the Mean, and All These Other Marginal
Things Are.” Andrew Heiss. https://www.andrewheiss.com/blog/2022/05/20/marginalia/#what-are-marginal-effects.
Hernán, Miguel A. 2018. “The C-Word:
Scientific Euphemisms Do
Not Improve Causal
Inference From Observational
Data.” American Journal of Public Health
108 (5): 616–19. https://doi.org/10.2105/AJPH.2018.304337.
Hernán, Miguel A., and James M. Robins. 2012. Causal
Inference: What If. https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/.
Howard, Jeremy. 2024. Practical Deep
Learning for Coders. https://course.fast.ai/.
Hvitfeldt, Emil. 2024. Feature Engineering
A-Z. https://feaz-book.com/.
Hyndman, Rob, and George Athanasopoulos. 2021. Forecasting:
Principles and Practice (3rd Ed). https://otexts.com/fpp3/.
Ivanova, Anna A, Shashank Srikant, Yotaro Sueoka, Hope H Kean, Riva
Dhamala, Una-May O’Reilly, Marina U Bers, and Evelina Fedorenko. 2020.
“Comprehension of Computer Code Relies Primarily on Domain-General
Executive Brain Regions.” Edited by Andrea E Martin, Timothy E
Behrens, William Matchin, and Ina Bornkessel-Schlesewsky. eLife
9 (December): e58906. https://doi.org/10.7554/eLife.58906.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani.
2021. An Introduction to Statistical
Learning. Vol. 103. Springer Texts in
Statistics. New York, NY: Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.
Jiang, Lili. 2020. “A Visual Explanation
of Gradient Descent Methods
(Momentum, AdaGrad, RMSProp,
Adam).” Medium. https://towardsdatascience.com/a-visual-explanation-of-gradient-descent-methods-momentum-adagrad-rmsprop-adam-f898b102325c.
Jordan, Jeremy. 2018. “Introduction to Autoencoders.”
Jeremy Jordan. https://www.jeremyjordan.me/autoencoders/.
Kirillov, Alexander, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe
Rolland, Laura Gustafson, Tete Xiao, et al. 2023. “Segment
Anything.” In Proceedings of the IEEE/CVF International Conference
on Computer Vision (ICCV), 4015–26. https://openaccess.thecvf.com/content/ICCV2023/html/Kirillov_Segment_Anything_ICCV_2023_paper.html.
Koenker, Roger. 2000. “Galton, Edgeworth,
Frisch, and Prospects for Quantile Regression in
Econometrics.” Journal of Econometrics 95 (2): 347–74.
https://doi.org/10.1016/S0304-4076(99)00043-3.
———. 2005. Quantile Regression. Vol. 38. Cambridge
University Press. https://books.google.com/books?id=WjOdAgAAQBAJ.
Kruschke, John. 2010. Doing Bayesian Data
Analysis: A Tutorial
Introduction with R. Academic Press.
Künzel, Sören R., Jasjeet S. Sekhon, Peter J. Bickel, and Bin Yu. 2019.
“Metalearners for Estimating Heterogeneous Treatment Effects Using
Machine Learning.” Proceedings of the National Academy of
Sciences 116 (10): 4156–65. https://doi.org/10.1073/pnas.1804597116.
Lang, Michel, Martin Binder, Jakob Richter, Patrick Schratz, Florian
Pfisterer, Stefan Coors, Quay Au, Giuseppe Casalicchio, Lars Kotthoff,
and Bernd Bischl. 2019. “Mlr3: A Modern
Object-Oriented Machine Learning Framework in R.”
Journal of Open Source Software 4 (44): 1903. https://doi.org/10.21105/joss.01903.
Lee, Jaehoon, Yasaman Bahri, Roman Novak, Samuel S. Schoenholz, Jeffrey
Pennington, and Jascha Sohl-Dickstein. 2017. “Deep
Neural Networks as Gaussian
Processes.” arXiv.org. https://arxiv.org/abs/1711.00165v3.
Leech, Gavin, Juan J. Vazquez, Misha Yagudin, Niclas Kupper, and
Laurence Aitchison. 2024. “Questionable Practices in Machine
Learning.” arXiv. https://doi.org/10.48550/arXiv.2407.12220.
Lones, Michael A. 2024. “How to Avoid Machine Learning Pitfalls: A
Guide for Academic Researchers.” arXiv. https://doi.org/10.48550/arXiv.2108.02497.
Mahr, Tristan. 2021. “Random Effects and Penalized Splines Are the
Same Thing.” Higher Order Functions. https://tjmahr.github.io/random-effects-penalized-splines-same-thing/.
Masis, Serg. 2023. Interpretable Machine
Learning with Python. 2nd
ed. Packt. https://www.packtpub.com/product/interpretable-machine-learning-with-python-second-edition/9781803235424.
McCullagh, P., and J. A. Nelder. 2019. Generalized Linear
Models. 2nd ed. New York: Routledge. https://doi.org/10.1201/9780203753736.
McCulloch, Warren S., and Walter Pitts. 1943. “A Logical Calculus
of the Ideas Immanent in Nervous Activity.” The Bulletin of
Mathematical Biophysics 5 (4): 115–33. https://doi.org/10.1007/BF02478259.
McElreath, Richard. 2020. Statistical Rethinking:
A Bayesian Course with
Examples in R and STAN.
Routledge & CRC Press. https://www.routledge.com/Statistical-Rethinking-A-Bayesian-Course-with-Examples-in-R-and-STAN/McElreath/p/book/9780367139919.
Microsoft. 2024. “Generative AI for
Beginners.” https://microsoft.github.io/generative-ai-for-beginners/#/.
Molnar, Christoph. 2023. Interpretable Machine
Learning. https://christophm.github.io/interpretable-ml-book/.
Monroe, Elizabeth, and Michael Clark. 2024. “Imbalanced
Outcomes: Challenges and
Solutions.”
Morgan, Stephen, and Christopher Winship. 2014. Counterfactuals
and Causal Inference: Methods and
Principles for Social Research. 2nd ed. https://stars.library.ucf.edu/etextbooks/298.
Murphy, Kevin P. 2012. Machine Learning:
A Probabilistic
Perspective. MIT Press. https://mitpress.mit.edu/9780262018029/machine-learning/.
———. 2023. Probabilistic Machine
Learning. MIT Press. https://mitpress.mit.edu/9780262046824/probabilistic-machine-learning/.
Neal, Radford M. 1996. “Priors for Infinite
Networks.” In Bayesian Learning for
Neural Networks, edited by Radford M.
Neal, 29–53. New York, NY: Springer. https://doi.org/10.1007/978-1-4612-0745-0_2.
Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized
Linear Models.” Royal Statistical
Society. Journal. Series A: General 135 (3): 370–84. https://doi.org/10.2307/2344614.
Niculescu-Mizil, Alexandru, and Rich Caruana. 2005. “Predicting
Good Probabilities with Supervised Learning.” In Proceedings
of the 22nd International Conference on Machine Learning -
ICML ’05, 625–32. Bonn, Germany: ACM Press. https://doi.org/10.1145/1102351.1102430.
Pearl, Judea. 2009. “Causal Inference in Statistics:
An Overview.” Statistics Surveys 3:
96–146. https://doi.org/10.1214/09-SS057.
———. 2022. “Causal Inference: History,
Perspectives, Adventures, and
Unification (An Interview with
Judea Pearl).” https://muse.jhu.edu/pub/56/article/867087/summary.
Penn State, Department of Statistics. 2018. “5.4 - A
Matrix Formulation of the
Multiple Regression Model.”
STAT 462. https://online.stat.psu.edu/stat462/node/132/.
Pochinkov, Nicky. 2023. “LLM Basics:
Embedding Spaces - Transformer
Token Vectors Are
Not Points in Space,”
February. https://www.lesswrong.com/posts/pHPmMGEMYefk9jLeh/llm-basics-embedding-spaces-transformer-token-vectors-are.
Pok, Wilson. 2020. “How Uplift Modeling
Works.” Ambiata Blog. https://ambiata.com/blog/2020-07-07-uplift-modeling/.
Power, Alethea, Yuri Burda, Harri Edwards, Igor Babuschkin, and Vedant
Misra. 2022. “Grokking: Generalization
Beyond Overfitting on Small
Algorithmic Datasets.” arXiv. https://doi.org/10.48550/arXiv.2201.02177.
Prince, Simon J. D. 2023. Understanding Deep
Learning. MIT Press.
Quantmetry. 2024. “MAPIE - Model
Agnostic Prediction Interval
Estimator.” MAPIE 0.8.2 Documentation.
https://mapie.readthedocs.io/en/latest/.
Raschka, Sebastian. 2014. “About Feature
Scaling and Normalization.”
Sebastian Raschka, PhD. https://sebastianraschka.com/Articles/2014_about_feature_scaling.html.
———. 2022. Machine Learning with PyTorch
and Scikit-Learn. https://sebastianraschka.com/books/machine-learning-with-pytorch-and-scikit-learn/.
———. 2023a. Build a Large Language
Model (From Scratch). https://www.manning.com/books/build-a-large-language-model-from-scratch.
Rasmussen, Carl Edward, and Christopher K. I. Williams. 2005.
Gaussian Processes for Machine
Learning. The MIT Press. https://doi.org/10.7551/mitpress/3206.001.0001.
Ripley, Brian D. 1996. Pattern Recognition and
Neural Networks. Cambridge: Cambridge
University Press. https://doi.org/10.1017/CBO9780511812651.
Roback, Paul, and Julie Legler. 2021. Beyond Multiple
Linear Regression. https://bookdown.org/roback/bookdown-BeyondMLR/.
Roberts, Eric. 2000. “Neural Networks -
History.” https://cs.stanford.edu/people/eroberts/courses/soco/projects/neural-networks/History/history1.html.
Rocca, Baptiste. 2019. “Handling Imbalanced Datasets in Machine
Learning.” Medium. https://towardsdatascience.com/handling-imbalanced-datasets-in-machine-learning-7a0e84220f28.
Rovine, Michael J., and Douglas R. Anderson. 2004. “Peirce and
Bowditch.” The American Statistician 58
(3): 232–36. https://doi.org/10.1198/000313004X964.
Schmidhuber, Juergen. 2022. “Annotated History of
Modern AI and Deep
Learning.” arXiv. https://doi.org/10.48550/arXiv.2212.11279.
scikit-learn. 2023a. “1.16. Probability
Calibration.” Scikit-Learn. https://scikit-learn.org/stable/modules/calibration.html.
———. 2023b. “Nested Versus Non-Nested Cross-Validation.”
Scikit-Learn. https://scikit-learn.org/stable/auto_examples/model_selection/plot_nested_cross_validation_iris.html.
Sen, Rajat, and Yichen Zhou. 2024. “A Decoder-Only Foundation
Model for Time-Series Forecasting.” http://research.google/blog/a-decoder-only-foundation-model-for-time-series-forecasting/.
Shalizi, Cosma. 2015. “F-Tests, R2, and
Other Distractions.” https://www.stat.cmu.edu/~cshalizi/mreg/15/.
Shevchenko, Maksim. 2023. “Types of Customers.” Scikit-Uplift
Documentation. https://www.uplift-modeling.com/en/v0.5.1/user_guide/introduction/clients.html.
Shorten, Connor, Taghi M. Khoshgoftaar, and Borko Furht. 2021.
“Text Data Augmentation for
Deep Learning.” Journal of Big
Data 8 (1): 101. https://doi.org/10.1186/s40537-021-00492-0.
Simpson, Gavin. 2021. “Using Random Effects in GAMs
with Mgcv.” From the Bottom of the Heap, February. https://www.fromthebottomoftheheap.net/2021/02/02/random-effects-in-gams/.
StackExchange. 2015. “Are There Any Differences Between Tensors
and Multidimensional Arrays?” Forum post. Mathematics Stack
Exchange. https://math.stackexchange.com/q/1134809.
StatQuest with Josh Starmer. 2019a. “Gradient
Descent, Step-by-Step.” https://www.youtube.com/watch?v=sDv4f4s2SB8.
———. 2019b. “Stochastic Gradient
Descent, Clearly
Explained!!!” https://www.youtube.com/watch?v=vMh0zPT0tLI.
———. 2021. “Bootstrapping Main
Ideas!!!” https://www.youtube.com/watch?v=Xz0x-8-cgaQ.
Turrell, Arthur, Pietro Monticone, Zeki Akyol, and Yiben Huang. 2024.
“Python for Data Science V1.0.1,”
January. https://doi.org/10.5281/ZENODO.10518241.
UCLA Advanced Research Computing. 2023a. “FAQ:
How Do I Interpret Odds Ratios in Logistic
Regression?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-how-do-i-interpret-odds-ratios-in-logistic-regression/.
———. 2023b. “FAQ: What Are Pseudo
R-Squareds?” https://stats.oarc.ucla.edu/other/mult-pkg/faq/general/faq-what-are-pseudo-r-squareds/.
Ushey, Kevin, JJ Allaire, and Yuan Tang. 2024. “Arrays in
R and Python.” https://cran.r-project.org/web/packages/reticulate/vignettes/arrays.html.
VanderPlas, Jake. 2016. Python Data
Science Handbook. O’Reilly Media.
https://www.oreilly.com/library/view/python-data-science/9781491912126/.
Vaswani, Ashish, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion
Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017.
“Attention Is All You Need.” In
Advances in Neural Information
Processing Systems. Vol. 30. Curran
Associates, Inc. https://proceedings.neurips.cc/paper_files/paper/2017/hash/3f5ee243547dee91fbd053c1c4a845aa-Abstract.html.
Vig, Jesse. 2019. “Deconstructing BERT,
Part 2: Visualizing the Inner
Workings of Attention.”
Medium. https://towardsdatascience.com/deconstructing-bert-part-2-visualizing-the-inner-workings-of-attention-60a16d86b5c1.
Weed, Ethan, and Danielle Navarro. 2021. Learning
Statistics with Python. https://ethanweed.github.io/pythonbook/landingpage.html.
Welchowski, Thomas, Kelly O. Maloney, Richard Mitchell, and Matthias
Schmid. 2022. “Techniques to Improve
Ecological Interpretability of
Black-Box Machine
Learning Models.” Journal of
Agricultural, Biological and Environmental Statistics 27 (1):
175–97. https://doi.org/10.1007/s13253-021-00479-7.
Wikipedia. 2023. “Relationships Among Probability
Distributions.” Wikipedia. https://en.wikipedia.org/wiki/Relationships_among_probability_distributions.
———. 2024a. “Exponential Family.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Exponential_family&oldid=1202463189.
———. 2024b. “Gradient.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Gradient&oldid=1206147282.
———. 2024c. “Cross-Entropy.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Cross-entropy&oldid=1221840853#Cross-entropy_loss_function_and_logistic_regression.
———. 2024d. “Replication Crisis.” Wikipedia. https://en.wikipedia.org/w/index.php?title=Replication_crisis&oldid=1222335234.
Witten, Daniela. 2020. “The
Bias-Variance
Trade-Off & ‘Double
Descent’.” X (Formerly Twitter). https://x.com/daniela_witten/status/1292293102103748609.
Wood, Simon N. 2017. Generalized Additive
Models: An Introduction with
R. 2nd ed.
Boca Raton: Chapman and Hall/CRC. https://doi.org/10.1201/9781315370279.
Wooldridge, Jeffrey M. 2012. Introductory Econometrics:
A Modern Approach. 5th
edition. Mason, OH: Cengage Learning.
Yeh, Tom. 2024. “AI by Hand ✍️.”
Substack. https://aibyhand.substack.com/.
Zhang, Aston, Zack Lipton, Mu Li, and Alex Smola. 2023. Dive into
Deep Learning. https://d2l.ai/index.html.