Appendix B — More Models
This is a list of models that includes those we’ve seen, some we’ve just mentioned, and some we’ve not explored at all.
B.1 Linear Models
Many of these were listed in Section 3.8, but we provide a few more to think about here.
Simplified Linear Models
- correlation
- t-test and ANOVA
- chi-square
Generalized Linear Models and related
- True GLMs, e.g., gamma regression
- Other distributions: beta regression, tweedie, t (so-called ‘robust’), truncated
- Censored outcomes: Survival models, tobit
- Nonlinear regression
- Modeling other parameters (e.g. heteroscedastic models)
- Modeling beyond the mean (quantile regression)
- Mixed models
- Generalized Additive Models
- Media mix models
Other Random Effects
- Gaussian process regression
- Spatial models (CAR, SAR, etc.)
- Time series models (ARIMA and related, e.g. state space)
- Factor analysis
Multivariate/multiclass/multipart
- Multivariate regression (multiple targets)
- Multinomial/Categorical/Ordinal regression (>2 classes)
- MANOVA/Linear Discriminant Analysis (these are identical, and can handle multiple outputs or >=2 classes)
- Zero-inflated (or inflated at some other value), hurdle, and related 'altered' models
- Mixture models and Cluster analysis (e.g. K-means, Hierarchical clustering)
- Two-stage least squares, instrumental variables
- SEM, simultaneous equations, and graphical models more generally
- PCA, Factor Analysis
All of these are explicitly linear models or can be framed as such. Compared to what you’ve already seen, they typically require only a tweak or two: a different distribution, a different link function, penalized coefficients, and so on. In other cases, we can move from one model to another. For example, we can reshape our multivariate regression to be amenable to a mixed model approach and get exactly the same results. We can potentially add a random effect to any model, and that random effect can be based on temporal, spatial, or other structure. The important thing to know is that the linear model is a very flexible tool that expands easily, and allows you to model most of the types of outcomes we are interested in. As such, it’s a very powerful approach to modeling.
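To illustrate how small these tweaks can be, here is a sketch (not from the text, purely illustrative) of fitting a Poisson GLM with a log link via iteratively reweighted least squares, using only numpy. Swapping the variance and link functions in this loop is essentially all it takes to move to another member of the GLM family.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one predictor
beta_true = np.array([0.5, 0.3])
y = rng.poisson(np.exp(X @ beta_true))                 # Poisson outcome, log link

# Iteratively reweighted least squares for the Poisson/log-link case
beta = np.zeros(2)
for _ in range(25):
    mu = np.exp(X @ beta)            # inverse link: mean on the response scale
    W = mu                           # Poisson variance equals the mean
    z = X @ beta + (y - mu) / mu     # working response
    WX = X * W[:, None]
    beta = np.linalg.solve(X.T @ WX, X.T @ (W * z))
```

The recovered coefficients land close to the values used to simulate the data; a different distribution simply changes `W` and the (inverse) link changes `mu`.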
B.2 Other Machine Learning Models
As we’ve mentioned, machine learning is more of a modeling approach, and any model could be used for machine learning. That said, there are some models you’ll typically only see in a machine learning context, as they often do not provide specific parameters of interest like coefficients or variance components, are not easily interpretable, and lack straightforward ways to estimate uncertainty. Although the focus of these models was more on prediction, many of the models in this list are no longer performant compared to the tools used today. Still, they can be interesting historically or conceptually, may still be employed in some domains, are in some cases special cases of more widely used techniques, and can still serve as baseline models.
Standard Regression/Classification
- Linear Discriminant Analysis
- k-Nearest neighbors regression
- Naive Bayes
- Support Vector Machines, Boltzmann Machines
- Projection pursuit regression, MARS
- (Hidden) Markov Models
- Undirected graphs, Markov Random Fields, Network analysis
- Single Decision trees, CART, C4.5, etc.
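Several of these remain useful as quick baselines precisely because they are so simple. As a sketch, here is a minimal k-nearest neighbors classifier in plain numpy (the function and data are illustrative, not from the text):

```python
import numpy as np

def knn_predict(X_train, y_train, X_new, k=3):
    # pairwise Euclidean distances from each new point to all training points
    d = np.linalg.norm(X_train[None, :, :] - X_new[:, None, :], axis=2)
    # indices of the k closest training points for each new point
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y_train[nearest]
    # majority vote among the k neighbors (labels assumed to be 0, 1, ...)
    return np.array([np.bincount(v).argmax() for v in votes])

# two well-separated classes as a toy check
X_train = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
y_train = np.array([0, 0, 0, 1, 1, 1])
preds = knn_predict(X_train, y_train, np.array([[0.2, 0.2], [5.5, 5.5]]))
```

A handful of lines like this can serve as a sanity-check baseline before reaching for anything heavier.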
Ensemble Models
- Stacking
- Bayesian Model Averaging
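Stacking, for instance, just fits a meta-model to the predictions of base models. A minimal numpy sketch follows; the base models here are arbitrary polynomial fits chosen for illustration, and in practice you would use held-out predictions for the meta-model to avoid overfitting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(-2, 2, size=n)
y = np.sin(x) + rng.normal(scale=0.1, size=n)

# Base model 1: simple linear fit
A1 = np.column_stack([np.ones(n), x])
b1 = np.linalg.lstsq(A1, y, rcond=None)[0]

# Base model 2: cubic polynomial fit
A2 = np.column_stack([x**p for p in range(4)])
b2 = np.linalg.lstsq(A2, y, rcond=None)[0]

# Meta-model: regress the outcome on the base models' predictions
P = np.column_stack([np.ones(n), A1 @ b1, A2 @ b2])
w = np.linalg.lstsq(P, y, rcond=None)[0]

mse_lin = np.mean((A1 @ b1 - y) ** 2)
mse_stack = np.mean((P @ w - y) ** 2)
```

The stacked predictions can do no worse in-sample than either base model alone, since the meta-model can always just reproduce one of them.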
Unsupervised Techniques
Latent Models
- PCA, probabilistic PCA, ICA, SVD
- Latent Dirichlet Allocation (LDA), Latent Semantic Analysis (LSA)
- (Non-negative) Matrix Factorization (NMF)
- Dirichlet process
Clustering
- t-SNE
- (H)DBSCAN
- Spectral clustering
- Self-organizing maps
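Many clustering methods share the same assign-then-update skeleton. As a point of reference, here is a minimal k-means sketch in numpy (k-means itself was mentioned earlier alongside mixture models; the implementation details here are illustrative):

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    # initialize centers at k randomly chosen data points
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # assign each point to its nearest center
        labels = np.argmin(np.linalg.norm(X[:, None] - centers[None], axis=2), axis=1)
        # move each center to the mean of its assigned points (keep old center if empty)
        centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                            else centers[j] for j in range(k)])
    return labels, centers

# toy data: two tight, well-separated clusters
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, size=(40, 2)), rng.normal(5, 0.1, size=(40, 2))])
labels, centers = kmeans(X, 2)
```

Methods like DBSCAN replace the fixed-k assignment step with density-based neighborhood rules, but the flavor of iterating between assignment and summary is common across the family.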
B.3 Other Deep Learning Models
We haven’t delved as deeply into the world of deep learning, largely because there hasn’t yet been a ‘foundational’ model for tabular data of the sort we’ve focused on. However, most of the models that make headlines today are built upon simpler models, going all the way back to the basic multilayer perceptron, which we did introduce. Here are some of the models you might see in the wild:
- Convolutional Neural Networks
- Recurrent Neural Networks
- Long Short-Term Memory Networks
- Transformers/Attention Mechanisms
- Autoencoders
- Generative Adversarial Networks
- Graph Neural Networks
- Reinforcement Learning
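All of these architectures ultimately stack the same building block the multilayer perceptron introduced: a linear transformation followed by a nonlinearity. A minimal forward-pass sketch in numpy (the layer sizes and random weights are arbitrary, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    # a common nonlinearity: zero out negative values
    return np.maximum(z, 0.0)

# weights for a 4 -> 8 -> 1 network (sizes chosen arbitrarily)
W1, b1 = rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)

def mlp_forward(X):
    # each layer: linear transformation, then nonlinearity
    h = relu(X @ W1 + b1)
    return h @ W2 + b2

X = rng.normal(size=(5, 4))
preds = mlp_forward(X)  # one prediction per row of X
```

CNNs, RNNs, and transformers all elaborate on this pattern with weight sharing, recurrence, or attention, but the layered linear-plus-nonlinearity core is the same.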
Convolutional neural networks as currently implemented go back to LeNet in the late 1990s, and took off several years later with AlexNet and VGG. ResNet (residual networks) and DenseNet are relatively more recent examples of CNNs, though even they have been around for several years at this point. Even so, several of these still serve as baseline models for image classification and object detection, either in practice or as a reference point for current model performance.
NLP, and language processing more generally, can be seen as evolving from matrix factorization and LDA to neural network models such as word2vec and GloVe. In addition, the sequential nature of text suggested time-based models, including more traditional statistical models like hidden Markov models back in the day. In the neural network domain, standard recurrent networks, then LSTMs, GRUs, Seq2Seq, and more continued the theme. Now the field is dominated by attention-based transformers, of which BERT variants were popular early on. OpenAI’s GPT is among the most famous examples of modern large language models, but many others have been developed in the last few years, offered by Meta, Google, Anthropic, and others.
Michael has surveyed some of the developments in deep learning for tabular data (Clark 2022), and though he hasn’t seen anything as of this writing to change the general conclusion, he hopes to revisit the topic in earnest in the future, so stay tuned.