Documents

Here you’ll find documents of varying technical degree covering things of interest to me or which I think will be interesting to those I engage with. Most are demonstrations of statistical concepts or programming, and may be geared toward beginners or more advanced users. I group them based on whether they focus on statistical concepts, programming, or miscellaneous topics.

Mixed Models with R
This document focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Bayesian Basics
This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.

Model Estimation by Example
This shows ‘by-hand’ code for various models and estimation approaches, with models ranging from linear regression to Bayesian multilevel mediation models, and estimation demonstrations ranging from penalized maximum likelihood to stochastic gradient descent.

Generalized Additive Models
An introduction to generalized additive models with an emphasis on generalizing from familiar linear models, using the mgcv package in R.

Introduction to Machine Learning
A gentle introduction to machine learning concepts with some applications in R. It covers concepts such as loss functions, cross-validation, regularization, and the bias-variance trade-off; techniques such as penalized regression, random forests, and neural nets; and more.

Practical Data Science
The focus is on common data science tools and techniques in R, including data processing, programming, modeling, visualization, and presentation of results. Exercises may be found in the document, and demonstrations of most content in Python are available via Jupyter notebooks.

Structural Equation Modeling
This document (and related workshop) focuses on structural equation modeling. It is conceptually based and tries to generalize beyond the standard SEM treatment. Topics include graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, growth curves, IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, and more.

Modeling/Programming Blog Posts

Statistical

Models By Example

Model Estimation by Example
This shows ‘by-hand’ code for various models and estimation approaches, with models ranging from linear regression to Bayesian multilevel mediation models, and estimation demonstrations ranging from penalized maximum likelihood to stochastic gradient descent.

Modeling in R

Data Modeling in R
This document demonstrates a wide array of statistical and other models in R. Generic code is provided for standard regression, mixed, additive, survival, and latent variable models, principal components, factor analysis, SEM, cluster analysis, time series, spatial models, zero-altered models, text analysis, Bayesian analysis, machine learning, and more.

The document is designed for newcomers to R, whether in a statistical sense or just a programming one. It should also appeal to those working in other statistical packages who are curious how to do the same sorts of things in R.

Bayesian

Bayesian Basics
This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.

MCMC algorithms
List of MCMC algorithms with brief descriptions.

Bayesian Demonstration
A simple interactive demonstration for those just starting on their Bayesian journey.

Mixed Models

Mixed Models with R
This workshop focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Mixed Models Overview
An overview that introduces mixed models for those with varying technical/statistical backgrounds.

Mixed Models Introduction
A non-technical document to introduce mixed models for those who have used ANOVA.

Clustered Data Situations
A comparison of standard models, cluster robust standard errors, fixed effect models, mixed models (random effects models), generalized estimating equations (GEE), and latent growth curve models for dealing with clustered data (e.g., longitudinal or hierarchical data).

Mixed Model Estimation
A demonstration of estimating mixed models via maximum likelihood, and of their link to additive models.

Mixed and Growth Curve Models
A comparison of the mixed model vs. latent variable approach (growth curve models) for longitudinal data, with simulations of their performance when sample sizes are small.

Latent Variables/SEM

Structural Equation Modeling
This document (and related workshop) focuses on structural equation modeling. It is conceptually based and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying background and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the lavaan package. It will continue to be a work in progress, particularly the sections after the SEM chapter. Topics include graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, and growth curves. Topics I hope to provide overviews of in the future include other latent variable techniques and extensions such as IRT, collaborative filtering/recommender systems, hidden Markov models, and multi-group models.

Factor Analysis and Related Methods
This document gives a brief overview of many matrix factorization, dimension reduction, and latent variable techniques. Here is a list:

- Principal Components Analysis
- Factor Analysis
- Probabilistic Principal Components Analysis
- Non-negative Matrix Factorization
- Latent Dirichlet Allocation
- Structural Equation Modeling
- Item Response Theory
- Independent Components Analysis
- Multidimensional Scaling
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Recommender Systems
- Hidden Markov Models
- Random Effects Models
- Bayesian Approaches
- Mixture Models
- k-means Cluster Analysis
- Hierarchical Cluster Analysis
- Latent Class Analysis

Latent Variables, Sum Scores, Single Items
It is very common to use sum scores of several variables as a single entity in subsequent analysis (e.g., a regression model). Some even use a single variable when multiple indicators are available. Assuming the multiple measures indicate a latent construct, these common practices can be problematic relative to using estimated factor scores, constructed either as part of a two-stage process or within a structural equation model. This document presents simulations comparing the performance of latent variable approaches with sum score and single item approaches.

Lord’s Paradox
A summary of Pearl’s 2013 and 2014 technical reports on modeling situations, such as Lord’s Paradox and Simpson’s Paradox, that lead to surprising results initially at odds with our intuition. It looks particularly at the issue of change scores vs. controlling for baseline.

Other Statistical

Generalized Additive Models
An introduction to generalized additive models with an emphasis on generalizing from familiar linear models, using the mgcv package in R.

Introduction to Machine Learning
A gentle introduction to machine learning concepts with some applications in R.

Reliability
An unfinished document that ties together some ideas regarding the statistical and conceptual notion of reliability.

Fractional Regression
A quick primer on modeling data that fall between zero and one, including zero and one themselves.

Categorical Regression Models
An overview of regression models for binary, multinomial, and ordinal outcomes, with connections among various types of models.

Topic Modeling Demo
A demonstration of Latent Dirichlet Allocation for topic modeling in R.

Comparing Measures of Dependency
A summary of articles that look at various measures of dependency, including Pearson’s r, Spearman’s rho, and Hoeffding’s D, as well as newer ones such as distance correlation and the maximal information coefficient.

Programming

Also check the workshops section for programming-related content.

Practical Data Science
More details about this document are given above. The intention was to cover five key topics: basic information processing, programming, modeling, visualization, and publication/presentation.

Exploratory Data Analysis Tools
An overview of various packages useful for quick exploration of data.

FastR
A notebook on how to make R faster before, or irrespective of, the machinery used. Topics include avoiding loops, vectorization, faster I/O, etc.

Engaging the Web with R
A document on using R for web scraping, extracting data via an API, interactive web-based visualizations, and producing web-ready documents. It serves as an overview of ways one might start to use R for web-based activities, rather than as a hands-on guide.

Miscellaneous

R for Social Science
This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/m-clark/m-clark.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".