Here you’ll find documents of varying technical depth on topics that interest me or that I think will interest those I engage with. Most are demonstrations of statistical concepts or programming, and may be geared towards beginners or more advanced readers. I group them based on whether they are more focused on statistical concepts, programming, or miscellaneous topics.

Model Estimation by Example

This shows ‘by-hand’ code for various models and estimation approaches, from linear regression to Bayesian multilevel mediation models, and demonstrations from penalized maximum likelihood to stochastic gradient descent.

Bayesian Basics

This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.

Generalized Additive Models

An introduction to generalized additive models with an emphasis on generalization from familiar linear models and using the mgcv package in R.

Mixed Models with R

This document focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Practical Data Science

The focus is on common data science tools and techniques in R, including data processing, programming, modeling, visualization, and presentation of results. Exercises may be found in the document, and demonstrations of most content in Python are available via Jupyter notebooks.

Structural Equation Modeling

This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, growth curves, IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, etc.

Introduction to Machine Learning

A gentle introduction to machine learning concepts with some application in R. It covers topics such as loss functions, cross-validation, regularization, and bias-variance trade-off, techniques such as penalized regression, random forests, and neural nets, and more.

Data Modeling in R

This document demonstrates a wide array of statistical and other models in R. Generic code is provided for standard regression, mixed, additive, survival, and latent variable models, principal components, factor analysis, SEM, cluster analysis, time series, spatial models, zero-altered models, text analysis, Bayesian analysis, machine learning and more.

The document is designed for newcomers to R, whether in a statistical sense, or just a programming one. It also should appeal to those working in other packages who are curious how to do the same sorts of things in R.

MCMC algorithms

List of MCMC algorithms with brief descriptions.

Bayesian Demonstration

A simple interactive demonstration for those just starting on their Bayesian journey.

Mixed Models with R

This workshop focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Mixed Models Overview

An overview that introduces mixed models for those with varying technical/statistical backgrounds.

Mixed Models Introduction

A non-technical document to introduce mixed models for those who have used ANOVA.

Clustered Data Situations

A comparison of standard models, cluster robust standard errors, fixed effect models, mixed models (random effects models), generalized estimating equations (GEE), and latent growth curve models for dealing with clustered data (e.g., longitudinal or hierarchical data).

Mixed Model Estimation

A demonstration of mixed model estimation via maximum likelihood, and of the link between mixed models and additive models.

Mixed and Growth Curve Models

A comparison of the mixed model vs. latent variable approach for longitudinal data (growth curve models), with simulation of performance in situations of small sample sizes.

Structural Equation Modeling

This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying backgrounds and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the lavaan package. It will continue to be a work in progress, particularly the sections after the SEM chapter. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, and growth curves. Topics I hope to provide overviews of in the future include other latent variable techniques/extensions such as IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, etc.

Factor Analysis and Related Methods

This document gives a brief overview of many matrix factorization, dimension reduction, and latent variable techniques. Here is a list:

Principal Components Analysis - Factor Analysis - Probabilistic Components Analysis - Non-negative Matrix Factorization - Latent Dirichlet Allocation - Structural Equation Modeling - Item Response Theory - Independent Components Analysis - Multidimensional Scaling - t-Distributed Stochastic Neighbor Embedding (t-SNE) - Recommender Systems - Hidden Markov Models - Random Effects Models - Bayesian Approaches - Mixture Models - k-means Cluster Analysis - Hierarchical Cluster Analysis - Latent Class Analysis

Latent Variables, Sum Scores, Single Items

It is very common to use sum scores of several variables as a single entity in subsequent analysis (e.g. a regression model). Some may even use a single variable even though multiple indicators are available. Assuming the multiple measures indicate a latent construct, such typical practice would be problematic relative to using estimated factor scores, either constructed as part of a two-stage process or as part of a structural equation model. This document covers simulations that compare the performance of latent variable approaches with sum score and single item approaches.

Lord’s Paradox

A summary of Pearl’s 2014 and 2013 technical reports on modeling situations, such as Lord’s Paradox and Simpson’s Paradox, that lead to surprising results initially at odds with our intuition. It looks particularly at the issue of change scores vs. controlling for baseline.

Introduction to Machine Learning

A gentle introduction to machine learning concepts with some application in R.

Fractional Regression

A quick primer on modeling data that fall between zero and one, inclusive of zero and one.

Categorical Regression Models

An overview of regression models for binary, multinomial, and ordinal outcomes, with connections among various types of models.

Topic Modeling Demo

A demonstration of Latent Dirichlet Allocation for topic modeling in R.

Comparing Measures of Dependency

A summary of articles that look at various measures of dependency, such as Pearson’s r, Spearman’s rho, and Hoeffding’s D, and newer ones such as Distance Correlation and Maximal Information Coefficient.

Check the workshops section also for programming-related content.

Practical Data Science (more details about this document below). The intention was to cover five key topics: basic information processing, programming, modeling, visualization, and publication/presentation.

FastR

A notebook on how to make R faster before or irrespective of the machinery used. Topics include avoiding loops, vectorization, faster I/O etc.

Engaging the Web with R

A document on using R for web scraping, extracting data via an API, interactive web-based visualizations, and producing web-ready documents. It serves as an overview of ways one might start to use R for web-based activities, as opposed to a hands-on approach.

R for Social Science

This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com/m-clark/m-clark.github.io, unless otherwise noted. Figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".