Here you’ll find documents of varying technical degree covering things of interest to me or which I think will be interesting to those I engage with. Most are demonstrations of statistical concepts or programming, and may be geared towards beginners or more advanced users. I group them based on whether they are more focused on statistical concepts, programming, or miscellaneous topics.

Model Estimation by Example

This shows ‘by-hand’ code for various models and estimation approaches, from linear regression to Bayesian multilevel mediation models, and demonstrations from penalized maximum likelihood to stochastic gradient descent.

Mixed Models with R

This document focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Bayesian Basics

This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.

Generalized Additive Models

An introduction to generalized additive models with an emphasis on generalization from familiar linear models and using the mgcv package in R.

Practical Data Science

The focus is on common data science tools and techniques in R, including data processing, programming, modeling, visualization, and presentation of results. Exercises may be found in the document, and demonstrations of most content in Python are available via Jupyter notebooks.

Structural Equation Modeling

This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, growth curves, IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, etc.

Introduction to Machine Learning

A gentle introduction to machine learning concepts with some application in R. It covers topics such as loss functions, cross-validation, regularization, and bias-variance trade-off, techniques such as penalized regression, random forests, and neural nets, and more.

- The Double Descent Phenomenon
- Deep Learning for Tabular Data
- Practical Bayesian Analysis (I, II)
- Micro-macro Models
- Predictions with an Offset
- Factor Analysis and Related Methods
- Convergence Problems in Mixed Models
- Categorical Random Effects
- Mixed Models for Big Data
- Fractional Regression
- Group Comparisons in SEM
- Empirical Bayes
- Shrinkage in Mixed Models
- Mediation Models

Data Modeling in R

This document demonstrates a wide array of statistical and other models in R. Generic code is provided for standard regression, mixed, additive, survival, and latent variable models, principal components, factor analysis, SEM, cluster analysis, time series, spatial models, zero-altered models, text analysis, Bayesian analysis, machine learning and more.

The document is designed for newcomers to R, whether in a statistical sense, or just a programming one. It also should appeal to those working in other packages who are curious how to do the same sorts of things in R.

MCMC algorithms

A list of MCMC algorithms with brief descriptions.

Bayesian Demonstration

A simple interactive demonstration for those just starting on their Bayesian journey.

Mixed Models with R

This workshop focuses on mixed effects models using R, covering basic random effects models (random intercepts and slopes) as well as extensions into generalized mixed models and discussion of realms beyond.

Mixed Models Overview

An overview that introduces mixed models for those with varying
technical/statistical backgrounds.

Mixed Models Introduction

A non-technical document to introduce mixed models for those who have used ANOVA.

Clustered Data Situations

A comparison of standard models, cluster robust standard errors, fixed effect models, mixed models (random effects models), generalized estimating equations (GEE), and latent growth curve models for dealing with clustered data (e.g. longitudinal, hierarchical, etc.).

Mixed Model Estimation

A demonstration of mixed model estimation via maximum likelihood, and of the link to additive models.

Mixed and Growth Curve Models

A comparison of the mixed model vs. latent variable approach for longitudinal data (growth curve models), with simulation of performance in situations of small sample sizes.

Structural Equation Modeling

This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying background and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the lavaan package. It will continue to be a work in progress, particularly the sections after the SEM chapter. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, and growth curves. Topics I hope to provide overviews of in the future include other latent variable techniques/extensions such as IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, etc.

Factor Analysis and Related Methods

This document gives a brief overview of many matrix factorization, dimension reduction, and latent variable techniques. Here is a list:

- Principal Components Analysis
- Factor Analysis
- Probabilistic Components Analysis
- Non-negative Matrix Factorization
- Latent Dirichlet Allocation
- Structural Equation Modeling
- Item Response Theory
- Independent Components Analysis
- Multidimensional Scaling
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
- Recommender Systems
- Hidden Markov Models
- Random Effects Models
- Bayesian Approaches
- Mixture Models
- k-means Cluster Analysis
- Hierarchical Cluster Analysis
- Latent Class Analysis

Latent Variables, Sum Scores, Single Items

It is very common to use sum scores of several variables as a single entity to be used in subsequent analysis (e.g. a regression model). Some may even use a single variable even though multiple indicators are available. Assuming the multiple measures indicate a latent construct, such typical practice would be problematic relative to using estimated factor scores, either constructed as part of a two-stage process or as part of a structural equation model. This document covers simulations in which comparisons in performance are made between latent variable and sum score or single item approaches.

Lord’s Paradox

A summary of Pearl’s 2014 and 2013 technical reports on some modeling situations, such as Lord’s Paradox and Simpson’s Paradox, that lead to surprising results that are initially at odds with our intuition. It looks particularly at the issue of change scores vs. controlling for baseline.

Introduction to Machine Learning

A gentle introduction to machine learning concepts with some application in R.

Reliability

An unfinished document that ties together some ideas regarding the statistical and conceptual notion of reliability.

Fractional Regression

A quick primer regarding data between zero
and one, including zero and one.

Categorical Regression Models

An overview of regression models for binary, multinomial, and ordinal outcomes,
with connections among various types of models.

Topic Modeling Demo

A demonstration of Latent Dirichlet Allocation for topic modeling in R.

Comparing Measures of Dependency

A summary of articles that look at various measures of dependency, including Pearson’s r, Spearman’s rho, and Hoeffding’s D, and newer ones such as Distance Correlation and Maximal Information Coefficient.

Check the workshops section also for programming-related content.

Practical Data Science

More details about this document appear elsewhere on this page. The intention was to cover five key topics: basic information processing, programming, modeling, visualization, and publication/presentation.

Exploratory Data Analysis Tools

An overview of various packages useful for quick exploration of data.

FastR

A notebook on how to make R faster before or irrespective of the machinery used. Topics include avoiding loops, vectorization, faster I/O, etc.

Engaging the Web with R

A document regarding the use of R for web scraping, extracting data via an API, interactive web-based visualizations, and producing web-ready documents. It serves as an overview of ways one might start to use R for web-based activities, as opposed to a hands-on approach.

R for Social Science

This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com//m-clark/m-clark.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".