Here you’ll find documents of varying technical depth covering things of interest to me. Most are demonstrations of statistical concepts or programming, and may be geared toward beginners or more advanced readers. I group them based on whether they are more focused on statistical concepts, programming or tools, or miscellaneous.
This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.
List of MCMC algorithms with brief descriptions.
A simple interactive demonstration for those just starting on their Bayesian journey.
Mixed Models Overview
An overview that introduces mixed models for those with varying technical/statistical backgrounds.
Mixed Models Introduction
A non-technical document to introduce mixed models for those who have used ANOVA.
Clustered Data Situations
A comparison of standard models, cluster robust standard errors, fixed effect models, mixed models (random effects models), generalized estimating equations (GEE), and latent growth curve models for dealing with clustered data (e.g. longitudinal, hierarchical etc.).
Mixed Model Estimation
Demonstration of mixed model estimation via maximum likelihood, and the link between mixed models and additive models.
Mixed and Growth Curve Models
A comparison of the mixed model vs. latent variable approach for longitudinal data (growth curve models), with simulation of performance in situations of small sample sizes.
Structural Equation Modeling
This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying background and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the lavaan package. It will continue to be a work in progress, particularly the sections after the SEM chapter. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, and growth curves. Topics I hope to provide overviews of in the future include other latent variable techniques/extensions such as IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models, etc.
Latent Variables, Sum Scores, Single Items
It is very common to use sum scores of several variables as a single entity in subsequent analysis (e.g. a regression model). Some may even use a single variable even though multiple indicators are available. Assuming the multiple measures indicate a latent construct, such typical practice would be problematic relative to using estimated factor scores, either constructed as part of a two-stage process or as part of a structural equation model. This document covers simulations that compare the performance of latent variable approaches with sum score and single item approaches.
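As a rough illustration of why the choice of score matters, here is a minimal sketch in Python (standard library only). It is not the simulation from the linked document: the sample size, number of items, unit loadings, unit noise variance, and seed are all assumptions chosen for illustration, and it only compares a single item with a sum score, not with estimated factor scores. Under these assumptions the correlation with the latent variable is about 0.71 in expectation for a single item and about 0.91 for the sum of five items.

```python
import math
import random

random.seed(1)  # arbitrary fixed seed so the run is repeatable


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


n, k = 2000, 5  # hypothetical sample size and number of items

# Latent construct, and k noisy indicators of it (unit loadings, unit noise).
latent = [random.gauss(0, 1) for _ in range(n)]
items = [[f + random.gauss(0, 1) for _ in range(k)] for f in latent]

single = [row[0] for row in items]       # use only the first item
sum_score = [sum(row) for row in items]  # add up all k items

r_single = pearson(single, latent)
r_sum = pearson(sum_score, latent)

print(f"single item vs latent: {r_single:.2f}")
print(f"sum score vs latent:   {r_sum:.2f}")
```

The sum score tracks the latent variable more closely than any one item because the item-specific noise partly averages out. Equal loadings are the most favorable case for a sum score, so this sketch says nothing about how it fares against factor scores when loadings differ.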
Summary of Pearl’s 2014 and 2013 technical reports on some modeling situations such as Lord’s Paradox and Simpson’s Paradox that lead to surprising results that are initially at odds with our intuition. Looks particularly at the issue of change scores vs. controlling for baseline.
Generalized Additive Models
An introduction to generalized additive models with an emphasis on generalization from familiar linear models and using the mgcv package in R. An older PDF version is available here.
Introduction to Machine Learning
A gentle introduction to machine learning concepts with some application in R.
Categorical Regression Models
An overview of regression models for binary, multinomial, and ordinal outcomes, with connections among various types of models.
Topic Modeling Demo
A demonstration of Latent Dirichlet Allocation for topic modeling in R.
Comparing Measures of Dependency
A summary of relatively recent articles that look at various measures of dependency: Pearson’s r, Spearman’s rho, and Hoeffding’s D, as well as newer ones such as distance correlation and the maximal information coefficient.
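To see why one might look beyond Pearson’s r, here is a minimal sketch in Python (standard library only) that compares it with Spearman’s rho on a relationship that is perfectly monotone but strongly nonlinear. The data (y = exp(x) on an evenly spaced grid) and the helper names are assumptions for illustration and are not taken from the linked summary. Hoeffding’s D, distance correlation, and the maximal information coefficient are not implemented here.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n of the values; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# A perfectly monotone but far from linear relationship.
x = [i / 2 for i in range(-10, 11)]
y = [math.exp(a) for a in x]

r = pearson(x, y)
rho = spearman(x, y)

print(f"Pearson's r:    {r:.2f}")
print(f"Spearman's rho: {rho:.2f}")
```

Spearman’s rho is 1 here because the ranks of x and y agree exactly, while Pearson’s r falls well below 1 because the relationship is not linear. Neither measure would detect a non-monotone dependence such as y = x² on a symmetric range, which is part of the motivation for the newer measures.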
Tools (esp. R)
Check the workshops section also.
An in-progress notebook on how to make R faster before, or irrespective of, the machinery used. Topics include avoiding loops, vectorization, faster I/O, etc.
Engaging the Web with R
A document on the use of R for web scraping, extracting data via an API, interactive web-based visualizations, and producing web-ready documents. It serves as an overview of ways one might start to use R for web-based activities, as opposed to a hands-on approach.
A History of Tornados
Because I had too much time on my hands and wanted to try out the dashboard feature of R Markdown. Maps tornado activity from 1950 to 2015. At some point I’ll go back and fix the lag issue.
Last Statements of the Texas Executed
A demonstration of both text analysis and literate programming/document generation with a dynamic and interactive research document. The texts are the last statements of offenders executed in Texas.
R for Social Science
This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).