Here you’ll find documents of varying technical degree covering things of interest to me or which I think will be interesting to those I engage with. Most are demonstrations of statistical concepts or programming, and may be geared towards beginners or more advanced users. I group them based on whether they are more focused on statistical concepts, programming or tools, or miscellaneous.
Data Modeling in R
This document demonstrates a wide array of statistical and other models in R. Generic code is provided for standard regression, mixed, additive, survival, and latent variable models, principal components, factor analysis, SEM, cluster analysis, time series, spatial models, zero-altered models, text analysis, Bayesian analysis, machine learning and more.
The document is designed for newcomers to R, whether in a statistical sense or just a programming one. It also should appeal to those working in other packages who are curious how to do the same sorts of things in R.
This serves as a conceptual introduction to Bayesian modeling with examples using R and Stan.
List of MCMC algorithms with brief descriptions.
A simple interactive demonstration for those just starting on their Bayesian journey.
Mixed Models Overview
An overview that introduces mixed models for those with varying technical/statistical backgrounds.
Mixed Models Introduction
A non-technical document to introduce mixed models for those who have used ANOVA.
Clustered Data Situations
A comparison of standard models, cluster robust standard errors, fixed effect models, mixed models (random effects models), generalized estimating equations (GEE), and latent growth curve models for dealing with clustered data (e.g. longitudinal, hierarchical etc.).
Mixed Model Estimation
A demonstration of mixed model estimation via maximum likelihood, along with the link between mixed models and additive models.
Mixed and Growth Curve Models
A comparison of the mixed model vs. latent variable approach for longitudinal data (growth curve models), with simulation of performance in situations of small sample sizes.
Structural Equation Modeling
This document (and related workshop) focuses on structural equation modeling. It is conceptually based, and tries to generalize beyond the standard SEM treatment. The initial workshop was given to an audience of varying backgrounds and statistical skill, but the document should be useful to anyone interested in the techniques covered. It is completely R-based, with special emphasis on the lavaan package. It will continue to be a work in progress, particularly the sections after the SEM chapter. Topics include: graphical models (directed and undirected, including path analysis, Bayesian networks, and network analysis), mediation, moderation, latent variable models (including principal components analysis and ‘factor analysis’), measurement models, structural equation models, mixture models, and growth curves. Topics I hope to provide overviews of in the future include other latent variable techniques/extensions such as IRT, collaborative filtering/recommender systems, hidden Markov models, multi-group models etc.
Factor Analysis and Related Methods
This document gives a brief overview of many matrix factorization, dimension reduction, and latent variable techniques.
Latent Variables, Sum Scores, Single Items
It is very common to use sum scores of several variables as a single entity to be used in subsequent analysis (e.g. a regression model). Some may even use a single variable even though multiple indicators are available. Assuming the multiple measures indicate a latent construct, such typical practice would be problematic relative to using estimated factor scores, either constructed as part of a two-stage process or as part of a structural equation model. This document covers simulations in which comparisons in performance are made between latent variable and sum score or single item approaches.
Summary of Pearl’s 2014 and 2013 technical reports on some modeling situations such as Lord’s Paradox and Simpson’s Paradox that lead to surprising results that are initially at odds with our intuition. Looks particularly at the issue of change scores vs. controlling for baseline.
Generalized Additive Models
An introduction to generalized additive models with an emphasis on generalization from familiar linear models and using the mgcv package in R.
Introduction to Machine Learning
A gentle introduction to machine learning concepts with some application in R.
Categorical Regression Models
An overview of regression models for binary, multinomial, and ordinal outcomes, with connections among various types of models.
Topic Modeling Demo
A demonstration of Latent Dirichlet Allocation for topic modeling in R.
Comparing Measures of Dependency
A summary of relatively recent articles that look at various measures of dependency: Pearson’s r, Spearman’s rho, and Hoeffding’s D, as well as newer ones such as Distance Correlation and Maximal Information Coefficient.
Tools (esp. R)
Check the workshops section also.
A notebook on how to make R faster before or irrespective of the machinery used. Topics include avoiding loops, vectorization, faster I/O etc.
Engaging the Web with R
Document regarding the use of R for web scraping, extracting data via an API, interactive web-based visualizations, and producing web-ready documents. It serves as an overview of ways one might start to use R for web-based activities, as opposed to a hands-on approach.
R for Social Science
This was put together in a couple of days under duress, and is put here in case someone can find it useful (and thus make the time spent on it not completely wasted).