Graphical Models

The document will start with the familiar, a standard regression model. It will then be presented as a graphical model, and extended to include indirect effects (e.g. \(\mathcal{A} \rightarrow \mathcal{B} \rightarrow \mathcal{C}\)) and multiple outcomes. At this point we will discuss directed graphs, and demonstrate a more theoretically motivated approach (sometimes called path analysis), and compare it to more flexible approaches that do not require prespecification of paths nor specific directional relations. We’ll briefly discuss undirected graphs, with an example utilizing ‘network analysis’.

Latent Variables

We will then discuss the notion of latent variables in the context of underlying causes or constructs and measurement error. We will begin by noting that latent variable models are actually even more broadly utilized when one includes other dimension reduction, or data compression, techniques, several of which fall under the heading of factor analysis. A few common techniques will be demonstrated such as ‘factor analysis’ and principal components analysis, and an overview will be provided for others. In addition, we will note other places one might find latent variable models.


Next we turn to structural equation modeling, where previously covered models come together under one modeling approach. We will spend a good deal of time with measurement models first, comparing them to our previous efforts, and then extend those models to the case of regression with latent variables. There are many issues to consider when developing such models, and an attempt will be made to cover quite a bit of ground in that regard.


The three sections just noted are the most developed, and serve as the basis for workshops. Other topics are touched upon for reference, and may also be added to in the future. For example latent growth curve models, are an alternative to a standard mixed model. Also, there are situations where the latent variable might be considered categorical, commonly called mixture models or cluster analysis, but in some specific contexts might go by other names (e.g. latent class analysis). Finally, an overview of other types of latent variable or structural equation models, such as item response theory, collaborative filtering etc. may also be provided.

Programming Language Choice

We will use R for a variety of reasons. One is that all of the techniques mentioned thus far are fully developed within various R packages, often taking just a line or two of code to implement after the data has been prepped. Furthermore, it is freely available and will work on Windows, Mac and Linux. R is well-known as a powerful statistical modeling environment, and its flexible programming language allows for efficient data manipulation and exploration. Furthermore, it has a vast array of visualization capabilities. In short, it provides everything one might need in a single environment for standard SEM, and nothing comes close in SEM-specific offerings.

Among alternatives, Mplus is the most fully developed structural equation modeling package, and has been for years. However, it is a poor tool for data management, few universities have a campus-wide license for it, and most of its functionality (and all we will need for this our purposes) is implemented within the lavaan family of R packages. Stata has relatively recently provided SEM capabilities, but it is less well-developed (something that might change in time), and it still requires a license, making non-campus usage difficult or costly. Other alternatives exist, but are not as commonly used as lavaan, Mplus, and Stata, at least in my experience across two universities and dozens of academic disciplines. For more on what’s out there, see the appendix.


For those wishing to follow along, for things to go smoothly you’ll need to complete the following steps precisely.

If you are not present or are bringing your laptop, you’ll need to have both R and RStudio installed on whatever machine you’ll be using. If this will be a new experience for you, install R first, then RStudio. For either you’ll need to choose the version appropriate to your operating system. As you go through the installation, for both just accept all defaults when prompted until the installation process begins. Once both are installed, you will only need to work with RStudio, and it will at all times be assumed you will be using RStudio during the workshop.

Once those are installed, proceed through the following steps.

  1. Download this zipfile, and unzip its contents to an area on your machine that you have write access to. It contains the course contents, data, etc. in a folder that will serve as an RStudio project folder.

  2. Open RStudio. File/Open Project/ then navigate to the folder contents you just unzipped. Click on the SEM file (should look like a blue icon, but otherwise is the SEM.Rproj file).

  3. Open the R script, install_script.R inside. Run the one line of code there and you’re set.

The lab for the workshop has Windows machines, and so the above is enough to proceed. For *nix systems, it should be mostly the same.