First we need to set up the data, and to do so we’ll consider the longitudinal setting in which individuals are observed over time2. The goal is to create a data set with several (here, four) observations per individual, i.e. the ‘repeated measures’ scenario. Aside from variables representing the individual and the time point, I will also create an individual-level variable, meaning it is constant across each individual’s observations. In this case it will be a simple binary variable indicating ‘treatment’ or ‘control’, as in an experiment.
The data will also have a certain correlation structure, specifically an autoregressive structure. This means that the residuals at each time point will be notably correlated with those at the previous time point, but less so as the time between observations increases. If this is unfamiliar territory, just treat it as an extra parameter (called rho in the following code) we’ll have to estimate, and one whose estimate you can check against the ‘true’ value.
Here are the main parameters of interest.
```r
library(Matrix)

set.seed(8675309)

tp         = 4      # number of timepoints
n          = 2500   # number of individuals
sigmasq    = 1      # residual variance
rho        = .7     # rho of AR1 model
intercept  = .2     # intercept
time_beta  = .5     # time effect
treat_beta = -.5    # treatment effect
intsd      = .5     # intercept standard deviation
timesd     = .25    # slope of time variable standard deviation
```
Now we can create the data. Here are the main components: time, individual id, and treatment. We will also create individual-specific effects for the intercept and the slope for time. These are our ‘random’ effects3.
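The original code for this step isn’t shown above, so here is a minimal sketch of one way to build these components. The variable names (`time`, `id`, `treat`, `randint`, `randtime`) are my assumptions; the parameter values match those set earlier.

```r
# A sketch only; variable names are assumed, parameter values match those above.
tp = 4; n = 2500; intsd = .5; timesd = .25

time  = rep(0:(tp - 1), times = n)                      # time points 0-3 for each individual
id    = rep(1:n, each = tp)                             # individual id
treat = rep(sample(0:1, n, replace = TRUE), each = tp)  # binary, constant within individual

# individual-specific ('random') effects for the intercept and the time slope
randint  = rep(rnorm(n, mean = 0, sd = intsd),  each = tp)
randtime = rep(rnorm(n, mean = 0, sd = timesd), each = tp)
```

The `rep(..., each = tp)` pattern is what makes `treat`, `randint`, and `randtime` individual-level: each individual’s value is simply repeated across their four rows.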
Next we create the residual structure, made somewhat easier since the residual variance is set to 1. For those interested, this is an autoregressive structure of lag 1. This means that when time points are one measure apart, the correlation is \(\rho\); at two measures apart, \(\rho^2\); at three measures, \(\rho^3\). You get the gist. This structure is the same within each individual. Once that is set, we draw the residuals from a multivariate normal with mean zero and covariance based on the structure we have provided.
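One way to express this in code (a sketch, not the original): the within-individual correlation between time points \(i\) and \(j\) is \(\rho^{|i-j|}\), and the full residual covariance is block diagonal, one identical block per individual, which is what produces output like the sparse-matrix excerpt printed below. Drawing each individual’s residuals from the small block is equivalent to one draw from the full block-diagonal matrix.

```r
library(Matrix)
# Sketch, assuming the parameter values set earlier.
tp = 4; n = 2500; rho = .7; sigmasq = 1

# AR(1) correlation within an individual: rho^|i - j| for time points i, j
cor_block = rho^abs(outer(1:tp, 1:tp, "-"))
cov_block = sigmasq * cor_block

# full covariance: block diagonal, one block per individual (sparse)
cov_full = kronecker(Diagonal(n), cov_block)

# draw residuals individual by individual, then flatten to one vector
e = c(t(MASS::mvrnorm(n, mu = rep(0, tp), Sigma = cov_block)))
```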
```
10 x 10 sparse Matrix of class "dgTMatrix"

 [1,] 1.000 0.70 0.49 0.343 .     .    .    .     .   .
 [2,] 0.700 1.00 0.70 0.490 .     .    .    .     .   .
 [3,] 0.490 0.70 1.00 0.700 .     .    .    .     .   .
 [4,] 0.343 0.49 0.70 1.000 .     .    .    .     .   .
 [5,] .     .    .    .     1.000 0.70 0.49 0.343 .   .
 [6,] .     .    .    .     0.700 1.00 0.70 0.490 .   .
 [7,] .     .    .    .     0.490 0.70 1.00 0.700 .   .
 [8,] .     .    .    .     0.343 0.49 0.70 1.000 .   .
 [9,] .     .    .    .     .     .    .    .     1.0 0.7
[10,] .     .    .    .     .     .    .    .     0.7 1.0
```
Now we put it all together and create a data.frame object. A few entries are shown.
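Putting the pieces together, here is a self-contained sketch of the assembly (names and ordering of draws are my assumptions, so the simulated values won’t match the original exactly). The response follows the linear predictor plus the individual-specific effects and the AR(1) residuals.

```r
# Self-contained sketch; parameter values match those set earlier.
set.seed(8675309)
tp = 4; n = 2500
intercept = .2; time_beta = .5; treat_beta = -.5
intsd = .5; timesd = .25; rho = .7; sigmasq = 1

time  = rep(0:(tp - 1), times = n)
id    = rep(1:n, each = tp)
treat = rep(sample(0:1, n, replace = TRUE), each = tp)

randint  = rep(rnorm(n, 0, intsd),  each = tp)
randtime = rep(rnorm(n, 0, timesd), each = tp)

cov_block = sigmasq * rho^abs(outer(1:tp, 1:tp, "-"))
e = c(t(MASS::mvrnorm(n, mu = rep(0, tp), Sigma = cov_block)))

# response: (random) intercept + (random) time slope + treatment effect + AR(1) error
d = data.frame(
  id, time, treat,
  y = (intercept + randint) + (time_beta + randtime) * time +
      treat_beta * treat + e
)

head(d)
```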
Now we are ready to proceed.
This will allow us to examine another technique later, growth curve models, that would not apply to the non-longitudinal case.↩
Note there is nothing random about them, they merely represent all causes not identified by the model that vary amongst individuals. They are an additional source of uncertainty.↩