Ignore Data Dependency

The first thing we can do is ignore the dependency and just run a standard regression. This is actually okay if you have very few clusters and include the cluster id in the model as a fixed effect. Otherwise, the standard errors (SE) will be problematic. Cluster-level covariates are treated as if there were N*timepoints independent observations, which typically underestimates their SE, while the SE for time-varying covariates do not account for the clustering and are typically overestimated.
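To make the claim about cluster-level covariates concrete, here is a minimal simulation sketch in Python using only the standard library. Everything in it is an assumption for illustration (the number of clusters, the variance components, the effect size, and the helper names `simulate` and `diff_and_se`), and it is not the data or code behind the results in this section. It ignores time and compares the SE of a treatment difference when every row is treated as independent against the SE obtained after collapsing to one mean per cluster.

```python
import random
import statistics as st

random.seed(1)  # fixed seed so the illustration is reproducible


def simulate(n_clusters=200, n_times=5, cluster_sd=1.0, resid_sd=1.0, effect=-0.5):
    """Return rows of (cluster_id, treated, y) with a random intercept per cluster.

    All default values are assumptions chosen for illustration.
    """
    rows = []
    for cid in range(n_clusters):
        treated = cid % 2                 # cluster-level covariate: constant within a cluster
        u = random.gauss(0, cluster_sd)   # cluster-specific effect shared by all of its rows
        for _ in range(n_times):
            y = effect * treated + u + random.gauss(0, resid_sd)
            rows.append((cid, treated, y))
    return rows


def diff_and_se(control, treated):
    """Difference in means and its SE, treating the supplied values as independent."""
    diff = st.mean(treated) - st.mean(control)
    se = (st.variance(control) / len(control)
          + st.variance(treated) / len(treated)) ** 0.5
    return diff, se


rows = simulate()

# Naive approach: every row counts as an independent observation.
naive = diff_and_se([y for _, t, y in rows if t == 0],
                    [y for _, t, y in rows if t == 1])

# Cluster-aware approach: collapse to one mean per cluster, then compare clusters.
by_cluster = {}
for cid, t, y in rows:
    by_cluster.setdefault((cid, t), []).append(y)
means_control = [st.mean(v) for (cid, t), v in by_cluster.items() if t == 0]
means_treated = [st.mean(v) for (cid, t), v in by_cluster.items() if t == 1]
clustered = diff_and_se(means_control, means_treated)

print(f"naive:     diff={naive[0]:.3f}, SE={naive[1]:.3f}")
print(f"clustered: diff={clustered[0]:.3f}, SE={clustered[1]:.3f}")
```

Because the simulated clusters are balanced, both approaches give the same point estimate. Only the SE differs, and the naive one is smaller because it counts 1000 rows as independent when there are really only 200 independent clusters.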

                     Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)              0.13        0.02     5.52         0
time                     0.51        0.01    46.74         0
treatmenttreatment      -0.42        0.02   -17.33         0

Fitting linear model: y ~ time + treatment

Observations  Residual Std. Error  \(R^2\)  Adjusted \(R^2\)
10000                        1.22      0.2               0.2

First, be aware that the ‘treatmenttreatment’ label just tells us that the coefficient refers to moving from the reference group (i.e. ‘control’) to the treatment group. In other words, treatment is handled as a binary variable where 1 equals treatment and 0 equals control. Note that the coefficients are in the ballpark of the true values. The exception is the estimate of the residual variance, which packs all sources of variance into one estimate. As mentioned though, the standard errors for the effects are problematic.
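The 0/1 coding described above can be sketched with a few lines of standard-library Python. The toy values below are hypothetical and are not the data behind the output in this section. With an intercept and a single 0/1 predictor, the least-squares slope equals the difference between the two group means, and the intercept equals the mean of the reference group.

```python
import statistics as st

# Hypothetical toy data for illustration only.
group = ["control", "control", "control", "treatment", "treatment", "treatment"]
y = [1.0, 1.4, 0.9, 0.5, 0.8, 0.2]

# Dummy coding: the reference level ('control') becomes 0, 'treatment' becomes 1.
x = [1.0 if g == "treatment" else 0.0 for g in group]

# Ordinary least squares for y = intercept + slope * x, computed by hand.
x_bar, y_bar = st.mean(x), st.mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Group means for comparison with the fitted coefficients.
mean_control = st.mean(yi for xi, yi in zip(x, y) if xi == 0.0)
mean_treatment = st.mean(yi for xi, yi in zip(x, y) if xi == 1.0)

print(f"intercept={intercept:.3f}  (control mean={mean_control:.3f})")
print(f"slope={slope:.3f}  (treatment - control={mean_treatment - mean_control:.3f})")
```

When other predictors such as time are in the model, the coefficient is instead the group difference holding those predictors fixed.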


Pros:
  • Easy
  • Provides estimates of the effects most people are primarily interested in

Cons:
  • Standard errors are off
  • Ignores the cluster-specific effects, which may be highly interesting

Gist: Probably not viable for most situations.