A Survey Approach
I will only briefly mention an approach using survey design, to show that its results in this scenario are similar to those from cluster robust standard errors. We’ll use the survey package and its svyglm function.
For comparison, we’ll specify a cluster-based sampling design and nothing more. This assumes we are sampling clusters from the population to which we want to make inferences. To fit most survey versions of models, the design must be specified a priori, as a separate object that is then passed to the model function.
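library(survey)
# One-stage cluster design: clusters (id) are the sampling units; no weights are supplied, so equal probabilities are assumed
design = svydesign(ids = ~id, data = d)
# Gaussian GLM whose standard errors account for the clustered design
svy_mod = svyglm(y ~ time + treatment, data = d, design = design)
summary(svy_mod)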
Call:
svyglm(formula = y ~ time + treatment, data = d, design = design)
Survey design:
svydesign(ids = ~id, data = d)
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         0.130929   0.030500   4.293 1.83e-05 ***
time                0.508559   0.009143  55.624  < 2e-16 ***
treatmenttreatment -0.421493   0.041647 -10.121  < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for gaussian family taken to be 1.479315)
Number of Fisher Scoring iterations: 2
These standard errors are very close to the cluster robust standard errors we obtained earlier:
(Intercept)        time  treatmenttreatment
   0.030494    0.009141            0.041638
In fact, they would be identical if we applied a finite population correction to the cluster robust standard errors:
(Intercept)        time  treatmenttreatment
   0.030500    0.009143            0.041647
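To see where the two sets of standard errors come from, here is a minimal base R sketch on simulated data. The variables, cluster count, and effect sizes are hypothetical and are not the data set d used above. It builds the basic (CR0) cluster robust sandwich by hand. Assuming no weights and clusters treated as sampled with replacement, the linearization variance that svyglm reports should equal this CR0 estimate scaled by G/(G - 1), where G is the number of clusters; other small-sample adjustments would shift the numbers slightly.

```r
# Hypothetical simulated clustered data (not the data set d used above)
set.seed(1234)
G = 100                                   # number of clusters
n_per = 4                                 # observations per cluster
id = rep(1:G, each = n_per)
time = rep(0:(n_per - 1), times = G)
treatment = rep(rep(0:1, length.out = G), each = n_per)  # cluster-level treatment
cluster_effect = rnorm(G)[id]             # shared random effect within cluster
y = 0.1 + 0.5 * time - 0.4 * treatment + cluster_effect + rnorm(G * n_per)

fit = lm(y ~ time + treatment)
X = model.matrix(fit)
u = residuals(fit)

# Bread: (X'X)^{-1}
bread = solve(crossprod(X))

# Meat: sum over clusters of z_g z_g', where z_g sums x_i * u_i within cluster g
Z = rowsum(X * u, group = id)             # G x p matrix of cluster score totals
meat = crossprod(Z)

# CR0 cluster robust covariance (no small-sample adjustment)
V_cr0 = bread %*% meat %*% bread

# Scaling by G/(G - 1), the factor a with-replacement cluster design applies
V_g = V_cr0 * G / (G - 1)

se = cbind(CR0 = sqrt(diag(V_cr0)), scaled = sqrt(diag(V_g)))
se
```

The ratio between the two columns is the same constant, sqrt(G/(G - 1)), for every coefficient, which is why the survey and cluster robust results differ only in the fourth or fifth decimal place when there are many clusters.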
I note only the pros and cons relevant to our purposes. Those of survey design in general are quite complex and better discussed elsewhere.
Pros
- Can incorporate different and quite complicated sampling designs
- More confidence in inference to the populations of interest
Cons
- Incorporating complex designs and their associated weights adds considerable complexity
- Beyond simpler settings, it can be difficult to tell how best to use survey design within the modeling context
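Both lists hinge on how much design information is supplied. As a sketch, using the apiclus1 example data that ships with the survey package (not the data used above), the same one-stage cluster design can be declared with only cluster ids, as in our model, or with sampling weights (pw) and a finite population correction (fpc) added. The model formula here is chosen purely for illustration.

```r
library(survey)

# Example data shipped with survey: apiclus1 is a one-stage cluster sample
# of California schools, with school districts (dnum) as the clusters
data(api)

# Clusters only; svydesign() warns that equal probabilities are assumed
des_plain = svydesign(ids = ~dnum, data = apiclus1)

# Same clusters, plus sampling weights and a finite population correction
des_full = svydesign(ids = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

fit_plain = svyglm(api00 ~ ell + meals, design = des_plain)
fit_full  = svyglm(api00 ~ ell + meals, design = des_full)

# Compare standard errors under the two design specifications
cbind(plain = sqrt(diag(vcov(fit_plain))), full = sqrt(diag(vcov(fit_full))))
```

Only the design object changes between the two fits; the model call stays the same, which is the main practical appeal of specifying the design a priori.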
Gist: Our goal here was merely to connect cluster robust approaches to survey design, which is a separate topic that will not be considered further.