Cluster Robust Variances

As mentioned, if we ignore the clustering, the so-called fixed effects estimates (i.e. coefficients) are correct, but their variances are not. We can get proper estimates of the standard errors via cluster robust standard errors, which are very popular in econometrics and fields trained in that fashion, but not widely used elsewhere in my experience. Essentially, these allow one to fire-and-forget, and treat the clustering as more of a nuisance.

In programs like Stata, obtaining these are basically an option for most modeling procedures. In R, it’s not quite as straightforward, but not difficult. There are packages such as sandwich that can provide heteroscedastic robust standard errors, but won’t necessarily take into account clustering. I provide a custom function that will work in this example so that the curtain can be pulled back a little, but the plm package would be the way to go for cluster robust standard errors. The plm package can take into account the serial autocorrelation via the ‘arellano’ input to the type argument.

library(plm)
# create numeric time so plm won't treat as a factor, time is a reserved name in plm
t2 = d$time                   
clusterrob_mod <- plm(y ~ t2 + treatment, data=d, model='pooling', index='id')
summary(clusterrob_mod)

Pooling Model

Call:
plm(formula = y ~ t2 + treatment, data = d, model = "pooling", 
    index = "id")

Balanced Panel: n = 2500, T = 4, N = 10000

Residuals:
     Min.   1st Qu.    Median   3rd Qu.      Max. 
-4.724323 -0.810506  0.030167  0.815708  4.231312 

Coefficients:
                    Estimate Std. Error  t-value  Pr(>|t|)    
(Intercept)         0.130929   0.023712   5.5217 3.442e-08 ***
t2                  0.508559   0.010880  46.7436 < 2.2e-16 ***
treatmenttreatment -0.421493   0.024328 -17.3255 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    18469
Residual Sum of Squares: 14792
R-Squared:      0.1991
Adj. R-Squared: 0.19894
F-statistic: 1242.57 on 2 and 9997 DF, p-value: < 2.22e-16

The following compares the different robust SEs we could produce at this point.

resids = resid(lm_mod)                    # residuals
xvar = solve(crossprod(lm_mod$x))         # (X'X)^-1
gd0 = data.frame(id=d$id, r=resids, lm_mod$x) 

# custom function to be applied within cluster
cluscom = function(mat){
  X = as.matrix(mat)[, 3:5]               # magic numbers represent int, time and treatment columns
  crossprod(X, tcrossprod(mat$r)) %*% X
}

# calculate within cluster variance
gd = gd0 %>% 
  group_by(id) %>% 
  do(xvar = cluscom(.))

# plm
plm_se = vcovHC(clusterrob_mod, method='arellano', cluster='group') %>% diag %>% sqrt %>% round(3)

# custom output
custom_se = (xvar %*% Reduce(`+`, gd$xvar) %*% xvar) %>% diag %>% sqrt %>% round(3)

# original lm
lm_se = vcov(lm_mod) %>% diag %>% sqrt  %>% round(3)

# non-clustered 'robust' se
lm_robse = vcovHC(lm_mod, type='HC0') %>% diag %>% sqrt %>% round(3)

# plm
plm_se

       (Intercept)                 t2 treatmenttreatment 
             0.030              0.009              0.042

# custom output
custom_se

       (Intercept)               time treatmenttreatment 
             0.030              0.009              0.042

# original lm
lm_se

       (Intercept)               time treatmenttreatment 
             0.024              0.011              0.024

# non-clustered 'robust' se
lm_robse

       (Intercept)               time treatmenttreatment 
             0.023              0.011              0.024

Pros

Don’t have to think too much about the issues at play
Fewer assumptions about the model
A general heteroscedastic ‘robust’ approach

Cons

Ignores cluster-specific effects
By default, assumes no intracluster correlation
Few tools go beyond simple clustering, requiring manual approach
Not as general as GEE approach
Suggests that there are problems in model specification that the method itself does not correct

Gist: if your goal is to simply do a standard GLM approach and essentially ignore the clustering, treating it more or less as a nuisance to overcome, and your grouping structure is simple, then this may be for you. However, note that the GEE approach as a possibly more flexible alternative that can still incorporate cluster robust standard errors⁴. Furthermore, differences between the robust and regular SE may suggest a model misspecification issue. ‘Fixing’ standard errors won’t solve that problem.

The cluster robust standard errors as depicted above assume independent error structure in the GEE model, and thus are a special case of GEE. When you peruse the GEE model, rerun it with the argument corst='ind' to duplicate the plm and custom results above.↩