Fixed Effects Models

Fixed Effects (FE) models are a terribly named approach to dealing with clustered data, but in the simplest case serve as a contrast to the random effects (RE) approach in which there are only random intercepts5. Despite the nomenclature, there is really only one key difference between these models and the ‘mixed’ models we discuss. Both allow a (random) cluster-specific effect to be added to the model, but the FE approach allows that effect to correlate with the predictors, while the RE approach does not (by default). In practice, however, this means they may end up being quite different conceptual models as well. As with cluster-robust standard errors, economists, and again those trained in that fashion, have historically preferred these models. In my experience they are rarely used in other disciplines.

First, let us understand this cluster-specific effect. In the standard regression setting we have a single overall intercept, while here each cluster provides a nudge above or below it. Consider the following model (ignoring treatment for now):

$y = \textrm{Int} + \textrm{ClusterEffect} + b\cdot\textrm{time} + \epsilon$

The cluster effect is different from one cluster to the next, but constant for a given cluster. One way we could fit such a model is simply to include id as a predictor, thereby getting a unique estimate for each cluster added to the model. In other words, we can see the situation as if one had created a dummy variable for id and conducted a standard linear model. This is in fact one way to think of the FE model, where the cluster-specific effects are treated as constants to be estimated, and in the past these models were sometimes referred to as least squares dummy variable (LSDV) regression models6. If you actually run the LSDV model, the statistical results for time will be identical to those of the fixed effects model.
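A minimal sketch of this equivalence, using simulated data since the text's `d` is not reproduced here (`n_id`, `n_time`, and the coefficient values are illustrative choices, not the document's): the LSDV slope for time matches the within (demeaned) estimate exactly, using base R only.

```r
set.seed(123)
n_id   <- 50; n_time <- 4
id     <- rep(1:n_id, each = n_time)
time   <- rep(1:n_time, times = n_id)
a_i    <- rnorm(n_id)[id]                       # cluster-specific effects
y      <- 1 + a_i + 0.5 * time + rnorm(n_id * n_time, sd = 0.5)

lsdv   <- lm(y ~ time + factor(id))             # one dummy per cluster
within <- lm(I(y - ave(y, id)) ~ I(time - ave(time, id)))  # demeaned version

coef(lsdv)["time"]    # identical to the within slope below
coef(within)[2]
```

The equality of the two slope estimates is a consequence of the Frisch–Waugh–Lovell theorem: partialling out the cluster dummies is the same as demeaning within clusters.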

Why would we be worried about the potential correlation between the cluster-specific effects and the model covariates? In typical social science and economic data it is quite likely that unspecified cluster-level effects have some correlation with the individual-level covariates. This leads to inconsistent estimates in the RE approach, and so the FE model might be used instead.
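To see the problem concretely, here is a small simulation sketch (all names and values are illustrative): the covariate x is built to correlate with the cluster effect, so the pooled estimate, which stands in here for the direction the RE estimate is pulled, is biased away from the true slope of 1, while the within (FE) estimate is not.

```r
set.seed(42)
n_id <- 200; n_t <- 4
id   <- rep(1:n_id, each = n_t)
a    <- rnorm(n_id)                          # cluster effects
x    <- a[id] + rnorm(n_id * n_t)            # covariate correlated with them
y    <- a[id] + 1 * x + rnorm(n_id * n_t)    # true slope = 1

b_pooled <- coef(lm(y ~ x))["x"]             # biased upward (toward 1.5 here)
b_within <- coef(lm(I(y - ave(y, id)) ~ I(x - ave(x, id))))[2]  # near 1

c(pooled = unname(b_pooled), within = unname(b_within))
```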

In the following we use the plm package to estimate the FE model. I highly recommend reading the excellent vignette for this package if you are one of those econometrically trained folk new to R or the mixed model approach, or conversely, other folk wishing to understand the econometric perspective.

library(plm)

FE_mod = plm(y ~ as.numeric(time) + treatment, data=d, index='id', model='within')
summary(FE_mod)
Oneway (individual) effect Within Model

Call:
plm(formula = y ~ as.numeric(time) + treatment, data = d, model = "within",
index = "id")

Balanced Panel: n = 2500, T = 4, N = 10000

Residuals:
Min.    1st Qu.     Median    3rd Qu.       Max.
-2.6612736 -0.4047498  0.0026523  0.4001706  2.5606778

Coefficients:
Estimate Std. Error t-value  Pr(>|t|)
as.numeric(time) 0.5085588  0.0064962  78.285 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Total Sum of Squares:    7188.7
Residual Sum of Squares: 3955.8
R-Squared:      0.44972
F-statistic: 6128.62 on 1 and 7499 DF, p-value: < 2.22e-16

Note how there is no intercept or treatment effect. In this circumstance of a random intercept model, the FE model can also be seen as a ‘demeaning’ approach, where the model within a cluster is:

$y_{it}-\bar{y}_{i} = (X_{it}-\bar{X}_{i})\beta + (\epsilon_{it}-\bar{\epsilon}_{i})$

In other words, we subtract the cluster mean from each covariate and the response and run the model that way (this is known as the within transformation, though mostly to those from the econometrics world; elsewhere it is more often referred to as ‘centering’). Note that the following produces the same result, although the standard error for time is off7.

d %>%
  group_by(id) %>%
  mutate(ybar = y - mean(y),
         timebar = time - mean(time)) %$%   # %$% is the magrittr exposition pipe
  lm(ybar ~ timebar) %>%
  summary

Call:
lm(formula = ybar ~ timebar)

Residuals:
Min       1Q   Median       3Q      Max
-2.66127 -0.40475  0.00265  0.40017  2.56068

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 5.741e-19  6.290e-03    0.00        1
timebar     5.086e-01  5.626e-03   90.39   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.629 on 9998 degrees of freedom
Multiple R-squared:  0.4497,    Adjusted R-squared:  0.4497
F-statistic:  8171 on 1 and 9998 DF,  p-value: < 2.2e-16
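The standard error is off because the demeaned lm() fit does not account for the cluster means having been estimated: it uses N - 2 residual degrees of freedom, whereas the within model has spent one degree of freedom per cluster. A sketch of the usual rescaling, plugging in the numbers from the two outputs above, recovers the plm value:

```r
N <- 10000; n_clusters <- 2500; k <- 1   # k = number of slope coefficients
se_lm <- 5.626e-03                        # SE from the demeaned lm() fit

# rescale by the ratio of residual degrees of freedom
se_fe <- se_lm * sqrt((N - k - 1) / (N - n_clusters - k))
round(se_fe, 7)                           # ~0.0064961, matching the plm output
```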

Because of this, anything that is constant within a cluster drops out of the model, including covariates measured only once even if they are normally time-varying. So not only do you lose the ability to model cluster-level effects, though these are ‘controlled for’, you also lose data. Here we lose the treatment effect entirely, which would be completely unacceptable in most circumstances8.
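The mechanics are easy to verify in a sketch (illustrative data, with treatment assigned at the cluster level as in the text's design): after the within transformation, a cluster-constant covariate is identically zero, so there is nothing left to estimate.

```r
set.seed(123)
id        <- rep(1:50, each = 4)
treatment <- rep(rbinom(50, 1, 0.5), each = 4)   # constant within each cluster
treat_dm  <- treatment - ave(treatment, id)       # the within transformation
all(treat_dm == 0)                                # TRUE: the effect drops out
```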

There seem to be philosophical reasons for preferring FE models that go beyond the practical, as I otherwise don’t understand the often rigid preference by some adherents over RE models given the drawbacks. I have personally never come across a valid justification for not investigating cluster-level covariates when they are available (i.e. almost always in social science, educational, economic, epidemiological and other data), and one can often simply include cluster-level averages of available variables. In addition, few applications of FE models actually seem interested in the cluster-specific effects, in short treating the clustering as a nuisance, much like the cluster-robust standard error approach9.
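The cluster-level-averages idea, sometimes called the Mundlak device, can be sketched in a few lines (simulated data; the names are illustrative): adding the cluster mean of x as its own regressor yields a coefficient on x that equals the within (FE) estimate, without discarding cluster-level information.

```r
set.seed(42)
id   <- rep(1:100, each = 4)
a    <- rnorm(100)
x    <- a[id] + rnorm(400)            # covariate correlated with cluster effect
y    <- a[id] + 1 * x + rnorm(400)

xbar <- ave(x, id)                    # cluster-level average of x
b_hybrid <- coef(lm(y ~ x + xbar))["x"]                           # 'hybrid' model
b_within <- coef(lm(I(y - ave(y, id)) ~ I(x - ave(x, id))))[2]    # FE estimate

c(hybrid = unname(b_hybrid), within = unname(b_within))  # identical
```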

Pros

• Does not assume X and random effects are uncorrelated.

Cons

• Ignores cluster level covariates or anything cluster constant (i.e. will almost always lose data).
• Doesn’t easily extend to more complex clustering structures.
• Less efficient than RE if the RE assumptions hold.
• Technically one can do something akin to random slopes also10 (mentioned in passing in Greene), but no available software readily does so.
• Awkward (in my opinion) extension to the GLM setting for binary and count outcomes.
• More will be pointed out in the mixed models discussion.

Gist: If your goal is statistical consistency above all other considerations, this approach is for you. However, given that mixed models can potentially overcome the primary issue the FE model addresses (random effects correlated with covariates11), this is a difficult modeling approach to justify. For more, see Bell et al. (2016), Fixed and Random effects: making an informed choice.

5. Actually, FE models extend beyond this, but I’ve never seen that treatment in textbook presentations, nor am I familiar with tools that do so aside from the latent variable approach.

6. This Stata note highlights the distinction.

7. This is due to the fact that estimation of the group means was not taken into account.

8. Note that you could still get the time x treatment interaction, which you’d definitely want to examine in the experimental longitudinal setting. In other circumstances and with numerous covariates this may become unwieldy, and then there are the issues that when the interaction is not significant you have no main effect to fall back on, and that you’re also testing an interaction without all of its component main effects.

9. Still applies here, i.e. we can still use cluster-robust standard errors.

10. And you can defeat the purpose of the FE model by including a covariate by group interaction. However, there would also be no regularizing effect on all the coefficients produced by such a model, unlike the RE model.

11. One can use aggregated values of the potentially offending covariates as cluster-level covariates. For example, if we had people clustered within political districts, we could use average income as a district-level covariate. Such models are sometimes referred to as hybrids, incorporating both the FE and RE approaches, but the distinction is unwarranted; all three are simply random effects models of different kinds.