Predictions with an offset

Reconciling R and Stata Approaches

Michael Clark https://m-clark.github.io
2020-06-16

Table of Contents


Introduction

Getting predictions in R is and always has been pretty easy for the vast majority of packages providing modeling functions, as they also provide a predict method for the model objects. For those in the Stata world, they typically use margins for this, but when they come to R, there is no obvious option for how to go about it in the same way1. Likewise, some in the R world catch a whiff of Stata’s margins and would want something similar, but may not be sure where to turn.

A little digging will reveal there are several packages that will provide the same sort of thing. In addition, there are numerous resources for both R and Stata for getting marginal results (i.e. predictions). However, here we note the issues that arise when models include an offset. Offsets are commonly used to model rates when the target variable is a count, but are used in other contexts as well. The Stata documentation for the margins command offers no specific details of how the offset/exposure is treated, and some R packages appear not to know what to do with it, or offer few options to deal with it. So even when the models are identical, marginal estimates might be different in R and Stata. Here we’ll try to sort some of this out.

Get some data

We will use the Insurance data from the MASS package which most with R will have access to. From the helpfile:

The data given in data frame Insurance consist of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders.

We do a bit of minor processing, and I save the data as a Stata file in case anyone wants to play with it in that realm.


library(tidyverse)

set.seed(123)

insurance = MASS::Insurance %>%
  rename_all(tolower) %>%
  mutate(
    # create standard rather than ordered factors for typical output
    age   = factor(age, ordered = FALSE),
    group = factor(group, ordered = FALSE),
    
    # create a numeric age covariate for later
    age_num = case_when(
      age == '<25'   ~ sample(18:25, n(), replace = T),
      age == '25-29' ~ sample(25:29, n(), replace = T),
      age == '30-35' ~ sample(30:35, n(), replace = T),
      age == '>35'   ~ sample(36:75, n(), replace = T),
    ),
    
    # for stata consistency
    ln_holders = log(holders)
  )


haven::write_dta(insurance, 'data/insurance.dta')

Let’s take a quick peek to get our bearings.


$`Numeric Variables`
    Variable  N   Mean     SD  Min    Q1 Median     Q3     Max Missing
1    holders 64 364.98 622.77  3.0 46.75 136.00 327.50 3582.00       0
2     claims 64  49.23  71.16  0.0  9.50  22.00  55.50  400.00       0
3    age_num 64  35.23  15.83 18.0 25.00  29.50  35.50   75.00       0
4 ln_holders 64   4.90   1.48  1.1  3.84   4.91   5.79    8.18       0

$`Categorical Variables`
# A tibble: 12 x 4
   Variable Group  Frequency   `%`
   <chr>    <chr>      <int> <dbl>
 1 district 1             16    25
 2 district 2             16    25
 3 district 3             16    25
 4 district 4             16    25
 5 group    <1l           16    25
 6 group    >2l           16    25
 7 group    1-1.5l        16    25
 8 group    1.5-2l        16    25
 9 age      <25           16    25
10 age      >35           16    25
11 age      25-29         16    25
12 age      30-35         16    25

Model

Starting out, we run a model in as simple a form as possible. I use just a standard negative binomial with a single covariate age, so we can clearly see how the ouptut is being produced. Note that age has four categories as seen above.


nb_glm_offset = MASS::glm.nb(claims ~  age + offset(ln_holders), data = insurance)

summary(nb_glm_offset)

Call:
MASS::glm.nb(formula = claims ~ age + offset(ln_holders), data = insurance, 
    init.theta = 28.40119393, link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.29753  -0.79833  -0.04613   0.82465   3.00825  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.59233    0.09071 -17.554  < 2e-16 ***
age25-29    -0.12697    0.11743  -1.081   0.2796    
age30-35    -0.25340    0.11558  -2.193   0.0283 *  
age>35      -0.41940    0.10583  -3.963  7.4e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(28.4012) family taken to be 1)

    Null deviance: 86.761  on 63  degrees of freedom
Residual deviance: 67.602  on 60  degrees of freedom
AIC: 444.38

Number of Fisher Scoring iterations: 1

              Theta:  28.40 
          Std. Err.:  9.80 

 2 x log-likelihood:  -434.385 

Now we run it with Stata. We get the same result, so this means we can’t get different predictions if we do the same thing in both R or Stata2.


nbreg claims i.age, offset(ln_holders) nolog

Negative binomial regression                    Number of obs     =         64
                                                LR chi2(3)        =      15.73
Dispersion     = mean                           Prob > chi2       =     0.0013
Log likelihood =  -217.1925                     Pseudo R2         =     0.0349

------------------------------------------------------------------------------
      claims |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      25-29  |  -.1269666   .1182871    -1.07   0.283    -.3588049    .1048718
      30-35  |  -.2533991    .116535    -2.17   0.030    -.4818034   -.0249948
        >35  |  -.4193953   .1068907    -3.92   0.000    -.6288973   -.2098933
             |
       _cons |  -1.592325   .0913742   -17.43   0.000    -1.771415   -1.413235
  ln_holders |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.346431   .3544212                     -4.041084   -2.651778
-------------+----------------------------------------------------------------
       alpha |   .0352098   .0124791                      .0175784    .0705257
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 38.33                Prob >= chibar2 = 0.000

emmeans

First let’s use emmeans, a very popular package for getting estimated marginal means, to get the predicted counts for each age group.


library(emmeans)

emmeans(
  nb_glm_offset,
  ~ age,
  type = "response",
  offset = mean(insurance$ln_holders)  #  default
)
age response SE df asymp.LCL asymp.UCL
<25 27.437 2.489 Inf 22.968 32.776
25-29 24.166 1.802 Inf 20.880 27.969
30-35 21.295 1.525 Inf 18.507 24.505
>35 18.038 0.983 Inf 16.211 20.072

How is this result obtained? It is just the prediction at each value of the covariate, with the offset held at its mean. We can duplicate this result by using the predict method and specifying a data frame with the values of interest.


nd = data.frame(
  age = levels(insurance$age),
  ln_holders = mean(insurance$ln_holders)
)

predict(
  nb_glm_offset,
  newdata = nd,
  type = 'response'
)
prediction
27.437
24.166
21.295
18.038

As an explicit comparison, the intercept represents group ‘<25’, and if we exponentiate and add the mean offset we get:

\(exp(Intercept + \overline{ln\_holders}) = e^{-1.59+4.90} = \qquad\) 27.437

Stata: basic margins

Now let’s look at Stata. First we want just the basic margins output.


margins age 

Adjusted predictions                            Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
        <25  |   74.25681   6.785153    10.94   0.000     60.95815    87.55546
      25-29  |   65.40266   4.936493    13.25   0.000     55.72731    75.07801
      30-35  |   57.63502   4.195113    13.74   0.000     49.41275    65.85729
        >35  |   48.81971   2.747836    17.77   0.000     43.43405    54.20537
------------------------------------------------------------------------------

These values, while consistent in pattern, are much different than the emmeans output, so what is going on?

R by hand

In this model, we only have the age covariate and the offset, so there really isn’t much to focus on besides the latter. To replicate the Stata output in R, we will use all values of the offset for every level of age, and subsequently get an average prediction for each age group. First, we create a data frame for prediction using expand.grid, get the predictions for all those values, then get mean prediction per group.


predictions = 
  expand.grid(
    age        = levels(insurance$age), 
    ln_holders = insurance$ln_holders
  ) %>% 
  mutate(prediction = predict(nb_glm_offset, newdata = ., type = 'response')) %>% 
  group_by(age) %>% 
  summarise(avg_prediction = mean(prediction)) 
age avg_prediction
<25 74.257
25-29 65.403
30-35 57.635
>35 48.820

emmeans

The emmeans doesn’t appear to allow one to provide all values of the offset, as adding additional values just applies them to each group and then recycles. In this case, it would just use the first four values of ln_holders for each age group respectively, which is not what we want.


emmeans(
  nb_glm_offset,
  ~ age,
  type = "response",
  offset = insurance$ln_holders
)

 age   response    SE  df asymp.LCL asymp.UCL
 <25       40.1  3.64 Inf      33.6      47.9
 25-29     47.3  3.53 Inf      40.9      54.8
 30-35     38.8  2.78 Inf      33.8      44.7
 >35      224.7 12.25 Inf     201.9     250.0

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

insurance$ln_holders[1:4]

[1] 5.283204 5.575949 5.505332 7.426549

If we add the offset to the spec argument, it still just fixes it at the mean (and I tried variations on the spec). So at least using the standard approaches with this model does not appear to give you the same thing as Stata.


emmeans(
  nb_glm_offset,
  ~ age + offset(ln_holders),
  type = "response"
)

 age   ln_holders response    SE  df asymp.LCL asymp.UCL
 <25          4.9     27.4 2.489 Inf      23.0      32.8
 25-29        4.9     24.2 1.802 Inf      20.9      28.0
 30-35        4.9     21.3 1.525 Inf      18.5      24.5
 >35          4.9     18.0 0.983 Inf      16.2      20.1

Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

Unfortunately Stata has the opposite issue. Trying to set the offset to the mean results in an error, and using atmeans doesn’t change the previous result.


margins age, atmeans
margins age, at(ln_holders = 10.27)

Adjusted predictions                            Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()
at           : 1.age           =         .25 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
        <25  |   74.25681   6.785153    10.94   0.000     60.95815    87.55546
      25-29  |   65.40266   4.936493    13.25   0.000     55.72731    75.07801
      30-35  |   57.63502   4.195113    13.74   0.000     49.41275    65.85729
        >35  |   48.81971   2.747836    17.77   0.000     43.43405    54.20537
------------------------------------------------------------------------------

variable ln_holders not found in list of covariates
r(322);

end of do-file
r(322);

margins

The margins package explicitly attempts to duplicate Stata’s margins command, but here we can see it has an issue with the offset.


library(margins)

margins(
  nb_glm_offset,
  variables = 'age',
  type = "response"  # default
)

Error in offset(ln_holders): could not find function "offset"

The offset function is part of the stats package of the base R installation, so I tried rerunning the model using stats::offset, but this makes the offset just like any other covariate, i.e. it did not have a fixed coefficient of 1. Changing the model to a standard glm class with poisson and moving the offset to the offset argument did work, and produces the results for the differences in predictions for each group from the reference group (dydx in Stata), but we’ll visit this type of result later3. However, the offset argument is not available to glm.nb, so we’re stuck for now.


pois_glm_offset = glm(
  claims ~  age,
  data   = insurance,
  family = 'poisson',
  offset = ln_holders
)

margins(
  pois_glm_offset,
  variables = 'age',
  type = "response"  # default
)

 age25-29 age30-35 age>35
   -10.32   -18.46 -28.79

expand.grid(
    age        = levels(insurance$age), 
    ln_holders = insurance$ln_holders
  ) %>% 
  mutate(prediction = predict(pois_glm_offset, newdata = ., type = 'response')) %>% 
  group_by(age) %>% 
  summarise(avg_prediction = mean(prediction)) %>% 
  mutate(diff = avg_prediction - avg_prediction[1]) 

# A tibble: 4 x 3
  age   avg_prediction  diff
  <fct>          <dbl> <dbl>
1 <25             73.4   0  
2 25-29           63.1 -10.3
3 30-35           55.0 -18.5
4 >35             44.7 -28.8

Stata: over

In Stata, with categorical values we can also use the over approach. What do we get in this case?


margins, over(age)

Predictive margins                              Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()
over         : age

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
        <25  |   14.47052   1.322232    10.94   0.000       11.879    17.06205
      25-29  |   26.16218   1.974682    13.25   0.000     22.29188    30.03249
      30-35  |   29.67738   2.160145    13.74   0.000     25.44358    33.91119
        >35  |   141.0984   7.941774    17.77   0.000     125.5328    156.6639
------------------------------------------------------------------------------

These are very different from our previous results for Stata, so what’s happening here?

R by hand

This can be duplicated with the predict function as follows. While similar to the previous approach, here only the observed values of the offset for each group are used. We then make predictions for all values of the data and average them by group.


predictions_over = insurance %>%
  group_by(age) %>%
  group_modify(
    ~ data.frame(
        prediction = mean(
          predict(nb_glm_offset, newdata = ., type = 'response')
        )
      ),
    keep = TRUE
  ) 
age prediction
<25 14.471
25-29 26.162
30-35 29.677
>35 141.098

The pattern is actually in the opposite direction, which is unexpected, but probably just reflects the fact that we just don’t have much data. However, it’s good to note that these respective approaches would not necessarily tell you the same thing.

emmeans

I currently don’t know of an equivalence for emmeans in this offset case, and initial searches didn’t turn up much, though it is hard to distinguish specific ‘average predictions’ from many other similar scenarios. I attempted the following, which keeps the values of ln_holders, but it only keeps unique ones, and it’s not reproducing what I would expect, although it implies that it is averaging over the offset values.


rg = ref_grid(nb_glm_offset, cov.keep = c('ln_holders'))

# type is ignored for some reason, so we exponentiate after
em_over = emmeans(rg, ~age, type = 'response')  

data.frame(em_over) %>% 
  mutate(count = exp(emmean))

    age   emmean         SE  df asymp.LCL asymp.UCL     count
1   <25 2.055962 0.09071069 Inf  1.878173  2.233752  7.814354
2 25-29 2.835734 0.07457136 Inf  2.689577  2.981892 17.042911
3 30-35 3.010866 0.07161826 Inf  2.870497  3.151236 20.304984
4   >35 4.574641 0.05450461 Inf  4.467813  4.681468 96.993166

margins

The over approach for the margins package is not explicitly supported. The package author states:

At present, margins() does not implement the over option. The reason for this is also simple: R already makes data subsetting operations quite simple using simple [ extraction. If, for example, one wanted to calculate marginal effects on subsets of a data frame, those subsets can be passed directly to margins() via the data argument (as in a call to prediction()).

It would look something like the following, but we still have the offset problem for this negative binomial class, so I don’t show a result.


insurance %>% 
  group_by(age) %>% 
  group_map(~margins(nb_glm_offset, .), keep = TRUE) 

Stata: dydx

Categorical Covariate

Sometimes people want differences as you move from one level (e.g. the reference level) to the next for some covariate, the ‘average marginal effect’. In Stata this is obtained with the dydx option.


margins, dydx(age)

Conditional marginal effects                    Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()
dy/dx w.r.t. : 2.age 3.age 4.age

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      25-29  |  -8.854149   8.375143    -1.06   0.290    -25.26913     7.56083
      30-35  |  -16.62179   7.959339    -2.09   0.037     -32.2218   -1.021769
        >35  |   -25.4371   7.297716    -3.49   0.000    -39.74036   -11.13383
------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

In R, we can get this from our initial predictions that used all offset values by just taking the differences in the predicted values.


predictions %>% 
  mutate(dydx = avg_prediction - avg_prediction[1])
age avg_prediction dydx
<25 74.257 0.000
25-29 65.403 -8.854
30-35 57.635 -16.622
>35 48.820 -25.437

Continuous predictor

Now we’ll consider a continuous covariate. Here we’ll again just focus on a simple example where we rerun the model, but with age as numeric rather than binned4. For comparison we’ll set the numeric age values at roughly the midpoint of the binned categories. We can do this using the at option.


margins, at(age_num = (21, 27, 32, 50))

Negative binomial regression                    Number of obs     =         64
                                                LR chi2(1)        =      12.58
Dispersion     = mean                           Prob > chi2       =     0.0004
Log likelihood = -218.76542                     Pseudo R2         =     0.0280

------------------------------------------------------------------------------
      claims |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_num |  -.0079258   .0020657    -3.84   0.000    -.0119746   -.0038771
       _cons |  -1.521278   .0919801   -16.54   0.000    -1.701556   -1.341001
  ln_holders |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -3.191873   .3194707                     -3.818025   -2.565722
-------------+----------------------------------------------------------------
       alpha |   .0410948   .0131286                      .0219712    .0768636
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 69.85                Prob >= chibar2 = 0.000


Adjusted predictions                            Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()

1._at        : age_num         =          21

2._at        : age_num         =          27

3._at        : age_num         =          32

4._at        : age_num         =          50

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         _at |
          1  |   67.50043     3.7343    18.08   0.000     60.18133    74.81952
          2  |   64.36558   3.025254    21.28   0.000     58.43619    70.29497
          3  |   61.86471     2.5763    24.01   0.000     56.81526    66.91417
          4  |   53.63947   2.270244    23.63   0.000     49.18988    58.08907
------------------------------------------------------------------------------

Again, we can duplicate this with the basic predict function. We just predict at that value of the covariate for all values of the offset, and get the average prediction as we did before.


nb_glm_offset_cont = MASS::glm.nb(claims ~  age_num + offset(ln_holders), data = insurance)

predictions = expand.grid(
  age_num = c(21, 27, 32, 50),
  ln_holders = insurance$ln_holders
) %>% 
  mutate(pred = 
           predict(
             nb_glm_offset_cont,
             newdata = .,
             type = 'response'
           )
  ) %>% 
  group_by(age_num) %>% 
  summarise(average_prediction = mean(pred))
age_num average_prediction
21 67.500
27 64.366
32 61.865
50 53.639

We can also get the dydx for the continuous covariate, which is the derivative of the target with respect to the covariate. In linear models, this is just the regression coefficient, but here we have to do things a little differently.


margins, dydx(age_num)

Average marginal effects                        Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()
dy/dx w.r.t. : age_num

------------------------------------------------------------------------------
             |            Delta-method
             |      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     age_num |  -.4255242   .1050737    -4.05   0.000    -.6314649   -.2195836
------------------------------------------------------------------------------

As noted for the categorical case, this value is the average marginal effect. As the Stata reference describes:

It is not necessarily true that dydx() = 0.5 means that “y increases by 0.5 if x increases by 1”. It is true that “y increases with x at a rate such that, if the rate were constant, y would increase by 0.5 if x increased by 1”

This qualified interpretation may not be of much value in contexts where the rate is not constant, but we can still see what Stata is doing.

R by hand

For dydx, when it comes to continuous covariates, there isn’t an obvious change in the covariate to use (i.e. the dx) to evaluate at each point, as is the case with categorical variables, which can use a reference group. So what we do is use a small arbitrary difference (\(\epsilon\)) for the covariate at its observed values, get the predictions for the values above and below the observed value, and then average those differences in predicted values. For comparison to Stata, I set \(\epsilon\) to the value used by the margins command. Note that we are only using the observed values for the offset.


h = function(x, epsilon = 1e-5) max(abs(x), 1) * sqrt(epsilon)

age_dx_plus = insurance %>% 
  select(age_num, ln_holders) %>% 
  mutate(age_num = age_num + h(age_num))

age_dx_minus = insurance %>% 
  select(age_num, ln_holders) %>% 
  mutate(age_num = age_num - h(age_num))

predictions_dydx = 
  tibble(
    dy = 
      predict(nb_glm_offset_cont, age_dx_plus,  type = 'response') -
      predict(nb_glm_offset_cont, age_dx_minus, type = 'response'), 
    dx   = age_dx_plus$age_num - age_dx_minus$age_num,
    dydx = dy/dx
  )

# summarise(predictions_dydx, ame = mean(dydx))
ame
-0.42552

So we just get predictions for a small difference in age for each value of age, and average that difference in predictions.

emmeans

The emmeans package is primarily geared toward factor variables, but does have support for numeric variables interacting with factors. However, this isn’t what we’re really looking for here.

margins

We can however use the margins package for this, and it provides the same result as before. For whatever reason, it doesn’t have an issue with the offset if we use the lower level dydx function.


dydx(
  insurance,
  nb_glm_offset_cont,
  variable = 'age_num',
  eps = 1e-5
) %>%
  summarise(ame = mean(dydx_age_num))
ame
-0.42552

For more on the dydx case for continuous variables in general, see the resources.

Other complications

Obviously models will have more than one covariate, and in the particular case that was brought to my attention, there were also random effects. I may explore more in the future, but the general result should hold in those circumstances. As a quick example5, we can get the same age results for both, by getting the age group predictions with all values of the dataset (not just the offset).


nb_glm_offset_full = MASS::glm.nb(
  claims ~  age + group + district + offset(ln_holders), 
  data = insurance
)

summary(nb_glm_offset_full)

Call:
MASS::glm.nb(formula = claims ~ age + group + district + offset(ln_holders), 
    data = insurance, init.theta = 449932.3841, link = log)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.46551  -0.50795  -0.03196   0.55554   1.94022  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.82174    0.07679 -23.723  < 2e-16 ***
age25-29    -0.19101    0.08286  -2.305 0.021155 *  
age30-35    -0.34495    0.08138  -4.239 2.25e-05 ***
age>35      -0.53667    0.06996  -7.671 1.70e-14 ***
group1-1.5l  0.16133    0.05054   3.192 0.001412 ** 
group1.5-2l  0.39281    0.05500   7.142 9.23e-13 ***
group>2l     0.56341    0.07232   7.791 6.67e-15 ***
district2    0.02587    0.04302   0.601 0.547637    
district3    0.03853    0.05052   0.763 0.445686    
district4    0.23421    0.06168   3.797 0.000146 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(449932.4) family taken to be 1)

    Null deviance: 236.212  on 63  degrees of freedom
Residual deviance:  51.416  on 54  degrees of freedom
AIC: 390.74

Number of Fisher Scoring iterations: 1

              Theta:  449932 
          Std. Err.:  4185430 
Warning while fitting theta: iteration limit reached 

 2 x log-likelihood:  -368.745 

nbreg claims i.age i.group i.district, offset(ln_holders) nolog
margins age

Negative binomial regression                    Number of obs     =         64
                                                LR chi2(9)        =      81.37
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood = -184.37077                     Pseudo R2         =     0.1808

------------------------------------------------------------------------------
      claims |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
      25-29  |  -.1910093   .0828564    -2.31   0.021    -.3534049   -.0286137
      30-35  |  -.3449496   .0813741    -4.24   0.000    -.5044399   -.1854592
        >35  |  -.5366602   .0699556    -7.67   0.000    -.6737706   -.3995497
             |
       group |
     1-1.5l  |   .1613446   .0505323     3.19   0.001     .0623031     .260386
        >2l  |   .5634109   .0723153     7.79   0.000     .4216755    .7051462
             |
    district |
          2  |   .0258648   .0430156     0.60   0.548    -.0584443    .1101739
          3  |   .0385174   .0505114     0.76   0.446    -.0604832     .137518
          4  |   .2341969   .0616732     3.80   0.000     .1133197    .3550742
             |
       _cons |  -1.821741   .0767876   -23.72   0.000    -1.972242    -1.67124
  ln_holders |          1  (offset)
-------------+----------------------------------------------------------------
    /lnalpha |  -19.19245   546.8434                     -1090.986    1052.601
-------------+----------------------------------------------------------------
       alpha |   4.62e-09   2.53e-06                             0           .
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 7.1e-06              Prob >= chibar2 = 0.499


Predictive margins                              Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |
        <25  |   76.39696   5.069345    15.07   0.000     66.46123    86.33269
      25-29  |   63.11343   3.143376    20.08   0.000     56.95253    69.27433
      30-35  |   54.10861   2.544524    21.26   0.000     49.12143    59.09579
        >35  |   44.66913   .9835578    45.42   0.000      42.7414    46.59687
------------------------------------------------------------------------------

To do this with predict, we make predictions for all observed values as if they were at each level of age. Then we average them for each age group, just like we did before.


predictions_full_model =  
  map_df(1:4, function(i) mutate(insurance, age = levels(age)[i])) %>% 
  mutate(
    age = factor(age, levels(insurance$age)),   # convert back to factor
    prediction = predict(nb_glm_offset_full, newdata = ., type = 'response')
  )  %>% 
  group_by(age) %>% 
  summarise(avg_prediction = mean(prediction)) 
age avg_prediction
<25 76.397
25-29 63.113
30-35 54.109
>35 44.669

R Packages

To summarize R’s capabilities with Stata-like margins with models using an offset, we have a few options we can note. First, we can get the values using the predict method. Then there are the packages to help with getting these types of predictions. margins explicitly attempts to replicate Stata-like margins for standard and some more complex models, but there doesn’t appear to be documentation on how the offset is dealt with by default. Furthermore, care must be taken if it isn’t an explicitly supported model. As we have also seen, emmeans provides many predictions of the sort discussed here, supports many more models, produces results in a nice format, and has plotting capabilities. However, it’s mostly suited toward factor variables.

Beyond those, ggeffects uses predict and emmeans under the hood, so offers a nice way to do the same sorts of things, but with a more viable plot as a result. Other packages and functions are available for specific settings. For example, conditional_effects in the brms package provides predictions and visualization for the bayesian setting.

Summary

Hopefully this will clarify the discrepancies between R and Stata with models using an offset. Honestly, I pretty much always use the predict function with my specified data values because I know what it’s doing and I can understand the results without hesitation regardless of model or package used. Furthermore, if one knows their data at all, it should be possible to specify covariate values that are meaningful pretty easily. On the other hand, getting predictions at averages can cause conceptual issues with categorical variables in many settings, and getting average effects often also can be hard to interpret (e.g. nonlinear relationships).

One thing you don’t get with some of the averaged predictions using the predict function are interval estimates, but this could be obtained via bootstrapping. Otherwise, most predict methods provide the standard error for a prediction with an additional argument (e.g. se.fit = TRUE), so if you getting predictions at key values of the variables it is trivial to get the interval estimate. In general, most R packages are just using predict under the hood, so being familiar with it will likely get you what you need on its own.

Resources

Reference

Stata reference for margins

emmeans

margins

ggeffects

Notes

Marginal Effects- Rich Williams notes- 1

Marginal Effects- Rich Williams notes- 2

Marginal Effects- Rich Williams notes- 3

Marginal Effects Stata Article by Rich Williams

Josh Errickson’s comparisons of Stata, emmeans, and ggeffects

UCLA IDRE FAQ (Margins command section)

Stata FAQ (based on old mfx command)

Appendix

Just for giggles, I did an average marginal effect for a GAM, though I find little utility in it for the relationship shown. Confirmed via gratia and margins.


library(mgcv)
set.seed(2)
dat <- gamSim(1, n = 400, dist = "normal", scale = 2)

Gu & Wahba 4 term additive model

b <- gam(y ~ s(x2), data = dat)

visibly::plot_gam(b)


# set change step
h = 1e-5

b_dx_plus = dat %>% 
  select(x2) %>% 
  mutate(x2 = x2 + h)

b_dx_minus = dat %>% 
  select(x2) %>% 
  mutate(x2 = x2 - h)

predictions_dydx = 
  tibble(
    x2 = dat$x2,
    dy = 
      predict(b, b_dx_plus,  type = 'response') -
      predict(b, b_dx_minus, type = 'response'), 
    dx   = b_dx_plus$x2 - b_dx_minus$x2,
    dydx = dy/dx
  ) 

gratia_result  = gratia::derivatives(b, newdata = dat, eps = h)
margins_result = margins::dydx(dat, b, variable = 'x2', eps = h^2)  # note that margins uses the h function specified previously

all.equal(as.numeric(predictions_dydx$dydx), gratia_result$derivative)

[1] "Mean relative difference: 5.596277e-05"

all.equal(as.numeric(predictions_dydx$dydx), as.numeric(margins_result$dydx_x2))

[1] TRUE

summarise(
  predictions_dydx, 
  ame = mean(dydx), 
  ame_gratia = mean(gratia_result$derivative),
  ame_margins = mean(margins_result$dydx_x2)
) 

# A tibble: 1 x 3
    ame ame_gratia ame_margins
  <dbl>      <dbl>       <dbl>
1  1.76       1.76        1.76

  1. Not the least of which is that most outside of econometrics don’t call predictions margins, since these days we aren’t adding results to the margin of a hand-written table.↩︎

  2. For those in the R world, the i.age tells Stata to treat the age factor as, well, a factor. Stata’s alpha is 1/theta from R’s output.↩︎

  3. The margins package does do predictions rather than the marginal effects, but it, like others, is just a wrapper for the predict method, and doesn’t appear to average them, so I don’t demonstrate that.↩︎

  4. There is rarely a justifiable reason to discretize age as near as I can tell, and doing so inevitably results in less satisfying and less reliable conclusions.↩︎

  5. There is a weird print issue where the Stata output isn’t showing the coefficient for one of the levels of group, but the model is correct and was verified directly using Stata.↩︎

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com//m-clark/m-clark.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Clark (2020, June 16). Michael Clark: Predictions with an offset. Retrieved from https://m-clark.github.io/posts/2020-06-15-predict-with-offset/

BibTeX citation

@misc{clark2020predictions,
  author = {Clark, Michael},
  title = {Michael Clark: Predictions with an offset},
  url = {https://m-clark.github.io/posts/2020-06-15-predict-with-offset/},
  year = {2020}
}