Predictions with an offset

Reconciling R and Stata Approaches

Michael Clark https://m-clark.github.io
2020-06-16

Introduction

Getting predictions in R is and always has been pretty easy for the vast majority of packages providing modeling functions, as they also provide a predict method for the model objects. For those in the Stata world, they typically use margins for this, but when they come to R, there is no obvious option for how to go about it in the same way1. Likewise, some in the R world catch a whiff of Stata’s margins and would want something similar, but may not be sure where to turn.

A little digging will reveal there are several packages that will provide the same sort of thing. In addition, there are numerous resources for both R and Stata for getting marginal results (i.e. predictions). However, here we note the issues that arise when models include an offset. Offsets are commonly used to model rates when the target variable is a count, but are used in other contexts as well. The Stata documentation for the margins command offers no specific details of how the offset/exposure is treated, and some R packages appear not to know what to do with it, or offer few options to deal with it. So even when the models are identical, marginal estimates might be different in R and Stata. Here we’ll try to sort some of this out.

Get some data

We will use the Insurance data from the MASS package which most with R will have access to. From the helpfile:

The data given in data frame Insurance consist of the numbers of policyholders of an insurance company who were exposed to risk, and the numbers of car insurance claims made by those policyholders.

• District: district of residence of policyholder (1 to 4): 4 is major cities.
• Group: group of car with levels <1 litre, 1–1.5 litre, 1.5–2 litre, >2 litre
• Age: the age of the insured in 4 groups labelled <25, 25–29, 30–35, >35
• Holders: number of policyholders
• Claims: number of claims

We do a bit of minor processing, and I save the data as a Stata file in case anyone wants to play with it in that realm.


library(tidyverse)

set.seed(123)

insurance = MASS::Insurance %>%
rename_all(tolower) %>%
mutate(
# create standard rather than ordered factors for typical output
age   = factor(age, ordered = FALSE),
group = factor(group, ordered = FALSE),

# create a numeric age covariate for later
age_num = case_when(
age == '<25'   ~ sample(18:25, n(), replace = T),
age == '25-29' ~ sample(25:29, n(), replace = T),
age == '30-35' ~ sample(30:35, n(), replace = T),
age == '>35'   ~ sample(36:75, n(), replace = T),
),

# for stata consistency
ln_holders = log(holders)
)

haven::write_dta(insurance, 'data/insurance.dta')

Let’s take a quick peek to get our bearings.


$Numeric Variables Variable N Mean SD Min Q1 Median Q3 Max Missing 1 holders 64 364.98 622.77 3.0 46.75 136.00 327.50 3582.00 0 2 claims 64 49.23 71.16 0.0 9.50 22.00 55.50 400.00 0 3 age_num 64 35.23 15.83 18.0 25.00 29.50 35.50 75.00 0 4 ln_holders 64 4.90 1.48 1.1 3.84 4.91 5.79 8.18 0$Categorical Variables
# A tibble: 12 x 4
Variable Group  Frequency   %
<chr>    <chr>      <int> <dbl>
1 district 1             16    25
2 district 2             16    25
3 district 3             16    25
4 district 4             16    25
5 group    <1l           16    25
6 group    >2l           16    25
7 group    1-1.5l        16    25
8 group    1.5-2l        16    25
9 age      <25           16    25
10 age      >35           16    25
11 age      25-29         16    25
12 age      30-35         16    25

Model

Starting out, we run a model in as simple a form as possible. I use just a standard negative binomial with a single covariate age, so we can clearly see how the ouptut is being produced. Note that age has four categories as seen above.


nb_glm_offset = MASS::glm.nb(claims ~  age + offset(ln_holders), data = insurance)

summary(nb_glm_offset)

Call:
MASS::glm.nb(formula = claims ~ age + offset(ln_holders), data = insurance,
init.theta = 28.40119393, link = log)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-2.29753  -0.79833  -0.04613   0.82465   3.00825

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.59233    0.09071 -17.554  < 2e-16 ***
age25-29    -0.12697    0.11743  -1.081   0.2796
age30-35    -0.25340    0.11558  -2.193   0.0283 *
age>35      -0.41940    0.10583  -3.963  7.4e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(28.4012) family taken to be 1)

Null deviance: 86.761  on 63  degrees of freedom
Residual deviance: 67.602  on 60  degrees of freedom
AIC: 444.38

Number of Fisher Scoring iterations: 1

Theta:  28.40
Std. Err.:  9.80

2 x log-likelihood:  -434.385 

Now we run it with Stata. We get the same result, so this means we can’t get different predictions if we do the same thing in both R or Stata2.


nbreg claims i.age, offset(ln_holders) nolog

Negative binomial regression                    Number of obs     =         64
LR chi2(3)        =      15.73
Dispersion     = mean                           Prob > chi2       =     0.0013
Log likelihood =  -217.1925                     Pseudo R2         =     0.0349

------------------------------------------------------------------------------
claims |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
25-29  |  -.1269666   .1182871    -1.07   0.283    -.3588049    .1048718
30-35  |  -.2533991    .116535    -2.17   0.030    -.4818034   -.0249948
>35  |  -.4193953   .1068907    -3.92   0.000    -.6288973   -.2098933
|
_cons |  -1.592325   .0913742   -17.43   0.000    -1.771415   -1.413235
ln_holders |          1  (offset)
-------------+----------------------------------------------------------------
/lnalpha |  -3.346431   .3544212                     -4.041084   -2.651778
-------------+----------------------------------------------------------------
alpha |   .0352098   .0124791                      .0175784    .0705257
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 38.33                Prob >= chibar2 = 0.000

emmeans

First let’s use emmeans, a very popular package for getting estimated marginal means, to get the predicted counts for each age group.


library(emmeans)

emmeans(
nb_glm_offset,
~ age,
type = "response",
offset = mean(insurance$ln_holders) # default ) age response SE df asymp.LCL asymp.UCL <25 27.437 2.489 Inf 22.968 32.776 25-29 24.166 1.802 Inf 20.880 27.969 30-35 21.295 1.525 Inf 18.507 24.505 >35 18.038 0.983 Inf 16.211 20.072 How is this result obtained? It is just the prediction at each value of the covariate, with the offset held at its mean. We can duplicate this result by using the predict method and specifying a data frame with the values of interest.  nd = data.frame( age = levels(insurance$age),
ln_holders = insurance$ln_holders ) %>% mutate(prediction = predict(nb_glm_offset, newdata = ., type = 'response')) %>% group_by(age) %>% summarise(avg_prediction = mean(prediction))  age avg_prediction <25 74.257 25-29 65.403 30-35 57.635 >35 48.820 emmeans The emmeans doesn’t appear to allow one to provide all values of the offset, as adding additional values just applies them to each group and then recycles. In this case, it would just use the first four values of ln_holders for each age group respectively, which is not what we want.  emmeans( nb_glm_offset, ~ age, type = "response", offset = insurance$ln_holders
)

age   response    SE  df asymp.LCL asymp.UCL
<25       40.1  3.64 Inf      33.6      47.9
25-29     47.3  3.53 Inf      40.9      54.8
30-35     38.8  2.78 Inf      33.8      44.7
>35      224.7 12.25 Inf     201.9     250.0

Confidence level used: 0.95
Intervals are back-transformed from the log scale 

ln_holders = insurance$ln_holders ) %>% mutate(prediction = predict(pois_glm_offset, newdata = ., type = 'response')) %>% group_by(age) %>% summarise(avg_prediction = mean(prediction)) %>% mutate(diff = avg_prediction - avg_prediction[1])   # A tibble: 4 x 3 age avg_prediction diff <fct> <dbl> <dbl> 1 <25 73.4 0 2 25-29 63.1 -10.3 3 30-35 55.0 -18.5 4 >35 44.7 -28.8 Stata: over In Stata, with categorical values we can also use the over approach. What do we get in this case?  margins, over(age)  Predictive margins Number of obs = 64 Model VCE : OIM Expression : Predicted number of events, predict() over : age ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | <25 | 14.47052 1.322232 10.94 0.000 11.879 17.06205 25-29 | 26.16218 1.974682 13.25 0.000 22.29188 30.03249 30-35 | 29.67738 2.160145 13.74 0.000 25.44358 33.91119 >35 | 141.0984 7.941774 17.77 0.000 125.5328 156.6639 ------------------------------------------------------------------------------ These are very different from our previous results for Stata, so what’s happening here? R by hand This can be duplicated with the predict function as follows. While similar to the previous approach, here only the observed values of the offset for each group are used. We then make predictions for all values of the data and average them by group.  predictions_over = insurance %>% group_by(age) %>% group_modify( ~ data.frame( prediction = mean( predict(nb_glm_offset, newdata = ., type = 'response') ) ), keep = TRUE )  age prediction <25 14.471 25-29 26.162 30-35 29.677 >35 141.098 The pattern is actually in the opposite direction, which is unexpected, but probably just reflects the fact that we just don’t have much data. However, it’s good to note that these respective approaches would not necessarily tell you the same thing. emmeans I currently don’t know of an equivalence for emmeans in this offset case, and initial searches didn’t turn up much, though it is hard to distinguish specific ‘average predictions’ from many other similar scenarios. I attempted the following, which keeps the values of ln_holders, but it only keeps unique ones, and it’s not reproducing what I would expect, although it implies that it is averaging over the offset values.  rg = ref_grid(nb_glm_offset, cov.keep = c('ln_holders')) # type is ignored for some reason, so we exponentiate after em_over = emmeans(rg, ~age, type = 'response') data.frame(em_over) %>% mutate(count = exp(emmean))  age emmean SE df asymp.LCL asymp.UCL count 1 <25 2.055962 0.09071069 Inf 1.878173 2.233752 7.814354 2 25-29 2.835734 0.07457136 Inf 2.689577 2.981892 17.042911 3 30-35 3.010866 0.07161826 Inf 2.870497 3.151236 20.304984 4 >35 4.574641 0.05450461 Inf 4.467813 4.681468 96.993166 margins The over approach for the margins package is not explicitly supported. The package author states: At present, margins() does not implement the over option. The reason for this is also simple: R already makes data subsetting operations quite simple using simple [ extraction. If, for example, one wanted to calculate marginal effects on subsets of a data frame, those subsets can be passed directly to margins() via the data argument (as in a call to prediction()). It would look something like the following, but we still have the offset problem for this negative binomial class, so I don’t show a result.  insurance %>% group_by(age) %>% group_map(~margins(nb_glm_offset, .), keep = TRUE)  Stata: dydx Categorical Covariate Sometimes people want differences as you move from one level (e.g. the reference level) to the next for some covariate, the ‘average marginal effect’. In Stata this is obtained with the dydx option.  margins, dydx(age)  Conditional marginal effects Number of obs = 64 Model VCE : OIM Expression : Predicted number of events, predict() dy/dx w.r.t. : 2.age 3.age 4.age ------------------------------------------------------------------------------ | Delta-method | dy/dx Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- age | 25-29 | -8.854149 8.375143 -1.06 0.290 -25.26913 7.56083 30-35 | -16.62179 7.959339 -2.09 0.037 -32.2218 -1.021769 >35 | -25.4371 7.297716 -3.49 0.000 -39.74036 -11.13383 ------------------------------------------------------------------------------ Note: dy/dx for factor levels is the discrete change from the base level. In R, we can get this from our initial predictions that used all offset values by just taking the differences in the predicted values.  predictions %>% mutate(dydx = avg_prediction - avg_prediction[1]) age avg_prediction dydx <25 74.257 0.000 25-29 65.403 -8.854 30-35 57.635 -16.622 >35 48.820 -25.437 Continuous predictor Now we’ll consider a continuous covariate. Here we’ll again just focus on a simple example where we rerun the model, but with age as numeric rather than binned4. For comparison we’ll set the numeric age values at roughly the midpoint of the binned categories. We can do this using the at option.  margins, at(age_num = (21, 27, 32, 50))  Negative binomial regression Number of obs = 64 LR chi2(1) = 12.58 Dispersion = mean Prob > chi2 = 0.0004 Log likelihood = -218.76542 Pseudo R2 = 0.0280 ------------------------------------------------------------------------------ claims | Coef. Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- age_num | -.0079258 .0020657 -3.84 0.000 -.0119746 -.0038771 _cons | -1.521278 .0919801 -16.54 0.000 -1.701556 -1.341001 ln_holders | 1 (offset) -------------+---------------------------------------------------------------- /lnalpha | -3.191873 .3194707 -3.818025 -2.565722 -------------+---------------------------------------------------------------- alpha | .0410948 .0131286 .0219712 .0768636 ------------------------------------------------------------------------------ LR test of alpha=0: chibar2(01) = 69.85 Prob >= chibar2 = 0.000 Adjusted predictions Number of obs = 64 Model VCE : OIM Expression : Predicted number of events, predict() 1._at : age_num = 21 2._at : age_num = 27 3._at : age_num = 32 4._at : age_num = 50 ------------------------------------------------------------------------------ | Delta-method | Margin Std. Err. z P>|z| [95% Conf. Interval] -------------+---------------------------------------------------------------- _at | 1 | 67.50043 3.7343 18.08 0.000 60.18133 74.81952 2 | 64.36558 3.025254 21.28 0.000 58.43619 70.29497 3 | 61.86471 2.5763 24.01 0.000 56.81526 66.91417 4 | 53.63947 2.270244 23.63 0.000 49.18988 58.08907 ------------------------------------------------------------------------------ Again, we can duplicate this with the basic predict function. We just predict at that value of the covariate for all values of the offset, and get the average prediction as we did before.  nb_glm_offset_cont = MASS::glm.nb(claims ~ age_num + offset(ln_holders), data = insurance) predictions = expand.grid( age_num = c(21, 27, 32, 50), ln_holders = insurance$ln_holders
) %>%
mutate(pred =
predict(
nb_glm_offset_cont,
newdata = .,
type = 'response'
)
) %>%
group_by(age_num) %>%
summarise(average_prediction = mean(pred))
age_num average_prediction
21 67.500
27 64.366
32 61.865
50 53.639

We can also get the dydx for the continuous covariate, which is the derivative of the target with respect to the covariate. In linear models, this is just the regression coefficient, but here we have to do things a little differently.


margins, dydx(age_num)

Average marginal effects                        Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()
dy/dx w.r.t. : age_num

------------------------------------------------------------------------------
|            Delta-method
|      dy/dx   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age_num |  -.4255242   .1050737    -4.05   0.000    -.6314649   -.2195836
------------------------------------------------------------------------------

As noted for the categorical case, this value is the average marginal effect. As the Stata reference describes:

It is not necessarily true that dydx() = 0.5 means that “y increases by 0.5 if x increases by 1”. It is true that “y increases with x at a rate such that, if the rate were constant, y would increase by 0.5 if x increased by 1”

This qualified interpretation may not be of much value in contexts where the rate is not constant, but we can still see what Stata is doing.

R by hand

For dydx, when it comes to continuous covariates, there isn’t an obvious change in the covariate to use (i.e. the dx) to evaluate at each point, as is the case with categorical variables, which can use a reference group. So what we do is use a small arbitrary difference ($$\epsilon$$) for the covariate at its observed values, get the predictions for the values above and below the observed value, and then average those differences in predicted values. For comparison to Stata, I set $$\epsilon$$ to the value used by the margins command. Note that we are only using the observed values for the offset.


h = function(x, epsilon = 1e-5) max(abs(x), 1) * sqrt(epsilon)

age_dx_plus = insurance %>%
select(age_num, ln_holders) %>%
mutate(age_num = age_num + h(age_num))

age_dx_minus = insurance %>%
select(age_num, ln_holders) %>%
mutate(age_num = age_num - h(age_num))

predictions_dydx =
tibble(
dy =
predict(nb_glm_offset_cont, age_dx_plus,  type = 'response') -
predict(nb_glm_offset_cont, age_dx_minus, type = 'response'),
dx   = age_dx_plus$age_num - age_dx_minus$age_num,
dydx = dy/dx
)

# summarise(predictions_dydx, ame = mean(dydx))
ame
-0.42552

So we just get predictions for a small difference in age for each value of age, and average that difference in predictions.

emmeans

The emmeans package is primarily geared toward factor variables, but does have support for numeric variables interacting with factors. However, this isn’t what we’re really looking for here.

margins

We can however use the margins package for this, and it provides the same result as before. For whatever reason, it doesn’t have an issue with the offset if we use the lower level dydx function.


dydx(
insurance,
nb_glm_offset_cont,
variable = 'age_num',
eps = 1e-5
) %>%
summarise(ame = mean(dydx_age_num))
ame
-0.42552

For more on the dydx case for continuous variables in general, see the resources.

Other complications

Obviously models will have more than one covariate, and in the particular case that was brought to my attention, there were also random effects. I may explore more in the future, but the general result should hold in those circumstances. As a quick example5, we can get the same age results for both, by getting the age group predictions with all values of the dataset (not just the offset).


nb_glm_offset_full = MASS::glm.nb(
claims ~  age + group + district + offset(ln_holders),
data = insurance
)

summary(nb_glm_offset_full)

Call:
MASS::glm.nb(formula = claims ~ age + group + district + offset(ln_holders),
data = insurance, init.theta = 449932.3841, link = log)

Deviance Residuals:
Min        1Q    Median        3Q       Max
-2.46551  -0.50795  -0.03196   0.55554   1.94022

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.82174    0.07679 -23.723  < 2e-16 ***
age25-29    -0.19101    0.08286  -2.305 0.021155 *
age30-35    -0.34495    0.08138  -4.239 2.25e-05 ***
age>35      -0.53667    0.06996  -7.671 1.70e-14 ***
group1-1.5l  0.16133    0.05054   3.192 0.001412 **
group1.5-2l  0.39281    0.05500   7.142 9.23e-13 ***
group>2l     0.56341    0.07232   7.791 6.67e-15 ***
district2    0.02587    0.04302   0.601 0.547637
district3    0.03853    0.05052   0.763 0.445686
district4    0.23421    0.06168   3.797 0.000146 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for Negative Binomial(449932.4) family taken to be 1)

Null deviance: 236.212  on 63  degrees of freedom
Residual deviance:  51.416  on 54  degrees of freedom
AIC: 390.74

Number of Fisher Scoring iterations: 1

Theta:  449932
Std. Err.:  4185430
Warning while fitting theta: iteration limit reached

2 x log-likelihood:  -368.745 

nbreg claims i.age i.group i.district, offset(ln_holders) nolog
margins age

Negative binomial regression                    Number of obs     =         64
LR chi2(9)        =      81.37
Dispersion     = mean                           Prob > chi2       =     0.0000
Log likelihood = -184.37077                     Pseudo R2         =     0.1808

------------------------------------------------------------------------------
claims |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
25-29  |  -.1910093   .0828564    -2.31   0.021    -.3534049   -.0286137
30-35  |  -.3449496   .0813741    -4.24   0.000    -.5044399   -.1854592
>35  |  -.5366602   .0699556    -7.67   0.000    -.6737706   -.3995497
|
group |
1-1.5l  |   .1613446   .0505323     3.19   0.001     .0623031     .260386
>2l  |   .5634109   .0723153     7.79   0.000     .4216755    .7051462
|
district |
2  |   .0258648   .0430156     0.60   0.548    -.0584443    .1101739
3  |   .0385174   .0505114     0.76   0.446    -.0604832     .137518
4  |   .2341969   .0616732     3.80   0.000     .1133197    .3550742
|
_cons |  -1.821741   .0767876   -23.72   0.000    -1.972242    -1.67124
ln_holders |          1  (offset)
-------------+----------------------------------------------------------------
/lnalpha |  -19.19245   546.8434                     -1090.986    1052.601
-------------+----------------------------------------------------------------
alpha |   4.62e-09   2.53e-06                             0           .
------------------------------------------------------------------------------
LR test of alpha=0: chibar2(01) = 7.1e-06              Prob >= chibar2 = 0.499

Predictive margins                              Number of obs     =         64
Model VCE    : OIM

Expression   : Predicted number of events, predict()

------------------------------------------------------------------------------
|            Delta-method
|     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
age |
<25  |   76.39696   5.069345    15.07   0.000     66.46123    86.33269
25-29  |   63.11343   3.143376    20.08   0.000     56.95253    69.27433
30-35  |   54.10861   2.544524    21.26   0.000     49.12143    59.09579
>35  |   44.66913   .9835578    45.42   0.000      42.7414    46.59687
------------------------------------------------------------------------------

To do this with predict, we make predictions for all observed values as if they were at each level of age. Then we average them for each age group, just like we did before.


predictions_full_model =
map_df(1:4, function(i) mutate(insurance, age = levels(age)[i])) %>%
mutate(
age = factor(age, levels(insurance$age)), # convert back to factor prediction = predict(nb_glm_offset_full, newdata = ., type = 'response') ) %>% group_by(age) %>% summarise(avg_prediction = mean(prediction))  age avg_prediction <25 76.397 25-29 63.113 30-35 54.109 >35 44.669 R Packages To summarize R’s capabilities with Stata-like margins with models using an offset, we have a few options we can note. First, we can get the values using the predict method. Then there are the packages to help with getting these types of predictions. margins explicitly attempts to replicate Stata-like margins for standard and some more complex models, but there doesn’t appear to be documentation on how the offset is dealt with by default. Furthermore, care must be taken if it isn’t an explicitly supported model. As we have also seen, emmeans provides many predictions of the sort discussed here, supports many more models, produces results in a nice format, and has plotting capabilities. However, it’s mostly suited toward factor variables. Beyond those, ggeffects uses predict and emmeans under the hood, so offers a nice way to do the same sorts of things, but with a more viable plot as a result. Other packages and functions are available for specific settings. For example, conditional_effects in the brms package provides predictions and visualization for the bayesian setting. Summary Hopefully this will clarify the discrepancies between R and Stata with models using an offset. Honestly, I pretty much always use the predict function with my specified data values because I know what it’s doing and I can understand the results without hesitation regardless of model or package used. Furthermore, if one knows their data at all, it should be possible to specify covariate values that are meaningful pretty easily. On the other hand, getting predictions at averages can cause conceptual issues with categorical variables in many settings, and getting average effects often also can be hard to interpret (e.g. nonlinear relationships). One thing you don’t get with some of the averaged predictions using the predict function are interval estimates, but this could be obtained via bootstrapping. Otherwise, most predict methods provide the standard error for a prediction with an additional argument (e.g. se.fit = TRUE), so if you getting predictions at key values of the variables it is trivial to get the interval estimate. In general, most R packages are just using predict under the hood, so being familiar with it will likely get you what you need on its own. Resources Reference Stata reference for margins emmeans margins ggeffects Notes Marginal Effects- Rich Williams notes- 1 Marginal Effects- Rich Williams notes- 2 Marginal Effects- Rich Williams notes- 3 Marginal Effects Stata Article by Rich Williams Josh Errickson’s comparisons of Stata, emmeans, and ggeffects UCLA IDRE FAQ (Margins command section) Stata FAQ (based on old mfx command) Appendix Just for giggles, I did an average marginal effect for a GAM, though I find little utility in it for the relationship shown. Confirmed via gratia and margins.  library(mgcv) set.seed(2) dat <- gamSim(1, n = 400, dist = "normal", scale = 2)  Gu & Wahba 4 term additive model  b <- gam(y ~ s(x2), data = dat) visibly::plot_gam(b)  # set change step h = 1e-5 b_dx_plus = dat %>% select(x2) %>% mutate(x2 = x2 + h) b_dx_minus = dat %>% select(x2) %>% mutate(x2 = x2 - h) predictions_dydx = tibble( x2 = dat$x2,
dy =
predict(b, b_dx_plus,  type = 'response') -
predict(b, b_dx_minus, type = 'response'),
dx   = b_dx_plus$x2 - b_dx_minus$x2,
dydx = dy/dx
)

gratia_result  = gratia::derivatives(b, newdata = dat, eps = h)
margins_result = margins::dydx(dat, b, variable = 'x2', eps = h^2)  # note that margins uses the h function specified previously

all.equal(as.numeric(predictions_dydx$dydx), gratia_result$derivative)

[1] "Mean relative difference: 5.596277e-05"

all.equal(as.numeric(predictions_dydx$dydx), as.numeric(margins_result$dydx_x2))

[1] TRUE

summarise(
predictions_dydx,
ame = mean(dydx),
ame_gratia = mean(gratia_result$derivative), ame_margins = mean(margins_result$dydx_x2)
) 

# A tibble: 1 x 3
ame ame_gratia ame_margins
<dbl>      <dbl>       <dbl>
1  1.76       1.76        1.76

1. Not the least of which is that most outside of econometrics don’t call predictions margins, since these days we aren’t adding results to the margin of a hand-written table.↩︎

2. For those in the R world, the i.age tells Stata to treat the age factor as, well, a factor. Stata’s alpha is 1/theta from R’s output.↩︎

3. The margins package does do predictions rather than the marginal effects, but it, like others, is just a wrapper for the predict method, and doesn’t appear to average them, so I don’t demonstrate that.↩︎

4. There is rarely a justifiable reason to discretize age as near as I can tell, and doing so inevitably results in less satisfying and less reliable conclusions.↩︎

5. There is a weird print issue where the Stata output isn’t showing the coefficient for one of the levels of group, but the model is correct and was verified directly using Stata.↩︎

Corrections

If you see mistakes or want to suggest changes, please create an issue on the source repository.

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY-SA 4.0. Source code is available at https://github.com//m-clark/m-clark.github.io, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

Clark (2020, June 16). Michael Clark: Predictions with an offset. Retrieved from https://m-clark.github.io/posts/2020-06-15-predict-with-offset/
@misc{clark2020predictions,
}