Marginal Structural Model

This is a demonstration of a simple marginal structural model for estimation of so-called ‘causal’ effects using inverse probability weighting.

Example data is from, and comparison made to, the ipw package. See more here.

Data Setup

This example is from the helpfile at ?ipwpoint.

library(tidyverse)
library(ipw)

set.seed(16)

n = 1000
simdat = data.frame(l = rnorm(n, 10, 5))
a_lin  = simdat$l - 10
pa = plogis(a_lin)

simdat = simdat %>% 
  mutate(
    a = rbinom(n, 1, prob = pa),
    y = 10 * a + 0.5 * l + rnorm(n, -10, 5)
  )

ipw_result = ipwpoint(
  exposure = a,
  family   = "binomial",
  link     = "logit",
  numerator   = ~ 1,
  denominator = ~ l,
  data = simdat
)

summary(ipw_result$ipw.weights)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.4810  0.5127  0.5285  0.9095  0.6318 74.6994 
ipwplot(ipw_result$ipw.weights)

We create the weights as follows using the probabilities from a logistic regression.

ps_num = fitted(glm(a ~ 1, data = simdat, family = 'binomial'))
ps_num[simdat$a == 0] = 1 - ps_num[simdat$a == 0]

ps_den = fitted(glm(a ~ l, data = simdat, family = 'binomial'))
ps_den[simdat$a == 0] = 1 - ps_den[simdat$a == 0]

wts = ps_num / ps_den

Compare the weights.

rbind(summary(wts), summary(ipw_result$ipw.weights))
      Min.   1st Qu.    Median      Mean  3rd Qu.    Max.
[1,] 0.481 0.5127181 0.5284768 0.9094652 0.631794 74.6994
[2,] 0.481 0.5127181 0.5284768 0.9094652 0.631794 74.6994

Add inverse probability weights to the data if desired.

simdat = simdat %>% 
  mutate(sw = ipw_result$ipw.weights)

Function

Create the likelihood function for using the weights.

msm_ll <- function(
  par,             # parameters to be estimated; first is taken to be sigma
  X,               # model matrix
  y,               # target variable
  wts              # estimated weights
) {
  beta  = par[-1]
  lp    = X %*% beta
  sigma = exp(par[1])  # exponentiated value to stay positive
  
  ll = dnorm(y, mean = lp, sd = sigma, log = TRUE)  
  -sum(ll * wts)   # weighted likelihood
  
  # same as
  # ll = dnorm(y, mean = lp, sd = sigma)^wts
  # -sum(log(ll))
}

Estimation

We want to estimate the marginal structural model for the causal effect of a on y corrected for confounding by l, using inverse probability weighting with robust standard error from the survey package. Create the matrices for estimation, estimate the model, and extract results.

X = cbind(1, simdat$a)
y = simdat$y

fit = optim(
  par = c(sigma = 0, intercept = 0, b = 0),
  fn  = msm_ll,
  X   = X,
  y   = y,
  wts = wts,
  hessian = TRUE,
  method  = 'BFGS',
  control = list(abstol = 1e-12)
)

dispersion = exp(fit$par[1])^2
beta = fit$par[-1]

Now we compute the standard errors. The following uses the survey package raw version to get the appropriate standard errors, which the ipw approach uses.

glm_basic = glm(y ~ a, data = simdat, weights = wts)     # to get unscaled cov
res = resid(glm_basic, type = 'working')                 # residuals
glm_vcov_unsc = summary(glm_basic)$cov.unscaled          # weighted vcov unscaled by dispersion solve(crossprod(qr(X)))
estfun = X * res * wts                  
x = estfun %*% glm_vcov_unsc 

Comparison

library("survey")

fit_msm = svyglm(
  y ~ a,
  design = svydesign(~ 1, weights = ~ sw, data = simdat)
)

summary(fit_msm)

Call:
svyglm(formula = y ~ a, design = svydesign(~1, weights = ~sw, 
    data = simdat))

Survey design:
svydesign(~1, weights = ~sw, data = simdat)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   -4.375      1.142  -3.832 0.000135 ***
a             10.647      1.190   8.948  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 29.58889)

Number of Fisher Scoring iterations: 2

Now get the standard errors.

se = sqrt(diag(crossprod(x) * n/(n-1)))                  # a robust standard error
se_robust = sqrt(diag(sandwich::sandwich(glm_basic)))    # an easier way to get it
se_msm    = sqrt(diag(vcov(fit_msm)))                        # extract from msm model

Compare standard errors.

tibble(se, se_robust, se_msm)
# A tibble: 2 x 3
     se se_robust se_msm
  <dbl>     <dbl>  <dbl>
1  1.14      1.14   1.14
2  1.19      1.19   1.19

Inspect the general fit and compare with the other.

(#tab:msm-comparison- show)msm_ll
Estimate init_se se_robust t p dispersion
-4.378 0.247 1.141 -3.834 0 29.563
10.649 0.361 1.189 8.950 0 29.563
(#tab:msm-comparison- show)svyglm
term estimate std.error statistic p.value
(Intercept) -4.375 1.142 -3.832 0
a 10.647 1.190 8.948 0