--- title: "Causal Mediation Analysis for Stochastic Interventions" author: "[Nima Hejazi](https://nimahejazi.org) and [Iván Díaz](https://www.idiaz.xyz/)" date: "`r Sys.Date()`" output: rmarkdown::html_vignette bibliography: ../inst/REFERENCES.bib vignette: > %\VignetteIndexEntry{Causal mediation analysis for stochastic interventions} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Background We are interested in assessing the population intervention direct effect and the population intervention indirect effect, based on the effect decomposition of the population intervention effect introduced in @diaz2020causal. To proceed, we'll use as our running example a simple data set from an observational study of the relationship between BMI and kids behavior, distributed as part of the [`mma` R package on CRAN](https://CRAN.R-project.org/package=mma). First, let's load the packages we'll be using and set a seed; then, load this data set and take a quick look at it ```{r load_data, message=FALSE, warning=FALSE} # preliminaries library(data.table) library(dplyr) library(tidyr) library(sl3) library(medshift) library(mma) set.seed(429153) # load and examine data data(weight_behavior) dim(weight_behavior) head(weight_behavior) ``` The documentation for the data set describes it as a "database obtained from the Louisiana State University Health Sciences Center, New Orleans, by Dr. Richard Scribner. He explored the relationship between BMI and kids behavior through a survey at children, teachers and parents in Grenada in 2014. This data set includes 691 observations and 15 variables." Unfortunately, the data set contains a few observations with missing values. As these are unrelated to the object of our analysis, we'll simply remove these for the time being. Note that in a real data analysis, we might consider strategies to fully make of the observed data, perhaps by imputing missing values. For now, we simply remove the incomplete observations, resulting in a data set with fewer observations but much the same structure as the original: ```{r remove_na, echo=FALSE, message=FALSE, warning=FALSE} weight_behavior_complete <- weight_behavior %>% drop_na() %>% mutate(sports = as.numeric(sports) - 1) dim(weight_behavior_complete) head(weight_behavior_complete) ``` For the analysis of this observational data set, we focus on the effect of participating in a sports team (`sports`) on the BMI of children (`bmi`), taking several related covariates as mediators (`snack`, `exercises`, `overweigh`) and all other collected covariates as potential confounders. Considering an NPSEM, we separate the observed variables from the data set into their corresponding nodes as follows ```{r npsem, message=FALSE, warning=FALSE} Y <- weight_behavior_complete$bmi A <- weight_behavior_complete$sports Z <- weight_behavior_complete %>% select(snack, exercises, overweigh) W <- weight_behavior_complete %>% select(age, sex, race, numpeople, car, gotosch, tvhours, cmpthours, cellhours, sweat) ``` Finally, in our analysis, we consider an incremental propensity score intervention (IPSI), as first proposed by @kennedy2017nonparametric, wherein the _odds of participating in a sports team_ is modulated by some fixed amount ($0 \leq \delta \leq \infty$) for each individual. Such an intervention may be interpreted as the effect of a school program that motivates children to participate in sports teams. To exemplify our approach, we postulate a motivational intervention that _triples the odds_ of participating in a sports team for each individual: ```{r delta_ipsi, message=FALSE, warning=FALSE} delta_shift_ipsi <- 3 ``` To easily incorporate ensemble machine learning into the estimation procedure, we rely on the facilities provided in the [`sl3` R package](https://tlverse.org/sl3) [@coyle2020sl3]. For a complete guide on using the `sl3` R package, consider consulting https://tlverse.org/sl3, or https://tlverse.org (and https://github.com/tlverse) for the `tlverse` ecosystem, of which `sl3` is a major part. We construct an ensemble learner using a handful of popular machine learning algorithms below ```{r make_sl, message=FALSE, warning=FALSE} # SL learners used for continuous data (the nuisance parameter M) xgb_contin_lrnr <- Lrnr_xgboost$new(nrounds = 50, objective = "reg:linear") enet_contin_lrnr <- Lrnr_glmnet$new(alpha = 0.5, family = "gaussian", nfolds = 3) lasso_contin_lrnr <- Lrnr_glmnet$new(alpha = 1, family = "gaussian", nfolds = 3) fglm_contin_lrnr <- Lrnr_glm_fast$new(family = gaussian()) contin_lrnr_lib <- Stack$new(enet_contin_lrnr, lasso_contin_lrnr, fglm_contin_lrnr, xgb_contin_lrnr) sl_contin_lrnr <- Lrnr_sl$new(learners = contin_lrnr_lib, metalearner = Lrnr_nnls$new()) # SL learners used for binary data (nuisance parameters G and E in this case) xgb_binary_lrnr <- Lrnr_xgboost$new(nrounds = 50, objective = "reg:logistic") enet_binary_lrnr <- Lrnr_glmnet$new(alpha = 0.5, family = "binomial", nfolds = 3) lasso_binary_lrnr <- Lrnr_glmnet$new(alpha = 1, family = "binomial", nfolds = 3) fglm_binary_lrnr <- Lrnr_glm_fast$new(family = binomial()) binary_lrnr_lib <- Stack$new(enet_binary_lrnr, lasso_binary_lrnr, fglm_binary_lrnr, xgb_binary_lrnr) logistic_metalearner <- make_learner(Lrnr_solnp, metalearner_logistic_binomial, loss_loglik_binomial) sl_binary_lrnr <- Lrnr_sl$new(learners = binary_lrnr_lib, metalearner = logistic_metalearner) ``` ## Decomposing the population intervention effect We may decompose the population intervention effect (PIE) in terms of a _population intervention direct effect_ (PIDE) and a _population intervention indirect effect_ (PIIE): \begin{equation*} \overbrace{\mathbb{E}\{Y(A_{\delta}, Z(A_{\delta})) - Y(A_\delta, Z)\}}^{\text{PIIE}} + \overbrace{\mathbb{E}\{Y(A_\delta, Z) - Y(A, Z)\}}^{\text{PIDE}}. \end{equation*} This decomposition of the PIE as the sum of the population intervention direct and indirect effects has an interpretation analogous to the corresponding standard decomposition of the average treatment effect. In the sequel, we will compute each of the components of the direct and indirect effects above using appropriate estimators as follows * For $\mathbb{E}\{Y(A, Z)\}$, the sample mean $\frac{1}{n}\sum_{i=1}^n Y_i$ is sufficient; * for $\mathbb{E}\{Y(A_{\delta}, Z)\}$, an efficient one-step estimator for the effect of a joint intervention altering the exposure mechanism but not the mediation mechanism, as proposed in @diaz2020causal; and, * for $\mathbb{E}\{Y(A_{\delta}, Z_{A_{\delta}})\}$, an efficient one-step estimator for the effect of a joint intervention altering both the exposure and mediation mechanisms, as proposed in @kennedy2017nonparametric and implemented in the [`npcausal` R package](https://github.com/ehkennedy/npcausal). ## Estimating the effect decomposition term As given in @diaz2020causal, the statistical functional identifying the decomposition term that appears in both the PIDE and PIIE $\mathbb{E}\{Y(A_{\delta}, Z)\}$, which corresponds to altering the exposure mechanism while keeping the mediation mechanism fixed, is \begin{equation*} \theta_0(\delta) = \int m_0(a, z, w) g_{0,\delta}(a \mid w) p_0(z, w) d\nu(a, z, w), \end{equation*} for which a one-step estimator is available. The corresponding _efficient influence function_ (EIF) with respect to the nonparametric model $\mathcal{M}$ is $D_{\eta,\delta}(o) = D^Y_{\eta,\delta}(o) + D^A_{\eta,\delta}(o) + D^{Z,W}_{\eta,\delta}(o) - \theta(\delta)$. The one-step estimator may be computed using the EIF estimating equation, making use of cross-fitting [@zheng2011cross; @chernozhukov2018double] to circumvent any need for entropy conditions (i.e., Donsker class restrictions). The resultant estimator is \begin{equation*} \hat{\theta}(\delta) = \frac{1}{n} \sum_{i = 1}^n D_{\hat{\eta}_{j(i)}, \delta}(O_i) = \frac{1}{n} \sum_{i = 1}^n \left\{ D^Y_{\hat{\eta}_{j(i)}, \delta}(O_i) + D^A_{\hat{\eta}_{j(i)}, \delta}(O_i) + D^{Z,W}_{\hat{\eta}_{j(i)}, \delta}(O_i) \right\}, \end{equation*} which is implemented in the `medshift` R package. We make use of that implementation to estimate $\mathbb{E}\{Y(A_{\delta}, Z)\}$ via its one-step estimator $\hat{\theta}(\delta)$ below ```{r efficient_est, message=FALSE, warning=FALSE} # let's compute the parameter where A (but not Z) are shifted theta_eff <- medshift(W = W, A = A, Z = Z, Y = Y, delta = delta_shift_ipsi, g_learners = sl_binary_lrnr, e_learners = sl_binary_lrnr, m_learners = sl_contin_lrnr, phi_learners = Lrnr_hal9001$new(), estimator = "onestep", estimator_args = list(cv_folds = 3)) summary(theta_eff) ``` ## Estimating the direct effect Recall that, based on the decomposition outlined previously, the population intervention direct effect may be denoted $\beta_{\text{PIDE}}(\delta) = \theta_0(\delta) - \mathbb{E}Y$. Thus, an estimator of the PIDE, $\hat{\beta}_{\text{PIDE}}(\delta)$ may be expressed as a composition of estimators of its constituent parameters: \begin{equation*} \hat{\beta}_{\text{PIDE}}({\delta}) = \hat{\theta}(\delta) - \frac{1}{n} \sum_{i = 1}^n Y_i. \end{equation*} Based on the above, we may construct an estimator of the PIDE using quantities already computed. The convenience function below applies the simple delta method required in the case of a linear contrast between the two constituent parameters: ```{r linear_contrast_delta, message=FALSE, warning=FALSE} # convenience function to compute inference via delta method: EY1 - EY0 linear_contrast <- function(params, eifs, ci_level = 0.95) { # bounds for confidence interval ci_norm_bounds <- c(-1, 1) * abs(stats::qnorm(p = (1 - ci_level) / 2)) param_est <- params[[1]] - params[[2]] eif <- eifs[[1]] - eifs[[2]] se_eif <- sqrt(var(eif) / length(eif)) param_ci <- param_est + ci_norm_bounds * se_eif # parameter and inference out <- c(param_ci[1], param_est, param_ci[2]) names(out) <- c("lwr_ci", "param_est", "upr_ci") return(out) } ``` With the above convenience function in hand, we'll construct or extract the necessary components from existing objects and simply apply the function: ```{r comp_de_binary, message=FALSE, warning=FALSE} # parameter estimates and EIFs for components of direct effect EY <- mean(Y) eif_EY <- Y - EY params_de <- list(theta_eff$theta, EY) eifs_de <- list(theta_eff$eif, eif_EY) # direct effect = EY - estimated quantity de_est <- linear_contrast(params_de, eifs_de) de_est ``` As given above, we have for our estimate of the direct effect $\hat{\beta}_{\text{PIDE}}({\delta}) =$ `r round(de_est[2], 3)`. ## References