Stochastic treatment regimes constitute a flexible framework for
evaluating the effects of continuous-valued exposures/treatments.
Modified treatment policies, one such technique within this
framework, examine the effects attributable to shifting the observed
(“natural”) value of a treatment, usually up or down by some scalar
δ. The txshift
package implements algorithms for computing one-step or targeted minimum
loss-based (TML) estimates of the counterfactual means induced by
additive modified treatment policies (MTPs), defined by a shifting
function δ(A, W). For a
technical presentation, the interested reader is invited to consult
Dı́az and van der Laan (2018) or the
earlier work of Dı́az and van der Laan
(2012) and Haneuse and Rotnitzky
(2013). For background on Targeted Learning, consider consulting
van der Laan and Rose (2011), van der Laan and Rose (2018), and van der Laan et al. (2022).
To start, let’s load the packages we’ll need and set a seed for simulation:
## haldensify v0.2.7: Highly Adaptive Lasso Conditional Density Estimation
## txshift v0.3.9: Efficient Estimation of the Causal Effects of Stochastic
## Interventions
We’ll consider n observed units O1, …, On, where each random variable O = (W, A, Y) corresponds to the data available on a single unit. Within O, W denotes baseline covariates (e.g., age, biological sex, BMI), A ∈ ℝ a continuous-valued exposure (e.g., dosage of nutritional supplements taken), and Y an outcome of interest (e.g., disease status). To minimize unjustifiable assumptions, we let O ∼ 𝒫 ∈ ℳ, where 𝒫 is simply any distribution within the nonparametric statistical model ℳ. To formalize the definition of stochastic interventions and their corresponding causal effects, we consider a nonparametric structural equation model (NPSEM), introduced by Pearl (2000), to define how the system changes under interventions of interest: We denote the observed data structure O = (W, A, Y)
Assuming that the distribution of A conditional on W = w has support in the interval (l(w), u(w)) – for convenience, we assume that the minimum natural value of treatment A for an individual with covariates W = w is l(w), while, similarly, the maximum is u(w) – a simple MTP based on a shift δ, is where 0 ≤ δ is an arbitrary pre-specified value that defines the degree to which the observed value A is to be shifted, where possible.
In case-cohort studies, it is common practice to make use of
outcome-dependent two-phase sampling designs, which allow for expensive
measurements made on the exposure (e.g., genomic sequencing of immune
markers) to be avoided. As a complication, such sampling schemes alter
the observed data structure from the simpler O = (W, A, Y)
to O = (W, ΔA, Y, Δ),
where the sampling indicator Δ
may itself be a function of the variables {W, Y}. In this revised
data structure, the value of A
is only observed for units in the two-phase sample, for whom Δ = 1. Hejazi
et al. (2020) provide a detailed investigation of the
methodological details of efficient estimation under such designs in the
context of vaccine efficacy trials; their work was used in the analysis
of immune correlates of protection for HIV-1 (Hejazi et al. 2020) and COVID-19 (Gilbert et al. 2021). Of course, one may also
account for loss to follow-up (i.e., censoring), which the
txshift
package supports (through the C_cens
argument of the eponymous txshift
function), though we
avoid this complication in our subsequent examples in the interest of
clarity of exposition. Corrections for both censoring and two-phase
sampling make use of inverse probability of censoring weighting (IPCW),
leading to IPCW-augmented one-step and TML estimators.
# parameters for simulation example
n_obs <- 200 # number of observations
# baseline covariate -- simple, binary
W <- rbinom(n_obs, 1, 0.5)
# create treatment based on baseline W
A <- rnorm(n_obs, mean = 2 * W, sd = 0.5)
# create outcome as a linear function of A, W + white noise
Y <- A + W + rnorm(n_obs, mean = 0, sd = 1)
# two-phase sampling based on covariates
Delta_samp <- rbinom(n_obs, 1, plogis(W))
# treatment shift parameter
delta <- 0.5
The simplest way to compute an efficient estimator for an additive
MTP is to fit each of the nuisance parameters internally. This procedure
can be sped up by using generalized linear models (GLMs) to fit the
outcome regression Qn. The
txshift()
function provides a simple interface.
est_shift <- txshift(
W = W, A = A, Y = Y, delta = delta,
g_exp_fit_args = list(
fit_type = "hal", n_bins = 5,
grid_type = "equal_mass",
lambda_seq = exp(seq(-1, -10, length = 100))
),
Q_fit_args = list(
fit_type = "glm",
glm_formula = "Y ~ .^2"
)
)
est_shift
## Counterfactual Mean of Shifted Treatment
## Intervention: Treatment + 0.5
## txshift Estimator: tmle
## Estimate: 1.8623
## Std. Error: 0.1542
## 95% CI: [1.56, 2.1646]
sl3
To easily incorporate ensemble machine learning into the estimation
procedure, txshift
integrates with the sl3
R package (Coyle, Hejazi, Malenica, et al. 2022). For a
complete guide on using the sl3
R package, consider
consulting the
chapter on Super Learning in van der Laan et
al. (2022).
library(sl3)
# SL learners to be used for most fits (e.g., IPCW, outcome regression)
mean_learner <- Lrnr_mean$new()
glm_learner <- Lrnr_glm$new()
rf_learner <- Lrnr_ranger$new()
Q_lib <- Stack$new(mean_learner, glm_learner, rf_learner)
sl_learner <- Lrnr_sl$new(learners = Q_lib, metalearner = Lrnr_nnls$new())
# SL learners for fitting the generalized propensity score fit
hose_learner <- make_learner(Lrnr_density_semiparametric,
mean_learner = glm_learner
)
hese_learner <- make_learner(Lrnr_density_semiparametric,
mean_learner = rf_learner,
var_learner = glm_learner
)
g_lib <- Stack$new(hose_learner, hese_learner)
sl_learner_density <- Lrnr_sl$new(
learners = g_lib,
metalearner = Lrnr_solnp_density$new()
)
Using the framework provided by the sl3
package, the
nuisance functions required for our efficient estimators may be fit with
ensemble machine learning. The Super Learner algorithm (van der Laan, Polley, and Hubbard 2007)
implemented in sl3
uses the asymptotic optimality of V-fold
cross-validation (Dudoit and van der Laan 2005;
van der Laan, Dudoit, and Keles 2004; van der Vaart, Dudoit, and van der
Laan 2006) to select an optimal prediction functions from a
library or to construct an optimal combination of prediction
functions.
In case-cohort studies in which two-phase sampling is used, the data
structure takes the from O = (W, ΔA, Y, Δ)
as previously discussed. Under such sampling, the txshift()
function may still be used to estimate the causal effect of an additive
MTP – only a few additional arguments need to be specified:
est_shift_ipcw <- txshift(
W = W, A = A, Y = Y, delta = delta,
C_samp = Delta_samp, V = c("W", "Y"),
samp_fit_args = list(fit_type = "glm"),
g_exp_fit_args = list(
fit_type = "hal", n_bins = 5,
grid_type = "equal_mass",
lambda_seq = exp(seq(-1, -10, length = 100))
),
Q_fit_args = list(
fit_type = "glm",
glm_formula = "Y ~ .^2"
),
eif_reg_type = "glm"
)
est_shift_ipcw
## Counterfactual Mean of Shifted Treatment
## Intervention: Treatment + 0.5
## txshift Estimator: tmle
## Estimate: 1.958
## Std. Error: 0.1396
## 95% CI: [1.6844, 2.2316]
Note that we specify a few additional arguments in the call to
txshift()
, including C_samp
, the indicator of
inclusion in the two-phase sample; V
, the set of other
variables that may affect the sampling decision (in this case, both the
baseline covariates and the outcome); samp_fit_args
, which
indicates how the sampling mechanism ought to be estimated; and
eif_reg_type
, which indicates how a particular
reduced-dimension nuisance regression ought to be estimated (see Rose and van der Laan (2011) and Hejazi et al. (2020) for details). This last
argument only has options for using a GLM or the highly adaptive lasso
(HAL), a nonparametric regression estimator, using the hal9001
package [Coyle, Hejazi, Phillips, et al.
(2022); hejazi2020hal9001-joss]. In practice, we recommend
leaving this argument to the default and fitting this nuisance function
with HAL; however, this is significantly more costly
computationally.
The efficient estimators implemented in txshift
are
asymptotically linear; thus, the estimator ψn converges to
the true parameter value ψ0: ψn − ψ0 = (Pn − P0) ⋅ D(Q̄n⋆, gn) + R(P̂⋆, P0),
which yields $$\psi_n - \psi_0 = (P_n - P_0)
\cdot D(P_0) + o_P \left( \frac{1}{\sqrt{n}}
\right),$$ provided the following conditions,
By the central limit theorem, the estimators then have a Gaussian limiting distribution, $$\sqrt{n}(\psi_n - \psi) \to N(0, V(D(P_0))),$$ where V(D(P0)) is the variance of the efficient influence function (or canonical gradient).
The above implies that ψn is a $\sqrt{n}$-consistent estimator of ψ, that it is asymptotically normal (as given above), and that it is locally efficient. This allows us to build Wald-type confidence intervals in a straightforward manner:
$$\psi_n \pm z_{\alpha} \cdot \frac{\sigma_n}{\sqrt{n}},$$ where σn2 is an estimator of V(D(P0)). The estimator σn2 may be obtained using the bootstrap or computed directly via the following
$$\sigma_n^2 = \frac{1}{n} \sum_{i = 1}^{n} D^2(\bar{Q}_n^{\star}, g_n)(O_i).$$
Such confidence intervals may easily be created with the
confint
method:
## lwr_ci est upr_ci
## 1.559997 1.862286 2.164576
In some special cases it may be useful for the experienced user to
compute the treatment mechanism, censoring mechanism, outcome
regression, and sampling mechanism fits separately (i.e., outside of the
txshift
wrapper function), instead applying the wrapper
only to construct an efficient one-step or TML estimator. In such cases,
the optional arguments ending in _ext
.
# compute censoring mechanism and produce IPC weights externally
pi_mech <- plogis(W)
ipcw_out <- pi_mech
# compute treatment mechanism (propensity score) externally
## first, produce the down-shifted treatment data
gn_downshift <- dnorm(A - delta, mean = tx_mult * W, sd = 1)
## next, initialize and produce the up-shifted treatment data
gn_upshift <- dnorm(A + delta, mean = tx_mult * W, sd = 1)
## now, initialize and produce the up-up-shifted (2 * delta) treatment data
gn_upupshift <- dnorm(A + 2 * delta, mean = tx_mult * W, sd = 1)
## then, initialize and produce the un-shifted treatment data
gn_noshift <- dnorm(A, mean = tx_mult * W, sd = 1)
## finally, put it all together into an object like what's produced internally
gn_out <- as.data.table(cbind(
gn_downshift, gn_noshift, gn_upshift,
gn_upupshift
))[C == 1, ]
setnames(gn_out, c("downshift", "noshift", "upshift", "upupshift"))
# compute outcome regression externally
# NOTE: transform Y to lie in the unit interval and bound predictions such that
# no values fall near the bounds of the interval
Qn_noshift <- (W + A - min(Y)) / diff(range(Y))
Qn_upshift <- (W + A + delta - min(Y)) / diff(range(Y))
Qn_noshift[Qn_noshift < 0] <- 0.025
Qn_noshift[Qn_noshift > 1] <- 0.975
Qn_upshift[Qn_upshift < 0] <- 0.025
Qn_upshift[Qn_upshift > 1] <- 0.975
Qn_out <- as.data.table(cbind(Qn_noshift, Qn_upshift))[C == 1, ]
setnames(Qn_out, c("noshift", "upshift"))
# construct efficient estimator by applying wrapper function
est_shift_spec <- txshift(
W = W, A = A, Y = Y, delta = delta,
samp_fit_args = NULL,
samp_fit_ext = ipcw_out,
g_exp_fit_args = list(fit_type = "external"),
Q_fit_args = list(fit_type = "external"),
gn_exp_fit_ext = gn_out,
Qn_fit_ext = Qn_out
)
R
: A Causal Data
Science Handbook. CRC Press. https://tlverse.org/tlverse-handbook/.