Title: | Highly Adaptive Lasso Conditional Density Estimation |
---|---|
Description: | An algorithm for flexible conditional density estimation based on application of pooled hazards regression to an artificial repeated measures dataset constructed by discretizing the support of the outcome variable. To facilitate flexible estimation of the conditional density, the highly adaptive lasso, a non-parametric regression estimator shown to estimate cadlag (RCLL) functions at a suitably fast convergence rate, is used. The use of pooled hazards regression for conditional density estimation as implemented here was first described by Díaz and van der Laan (2011) <doi:10.2202/1557-4679.1356>. Building on the conditional density estimation utilities, non-parametric inverse probability weighted (IPW) estimators of the causal effects of additive modified treatment policies are implemented, using conditional density estimation to estimate the generalized propensity score. These non-parametric IPW estimators can be coupled with sieve estimation (undersmoothing) of the generalized propensity score to attain the semi-parametric efficiency bound (per Hejazi, Benkeser, Díaz, and van der Laan <doi:10.48550/arXiv.2205.05777>). |
Authors: | Nima Hejazi [aut, cre, cph] , David Benkeser [aut] , Mark van der Laan [aut, ths] , Rachael Phillips [ctb] |
Maintainer: | Nima Hejazi |
License: | MIT + file LICENSE |
Version: | 0.2.7 |
Built: | 2024-11-21 02:50:01 UTC |
Source: | https://github.com/nhejazi/haldensify |
Confidence Intervals for IPW Estimates of the Causal Effects of Stochastic Shift Interventions
## S3 method for class 'ipw_haldensify'
confint(object, parm = seq_len(object$psi), level = 0.95, ...)
object |
An object of class ipw_haldensify |
parm |
A |
level |
A |
... |
Other arguments. Not currently used. |
Compute confidence intervals for estimates produced by ipw_shift.
A named numeric vector containing the parameter estimate from an ipw_haldensify object, alongside lower and upper Wald-style confidence limits at a specified coverage level.
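A Wald-style interval has the form estimate ± z × SE, where z is the standard normal quantile matching the requested coverage level. The following is a minimal base-R sketch, assuming the point estimate and its standard error are already available as plain numbers; the helper wald_ci() and its output names are hypothetical and are not part of haldensify.

```r
# Hypothetical helper (not part of haldensify): Wald-style interval
# psi_hat +/- z_{(1 + level) / 2} * se for a scalar estimate.
wald_ci <- function(psi_hat, se, level = 0.95) {
  stopifnot(level > 0, level < 1, se >= 0)
  z <- stats::qnorm((1 + level) / 2)
  c(lwr_ci = psi_hat - z * se, est = psi_hat, upr_ci = psi_hat + z * se)
}

# Illustrative numbers only, not output of ipw_shift()
wald_ci(psi_hat = 0.42, se = 0.05, level = 0.95)
```

Lowering level shrinks z and therefore narrows the interval around the same point estimate.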
# simulate data
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
W3 <- rpois(n_obs, 3)
A <- rpois(n_obs, 3 * W1 - W2 + 2 * W1 * W2 + 4)
Y <- rbinom(n_obs, 1, plogis(A + W1 + W2 - W3 - W1 * W3))
# fit the IPW estimator
est_ipw <- ipw_shift(
  W = cbind(W1, W2, W3),
  A = A, Y = Y,
  delta = 0.5, cv_folds = 3L,
  n_bins = 5L, bin_type = "equal_range",
  lambda_seq = exp(seq(-1, -10, length = 100L)),
  # arguments passed to hal9001::fit_hal()
  max_degree = 2,
  smoothness_orders = 0,
  reduce_basis = 1 / sqrt(n_obs)
)
confint(est_ipw)
HAL Conditional Density Estimation in a Cross-validation Fold
cv_haldensify( fold, long_data, wts = rep(1, nrow(long_data)), lambda_seq = exp(seq(-1, -13, length = 1000L)), smoothness_orders = 0L, ... )
fold |
Object specifying cross-validation folds as generated by a call
to |
long_data |
A |
wts |
A |
lambda_seq |
A |
smoothness_orders |
A |
... |
Additional (optional) arguments of |
Estimates the conditional density of A|W for a subset of the full set of observations, based on the given structure of the cross-validation folds. This is a helper function intended to be used to select the optimal value of the penalization parameter for the highly adaptive lasso estimates of the conditional hazard (via cross_validate).
A list containing density predictions, observation IDs, observation-level weights, and cross-validation indices for conditional density estimation on a single fold of the overall data.
Fit Conditional Density Estimation over a Sequence of HAL Models
fit_haldensify( A, W, wts = rep(1, length(A)), grid_type = "equal_range", n_bins = round(c(0.5, 1, 1.5, 2) * sqrt(length(A))), cv_folds = 5L, lambda_seq = exp(seq(-1, -13, length = 1000L)), smoothness_orders = 0L, ... )
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
This |
cv_folds |
A |
lambda_seq |
A |
smoothness_orders |
A |
... |
Additional (optional) arguments of |
Estimation of the conditional density of A|W via a cross-validated highly adaptive lasso, used to estimate the conditional hazard of failure in a given bin over the support of A.
A list containing density predictions for the sequence of fitted HAL models; the index and value of the L1 regularization parameter minimizing the density loss; and the sequence of empirical risks for the sequence of fitted HAL models.
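The default n_bins in the usage above scales with the square root of the sample size, and the default lambda_seq is a decreasing grid on the log scale. Evaluating those two default expressions directly for n = 50 (the sample size used in the examples) shows the candidate values that cross-validation would compare; this is plain base R, not a call into the package.

```r
# Default candidate numbers of bins, as written in the usage of
# fit_haldensify() and haldensify(), evaluated for n = 50
n <- 50
default_bins <- round(c(0.5, 1, 1.5, 2) * sqrt(n))
default_bins
# [1]  4  7 11 14

# Default penalization grid: 1000 values decreasing from exp(-1) to exp(-13)
lambda_default <- exp(seq(-1, -13, length = 1000L))
range(lambda_default)
```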
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
set.seed(11249)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# fit cross-validated HAL-based density estimator of A|W
haldensify_cvfit <- fit_haldensify(
  A = a, W = w, n_bins = 10L,
  lambda_seq = exp(seq(-1, -10, length = 100)),
  # the following arguments are passed to hal9001::fit_hal()
  max_degree = 3,
  reduce_basis = 1 / sqrt(length(a))
)
Generate Augmented Repeated Measures Data for Pooled Hazards Regression
format_long_hazards( A, W, wts = rep(1, length(A)), grid_type = c("equal_range", "equal_mass"), n_bins = NULL, breaks = NULL )
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
Only used if |
breaks |
A |
Generates an augmented (long format, or repeated measures) dataset that includes multiple records for each observation: a single record for each discretized bin up to and including the bin in which a given observed value of A falls. These bins are defined by break points selected over the support of A. This repeated measures dataset is suitable for estimating the hazard of failing in a particular bin over A using a highly adaptive lasso (or other) classification model.
A list containing the break points used in dividing the support of A into discrete bins, the length of each bin, and the reformatted data. The reformatted data is a data.table of repeated measures data, with an indicator for which bin an observation fails in, the bin ID, observation ID, values of W for each given observation, and observation-level weights.
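The repeated measures construction described above can be sketched in base R. This is a simplified, hypothetical re-implementation (to_long_hazards() is not a haldensify function): it assumes a single numeric covariate W, returns a data.frame rather than a data.table, and assumes equal-mass breaks can be taken from empirical quantiles of A.

```r
# Hypothetical sketch of the long (repeated measures) format used for
# pooled hazards regression; not the haldensify implementation.
to_long_hazards <- function(A, W, wts = rep(1, length(A)), n_bins = 4L,
                            grid_type = c("equal_range", "equal_mass")) {
  grid_type <- match.arg(grid_type)
  breaks <- if (grid_type == "equal_range") {
    seq(min(A), max(A), length.out = n_bins + 1L)
  } else {
    unname(stats::quantile(A, probs = seq(0, 1, length.out = n_bins + 1L)))
  }
  # bin in which each observed A falls (right-most break is closed)
  bin_id <- findInterval(A, breaks, rightmost.closed = TRUE)
  rows <- lapply(seq_along(A), function(i) {
    k <- seq_len(bin_id[i])  # one record per bin up to and including bin_id[i]
    data.frame(
      obs_id = i,
      bin_id = k,
      in_bin = as.integer(k == bin_id[i]),  # 1 only in the bin where A "fails"
      W = W[i],
      wts = wts[i]
    )
  })
  list(breaks = breaks, bin_length = diff(breaks),
       data = do.call(rbind, rows))
}

A <- c(0.1, 0.9, 2.5, 3.9)
W <- c(1, 0, 1, 0)
long <- to_long_hazards(A, W, n_bins = 4L)
long$data
```

Each observation contributes exactly one row with in_bin = 1, so a binary classifier fit to in_bin on (bin_id, W) estimates the discrete hazard in each bin.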
Cross-validated HAL Conditional Density Estimation
haldensify( A, W, wts = rep(1, length(A)), grid_type = "equal_range", n_bins = round(c(0.5, 1, 1.5, 2) * sqrt(length(A))), cv_folds = 5L, lambda_seq = exp(seq(-1, -13, length = 1000L)), smoothness_orders = 0L, hal_basis_list = NULL, ... )
A |
The |
W |
A |
wts |
A |
grid_type |
A |
n_bins |
This |
cv_folds |
A |
lambda_seq |
A |
smoothness_orders |
A |
hal_basis_list |
A |
... |
Additional (optional) arguments of |
Estimates the conditional density of A|W by using the highly adaptive lasso to estimate the conditional hazard of failure in a given bin over the support of A. Cross-validation is used to select the optimal values of the penalization parameters, based on minimization of the weighted log-likelihood loss for a density.
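The selection criterion described above is the weighted negative log-likelihood of held-out observations under each candidate density fit. A minimal sketch, assuming the held-out density predictions are already arranged as a matrix with one column per candidate value of lambda; the matrix values are made up for illustration, and weighted_nll() is a hypothetical helper, not a package function.

```r
# Hypothetical helper: weighted negative log-likelihood risk per candidate,
# from held-out density predictions (rows = observations,
# columns = candidate lambda values).
weighted_nll <- function(dens_pred, wts = rep(1, nrow(dens_pred))) {
  stopifnot(all(dens_pred > 0), length(wts) == nrow(dens_pred))
  # multiplying by wts recycles down each column, i.e. weights rows
  colSums(-log(dens_pred) * wts) / sum(wts)
}

# Made-up held-out predictions for 3 observations and 3 lambda values
dens_pred <- cbind(
  lambda_1 = c(0.20, 0.25, 0.15),
  lambda_2 = c(0.40, 0.35, 0.30),
  lambda_3 = c(0.10, 0.50, 0.05)
)
risk <- weighted_nll(dens_pred, wts = c(1, 1, 2))
lambda_idx <- which.min(risk)  # index of the CV-selected candidate
lambda_idx
```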
Object of class haldensify, containing a fitted hal9001 object; a vector of break points used in binning A over its support; sizes of the bins used in each fit; the tuning parameters selected by cross-validation; the full sequence (in lambda) of HAL models for the CV-selected number of bins and binning strategy; and the range of A.
Parallel evaluation of the cross-validation procedure to select tuning parameters for density estimation may be invoked via the framework exposed in the future ecosystem. Specifically, set a plan (via future::plan) for future_mapply to be used internally.
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
set.seed(11249)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
  A = a, W = w, n_bins = 10L,
  lambda_seq = exp(seq(-1, -10, length = 100)),
  # the following arguments are passed to hal9001::fit_hal()
  max_degree = 3,
  reduce_basis = 1 / sqrt(length(a))
)
IPW Estimator of the Causal Effects of Additive Modified Treatment Policies
ipw_shift( W, A, Y, delta = 0, n_bins = make_bins(A, "hist"), cv_folds = 10L, lambda_seq, ..., bin_type = c("equal_range", "equal_mass"), selector_type = c("dcar", "plateau", "gcv", "all") )
W |
A |
A |
A |
Y |
A |
delta |
A |
n_bins |
A |
cv_folds |
A |
lambda_seq |
A |
... |
Additional arguments for model fitting to be passed directly to
|
bin_type |
A |
selector_type |
A |
# simulate data
set.seed(11249)
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
W3 <- rpois(n_obs, 3)
A <- rpois(n_obs, 3 * W1 - W2 + 2 * W1 * W2 + 4)
Y <- rbinom(n_obs, 1, plogis(A + W1 + W2 - W3 - W1 * W3))
# fit the IPW estimator
est_ipw <- ipw_shift(
  W = cbind(W1, W2, W3),
  A = A, Y = Y,
  delta = 0.5, cv_folds = 3L,
  n_bins = 4L, bin_type = "equal_range",
  lambda_seq = exp(seq(-1, -10, length = 100L)),
  # arguments passed to hal9001::fit_hal()
  max_degree = 1L,
  smoothness_orders = 0,
  reduce_basis = 1 / sqrt(n_obs)
)
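For an additive shift A + delta, the post-intervention treatment density at a is g(a - delta | w), so an IPW estimator reweights each outcome by g(A - delta | W) / g(A | W). The sketch below illustrates that weighting with the known (true) normal generalized propensity score standing in for the HAL-based estimate that ipw_shift() uses; it shows the estimand and the weights only, not the package's undersmoothing or selector logic, and the data-generating process is made up for illustration.

```r
# Hypothetical sketch: IPW estimate of E[Y(A + delta)] using the *true*
# conditional density of A given W (a stand-in for an estimated GPS).
set.seed(11249)
n <- 1e5
delta <- 0.5
W <- rnorm(n)
A <- rnorm(n, mean = W, sd = 1)  # g(a | w) is N(w, 1)
Y <- A + W + rnorm(n)            # E[Y | A, W] = A + W

g_obs <- dnorm(A, mean = W, sd = 1)            # g(A | W)
g_shift <- dnorm(A - delta, mean = W, sd = 1)  # g(A - delta | W)
ipw_wts <- g_shift / g_obs
psi_ipw <- mean(ipw_wts * Y)

# Under this design E[Y(A + delta)] = E[A + delta + W] = delta = 0.5
psi_ipw
```

The weights have mean one in expectation; when g must be estimated, as in ipw_shift(), the quality of the density estimate drives the behaviour of these ratios.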
Map Predicted Hazard to Predicted Density for a Single Observation
map_hazard_to_density(hazard_pred_single_obs)
hazard_pred_single_obs |
A |
For a single observation, map a predicted hazard of failure (as an occurrence in a particular bin, under a given partitioning of the support) to a density.
A matrix composed of a single row and a number of columns specified by the grid of penalization parameters used in fitting the highly adaptive lasso. This is the predicted conditional density for a given observation, re-mapped from the hazard scale.
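The hazard-to-density mapping follows from the discrete-time survival identity: the probability of failing in bin k is the hazard in bin k times the probability of not failing in bins 1 through k - 1. Dividing that bin probability by the bin width gives a density value. A minimal sketch for one observation and a single penalization value; hazard_to_density() is hypothetical, and placing the division by bin width inside it is an assumption made for illustration, not a description of the package internals.

```r
# Hypothetical sketch: map per-bin hazards (bins 1..k, where k is the bin
# containing the observed A) to a density value for that observation.
hazard_to_density <- function(hazards, bin_width) {
  k <- length(hazards)
  surv_before <- prod(1 - hazards[-k])    # P(no failure in bins 1..k-1)
  prob_bin_k <- hazards[k] * surv_before  # P(failure in bin k)
  prob_bin_k / bin_width                  # rescale probability to a density
}

# Observation falling in bin 3 of equal-width bins of width 0.5
hazard_to_density(hazards = c(0.2, 0.25, 0.5), bin_width = 0.5)
# [1] 0.6
```

When the hazard in the last bin is 1, the bin probabilities implied by this mapping sum to one, which is what makes the pooled hazards fit a proper density.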
Plot Method for HAL Conditional Density Estimates
## S3 method for class 'haldensify'
plot(x, ..., type = c("risk", "density"))
x |
Object of class haldensify |
... |
Additional arguments to be passed |
type |
A |
Object of class ggplot containing a plot of the desired type.
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
  A = a, W = w, n_bins = 3,
  lambda_seq = exp(seq(-1, -10, length = 50)),
  # the following arguments are passed to hal9001::fit_hal()
  max_degree = 2L,
  smoothness_orders = 0L,
  reduce_basis = 0.1
)
plot(haldensify_fit)
Prediction Method for HAL Conditional Density Estimation
## S3 method for class 'haldensify'
predict(object, ..., new_A, new_W, trim = TRUE, trim_min = NULL, lambda_select = c("cv", "undersmooth", "all"))
object |
An object of class haldensify |
... |
Additional arguments passed to |
new_A |
The |
new_W |
A |
trim |
A |
trim_min |
A |
lambda_select |
A |
Method for computing and extracting conditional density predictions from a highly adaptive lasso estimator, i.e., from the S3 object of class haldensify returned by haldensify.
A numeric vector of predicted conditional density values from a fitted haldensify object.
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# HAL-based density estimator of A|W
haldensify_fit <- haldensify(
  A = a, W = w, n_bins = 10L,
  lambda_seq = exp(seq(-1, -10, length = 100)),
  # the following arguments are passed to hal9001::fit_hal()
  max_degree = 2,
  smoothness_orders = 0L,
  reduce_basis = 1 / sqrt(length(a))
)
# predictions to recover conditional density of A|W
new_a <- seq(-4, 4, by = 0.1)
new_w <- rep(0, length(new_a))
pred_dens <- predict(haldensify_fit, new_A = new_a, new_W = new_w)
Print: Highly Adaptive Lasso Conditional Density Estimates
## S3 method for class 'haldensify'
print(x, ...)
x |
An object of class haldensify |
... |
Other options (not currently used). |
The print method for objects of class haldensify.
None. Called for the side effect of printing an informative summary of slots of objects of class haldensify.
# simulate data: W ~ U[-4, 4] and A|W ~ N(mu = W, sd = 0.5)
set.seed(11249)
n_train <- 50
w <- runif(n_train, -4, 4)
a <- rnorm(n_train, w, 0.5)
# learn relationship A|W using HAL-based density estimation procedure
haldensify_fit <- haldensify(
  A = a, W = w, n_bins = c(3, 5),
  lambda_seq = exp(seq(-1, -15, length = 50L)),
  max_degree = 2,
  smoothness_orders = 0,
  reduce_basis = 0.1
)
print(haldensify_fit)
Print: IPW Estimates of the Causal Effects of Stochastic Shift Interventions
## S3 method for class 'ipw_haldensify'
print(x, ..., ci_level = 0.95)
x |
An object of class ipw_haldensify |
... |
Other options (not currently used). |
ci_level |
A |
The print method for objects of class ipw_haldensify.
None. Called for the side effect of printing an informative summary of slots of objects of class ipw_haldensify.
# simulate data
set.seed(11249)
n_obs <- 50
W1 <- rbinom(n_obs, 1, 0.6)
W2 <- rbinom(n_obs, 1, 0.2)
A <- rnorm(n_obs, (2 * W1 - W2 - W1 * W2), 2)
Y <- rbinom(n_obs, 1, plogis(3 * A + W1 + W2 - W1 * W2))
# fit the IPW estimator
est_ipw_shift <- ipw_shift(
  W = cbind(W1, W2),
  A = A, Y = Y,
  delta = 0.5, n_bins = 3L, cv_folds = 3L,
  lambda_seq = exp(seq(-1, -10, length = 100L)),
  # arguments passed to hal9001::fit_hal()
  max_degree = 1,
  # ...continue arguments for IPW
  selector_type = "gcv"
)
print(est_ipw_shift)