Title: | Targeted Learning with Moderated Statistics for Biomarker Discovery |
---|---|
Description: | Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions. |
Authors: | Nima Hejazi [aut, cre, cph] , Alan Hubbard [aut, ths] , Mark van der Laan [aut, ths] , Weixin Cai [ctb] , Philippe Boileau [ctb] |
Maintainer: | Nima Hejazi <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.17.1 |
Built: | 2024-11-18 04:41:15 UTC |
Source: | https://github.com/nhejazi/biotmle |
Computes the causal target parameter defined as the difference between the biomarker expression values under treatment and those same values under no treatment, using Targeted Minimum Loss Estimation.
biomarkertmle( se, varInt, normalized = TRUE, ngscounts = FALSE, bppar_type = BiocParallel::MulticoreParam(), bppar_debug = FALSE, cv_folds = 1, g_lib = c("SL.mean", "SL.glm", "SL.bayesglm"), Q_lib = c("SL.mean", "SL.bayesglm", "SL.earth", "SL.ranger"), ... )
biomarkertmle( se, varInt, normalized = TRUE, ngscounts = FALSE, bppar_type = BiocParallel::MulticoreParam(), bppar_debug = FALSE, cv_folds = 1, g_lib = c("SL.mean", "SL.glm", "SL.bayesglm"), Q_lib = c("SL.mean", "SL.bayesglm", "SL.earth", "SL.ranger"), ... )
se |
A |
varInt |
A |
normalized |
A |
ngscounts |
A |
bppar_type |
A parallelization option specified by |
bppar_debug |
A |
cv_folds |
A |
g_lib |
A |
Q_lib |
A |
... |
Additional arguments to be passed to |
S4 object of class biotmle
, inheriting from
SummarizedExperiment
, with additional slots tmleOut
and
call
, among others, containing TML estimates of the ATE of exposure
on biomarker expression.
library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData[1:2, ], varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") )
library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData[1:2, ], varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") )
Constructor for class bioTMLE
class biotmle
object, sub-classed from SummarizedExperiment.
library(SummarizedExperiment) library(biotmleData) data(illuminaData) example_biotmle_class <- function(se) { call <- match.call(expand.dots = TRUE) biotmle <- .biotmle( SummarizedExperiment( assays = assay(se), rowData = rowData(se), colData = colData(se) ), call = call, ateOut = as.numeric(rep(NA, 10)), tmleOut = as.data.frame(matrix(NA, 10, 10)), topTable = as.data.frame(matrix(NA, 10, 10)) ) return(biotmle) } example_class <- example_biotmle_class(se = illuminaData)
library(SummarizedExperiment) library(biotmleData) data(illuminaData) example_biotmle_class <- function(se) { call <- match.call(expand.dots = TRUE) biotmle <- .biotmle( SummarizedExperiment( assays = assay(se), rowData = rowData(se), colData = colData(se) ), call = call, ateOut = as.numeric(rep(NA, 10)), tmleOut = as.data.frame(matrix(NA, 10, 10)), topTable = as.data.frame(matrix(NA, 10, 10)) ) return(biotmle) } example_class <- example_biotmle_class(se = illuminaData)
Virtual class union containing members of data.frame
and
limma::Elist
, used internally to handle situations when a returned
object has a type that cannot be guessed from the function call.
fusion of classes data.frame
and EList
, used within
.biotmle
by class bioTMLE
to handle uncertainty in the object
passed to slot "tmleOut".
Accessor for Table of Raw Efficient Influence Function Values
eif(object)
eif(object)
object |
S4 object of class |
contents of tmleOut
slot of object of class biotmle
.
This function performs influence curve-based estimation of the effect of an exposure on biological expression values associated with a given biomarker, controlling for a user-specified set of baseline covariates.
exp_biomarkertmle(Y, A, W, g_lib, Q_lib, cv_folds, ...)
exp_biomarkertmle(Y, A, W, g_lib, Q_lib, cv_folds, ...)
Y |
A |
A |
A |
W |
A |
g_lib |
A |
Q_lib |
A |
cv_folds |
A |
... |
Additional arguments passed to |
TMLE-based estimate of the relationship between biomarker expression
and changes in an exposure variable, computed iteratively and saved in the
tmleOut
slot in a biotmle
object.
Heatmap of contributions of a select subset of biomarkers to the variable importance measure changes as assessed by influence curve-based estimation, across all subjects. The heatmap produced performs supervised clustering, as per Pollard & van der Laan (2008) <doi:10.2202/1544-6115.1404>.
heatmap_ic(x, ..., design, FDRcutoff = 0.25, type = c("top", "all"), top = 25)
heatmap_ic(x, ..., design, FDRcutoff = 0.25, type = c("top", "all"), top = 25)
x |
Object of class |
... |
additional arguments passed to |
design |
A vector giving the contrast to be displayed in the heatmap. |
FDRcutoff |
Cutoff to be used in controlling the False Discovery Rate. |
type |
A |
top |
Number of identified biomarkers to plot in the heatmap. |
heatmap (from superheat) using hierarchical clustering to plot the changes in the variable importance measure for all subjects across a specified top number of biomarkers.
## Not run: library(dplyr) library(biotmleData) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) heatmap_ic(x = limmaTMLEout, design = design, FDRcutoff = 0.05, top = 10) ## End(Not run)
## Not run: library(dplyr) library(biotmleData) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) heatmap_ic(x = limmaTMLEout, design = design, FDRcutoff = 0.05, top = 10) ## End(Not run)
Performs variance shrinkage via application of an empirical Bayes procedure (of LIMMA) on the observed data after a transformation moving the data to influence function space, based on the average treatment effect parameter.
modtest_ic(biotmle, adjust = "BH", pval_type = c("normal", "logistic"), ...)
modtest_ic(biotmle, adjust = "BH", pval_type = c("normal", "logistic"), ...)
biotmle |
|
adjust |
the multiple testing correction to be applied to p-values that are generated from the moderated tests. The recommended (default) method is that of Benjamini and Hochberg. See topTable for a list of appropriate methods. |
pval_type |
The reference distribution to be used for computing the p-value. Those based on the normal approximation tend to provide misleading inference when working with moderately sized (finite) samples. Use of the logistic distribution has been found to empirically improve performance in settings where multiple hypothesis testing is a concern. |
... |
Other arguments passed to |
biotmle
object containing the results of applying both
lmFit
and topTable
.
library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% dplyr::mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData[1:2, ], varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)
library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% dplyr::mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData[1:2, ], varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)
Histogram of raw or FDR-adjusted p-values from the moderated t-test.
## S3 method for class 'bioTMLE' plot(x, ..., type = "pvals_adj")
## S3 method for class 'bioTMLE' plot(x, ..., type = "pvals_adj")
x |
object of class |
... |
additional arguments passed |
type |
character describing whether to provide a plot of unadjusted or adjusted p-values (adjustment performed via Benjamini-Hochberg) |
object of class ggplot
containing a histogram of the raw or
Benjamini-Hochberg corrected p-values (depending on user input).
## Not run: library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) plot(x = limmaTMLEout, type = "pvals_adj") ## End(Not run)
## Not run: library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) plot(x = limmaTMLEout, type = "pvals_adj") ## End(Not run)
This function prepares next-generation sequencing data (counts) for use with
the biomarker TMLE procedure by invoking the voom transform of limma
.
rnaseq_ic(biotmle, weights = TRUE, ...)
rnaseq_ic(biotmle, weights = TRUE, ...)
biotmle |
A |
weights |
A |
... |
Other arguments to be passed to |
EList
object containing voom-transformed expression measures
of count data (actually, the mean-variance trend) in the "E" slot, to be
passed into the biomarker TMLE procedure.
Accessor for Results of Moderated Influence Function Hypothesis Testing
toptable(object)
toptable(object)
object |
S4 object of class |
contents of topTable
slot of object of class biotmle
.
Volcano plot of the log-changes in the target causal paramter against the log raw p-values from the moderated t-test.
volcano_ic(biotmle, ate_bound = 1, pval_bound = 0.2)
volcano_ic(biotmle, ate_bound = 1, pval_bound = 0.2)
biotmle |
object of class |
ate_bound |
A |
pval_bound |
A |
object of class ggplot
containing a standard volcano plot of
the log-fold change in the causal target parameter against the raw log
p-value computed from the moderated tests in modtest_ic
.
## Not run: library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) volcano_ic(biotmle = limmaTMLEout) ## End(Not run)
## Not run: library(dplyr) library(biotmleData) library(SuperLearner) library(SummarizedExperiment) data(illuminaData) colData(illuminaData) <- colData(illuminaData) %>% data.frame() %>% mutate(age = as.numeric(age > median(age))) %>% DataFrame() benz_idx <- which(names(colData(illuminaData)) %in% "benzene") biomarkerTMLEout <- biomarkertmle( se = illuminaData, varInt = benz_idx, bppar_type = BiocParallel::SerialParam(), g_lib = c("SL.mean", "SL.glm"), Q_lib = c("SL.mean", "SL.glm") ) limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout) volcano_ic(biotmle = limmaTMLEout) ## End(Not run)