Package 'biotmle' reference manual

Title:	Targeted Learning with Moderated Statistics for Biomarker Discovery
Description:	Tools for differential expression biomarker discovery based on microarray and next-generation sequencing data that leverage efficient semiparametric estimators of the average treatment effect for variable importance analysis. Estimation and inference of the (marginal) average treatment effects of potential biomarkers are computed by targeted minimum loss-based estimation, with joint, stable inference constructed across all biomarkers using a generalization of moderated statistics for use with the estimated efficient influence function. The procedure accommodates the use of ensemble machine learning for the estimation of nuisance functions.
Authors:	Nima Hejazi [aut, cre, cph] , Alan Hubbard [aut, ths] , Mark van der Laan [aut, ths] , Weixin Cai [ctb] , Philippe Boileau [ctb]
Maintainer:	Nima Hejazi <[email protected]>
License:	MIT + file LICENSE
Version:	1.17.1
Built:	2025-01-17 05:10:12 UTC
Source:	https://github.com/nhejazi/biotmle

Biomarker Evaluation with Targeted Minimum Loss Estimation of the ATE

Description

Computes the causal target parameter defined as the difference between the biomarker expression values under treatment and those same values under no treatment, using Targeted Minimum Loss Estimation.

Usage

biomarkertmle(
  se,
  varInt,
  normalized = TRUE,
  ngscounts = FALSE,
  bppar_type = BiocParallel::MulticoreParam(),
  bppar_debug = FALSE,
  cv_folds = 1,
  g_lib = c("SL.mean", "SL.glm", "SL.bayesglm"),
  Q_lib = c("SL.mean", "SL.bayesglm", "SL.earth", "SL.ranger"),
  ...
)
biomarkertmle(
  se,
  varInt,
  normalized = TRUE,
  ngscounts = FALSE,
  bppar_type = BiocParallel::MulticoreParam(),
  bppar_debug = FALSE,
  cv_folds = 1,
  g_lib = c("SL.mean", "SL.glm", "SL.bayesglm"),
  Q_lib = c("SL.mean", "SL.bayesglm", "SL.earth", "SL.ranger"),
  ...
)

Arguments

`se`	A `SummarizedExperiment` containing microarray expression or next-generation sequencing data in the `assays` slot and a matrix of phenotype-level data in the `colData` slot.
`varInt`	A `numeric` indicating the column of the design matrix corresponding to the treatment or outcome of interest (in the `colData` slot of the `SummarizedExperiment` argument "se").
`normalized`	A `logical` indicating whether the data included in the `assay` slot of the input `SummarizedExperiment` object has been normalized externally. The default is set to `TRUE` with the expectation that an appropriate normalization method has been applied. If set to `FALSE`, median normalization is performed for microarray data.
`ngscounts`	A `logical` indicating whether the data are counts generated from a next-generation sequencing experiment (e.g., RNA-seq). The default setting assumes continuous expression measures as generated by microarray platforms.
`bppar_type`	A parallelization option specified by `BiocParallel`. Consult the manual page for `BiocParallelParam` for possible types and their descriptions. The default for this argument is `MulticoreParam`, for multicore evaluation.
`bppar_debug`	A `logical` indicating whether or not to rely upon pkgBiocParallel. Setting this argument to `TRUE`, replaces the call to `bplapply` by a call to `lapply`, which significantly reduces the overhead of debugging. Note that invoking this option overrides all other parallelization arguments.
`cv_folds`	A `numeric` scalar indicating how many folds to use in performing targeted minimum loss estimation. Cross-validated estimates have been demonstrated to allow relaxation of certain theoretical conditions and and accommodate the construction of more conservative variance estimates.
`g_lib`	A `character` vector specifying the library of machine learning algorithms for use in fitting the propensity score P(A = a \| W).
`Q_lib`	A `character` vector specifying the library of machine learning algorithms for use in fitting the outcome regression E[Y \| A,W].
`...`	Additional arguments to be passed to `drtmle` in computing the targeted minimum loss estimator of the average treatment effect.

Value

S4 object of class biotmle, inheriting from SummarizedExperiment, with additional slots tmleOut and call, among others, containing TML estimates of the ATE of exposure on biomarker expression.

Examples

library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData[1:2, ],
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData[1:2, ],
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

Constructor for class bioTMLE

Description

Constructor for class bioTMLE

Value

class biotmle object, sub-classed from SummarizedExperiment.

Examples

library(SummarizedExperiment)
library(biotmleData)
data(illuminaData)

example_biotmle_class <- function(se) {
  call <- match.call(expand.dots = TRUE)
  biotmle <- .biotmle(
    SummarizedExperiment(
      assays = assay(se),
      rowData = rowData(se),
      colData = colData(se)
    ),
    call = call,
    ateOut = as.numeric(rep(NA, 10)),
    tmleOut = as.data.frame(matrix(NA, 10, 10)),
    topTable = as.data.frame(matrix(NA, 10, 10))
  )
  return(biotmle)
}

example_class <- example_biotmle_class(se = illuminaData)
library(SummarizedExperiment)
library(biotmleData)
data(illuminaData)

example_biotmle_class <- function(se) {
  call <- match.call(expand.dots = TRUE)
  biotmle <- .biotmle(
    SummarizedExperiment(
      assays = assay(se),
      rowData = rowData(se),
      colData = colData(se)
    ),
    call = call,
    ateOut = as.numeric(rep(NA, 10)),
    tmleOut = as.data.frame(matrix(NA, 10, 10)),
    topTable = as.data.frame(matrix(NA, 10, 10))
  )
  return(biotmle)
}

example_class <- example_biotmle_class(se = illuminaData)

S4 class union data.frame_OR_EList

Description

Virtual class union containing members of data.frame and limma::Elist, used internally to handle situations when a returned object has a type that cannot be guessed from the function call.

Value

fusion of classes data.frame and EList, used within .biotmle by class bioTMLE to handle uncertainty in the object passed to slot "tmleOut".

Accessor for Table of Raw Efficient Influence Function Values

Description

Accessor for Table of Raw Efficient Influence Function Values

Usage

eif(object)
eif(object)

Arguments

object

S4 object of class bioTMLE.

Value

contents of tmleOut slot of object of class biotmle.

TMLE procedure using ATE for Biomarker Identication from Exposure

Description

This function performs influence curve-based estimation of the effect of an exposure on biological expression values associated with a given biomarker, controlling for a user-specified set of baseline covariates.

Usage

exp_biomarkertmle(Y, A, W, g_lib, Q_lib, cv_folds, ...)
exp_biomarkertmle(Y, A, W, g_lib, Q_lib, cv_folds, ...)

Arguments

`Y`	A `numeric` vector of expression values for a given biomarker.
`A`	A `numeric` vector of discretized exposure vector (e.g., from a design matrix whose effect on expression values is of interest.
`W`	A `Matrix` of `numeric` values corresponding to baseline covariates to be marginalized over in the estimation process.
`g_lib`	A `character` vector identifying the library of learning algorithms to be used in fitting the propensity score P[A = a \| W].
`Q_lib`	A `character` vector identifying the library of learning algorithms to be used in fitting the outcome regression E[Y \| A, W].
`cv_folds`	A `numeric` scalar indicating how many folds to use in performing targeted minimum loss estimation. Cross-validated estimates are more robust, allowing relaxing of theoretical conditions and construction of conservative variance estimates.
`...`	Additional arguments passed to `drtmle` in computing the targeted minimum loss estimator of the average treatment effect.

Value

TMLE-based estimate of the relationship between biomarker expression and changes in an exposure variable, computed iteratively and saved in the tmleOut slot in a biotmle object.

Heatmap for class biotmle

Description

Heatmap of contributions of a select subset of biomarkers to the variable importance measure changes as assessed by influence curve-based estimation, across all subjects. The heatmap produced performs supervised clustering, as per Pollard & van der Laan (2008) <doi:10.2202/1544-6115.1404>.

Usage

heatmap_ic(x, ..., design, FDRcutoff = 0.25, type = c("top", "all"), top = 25)
heatmap_ic(x, ..., design, FDRcutoff = 0.25, type = c("top", "all"), top = 25)

Arguments

`x`	Object of class `biotmle` as produced by an appropriate call to `biomarkertmle`.
`...`	additional arguments passed to `superheat::superheat` as necessary
`design`	A vector giving the contrast to be displayed in the heatmap.
`FDRcutoff`	Cutoff to be used in controlling the False Discovery Rate.
`type`	A `character` describing whether to plot only a top number (as defined by FDR-corrected p-value) of biomarkers or all biomarkers.
`top`	Number of identified biomarkers to plot in the heatmap.

Value

heatmap (from superheat) using hierarchical clustering to plot the changes in the variable importance measure for all subjects across a specified top number of biomarkers.

Examples

## Not run: 
library(dplyr)
library(biotmleData)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

heatmap_ic(x = limmaTMLEout, design = design, FDRcutoff = 0.05, top = 10)

## End(Not run)
## Not run: 
library(dplyr)
library(biotmleData)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

heatmap_ic(x = limmaTMLEout, design = design, FDRcutoff = 0.05, top = 10)

## End(Not run)

Moderated Statistical Tests for Influence Functions

Description

Performs variance shrinkage via application of an empirical Bayes procedure (of LIMMA) on the observed data after a transformation moving the data to influence function space, based on the average treatment effect parameter.

Usage

modtest_ic(biotmle, adjust = "BH", pval_type = c("normal", "logistic"), ...)
modtest_ic(biotmle, adjust = "BH", pval_type = c("normal", "logistic"), ...)

Arguments

`biotmle`	`biotmle` object as generated by `biomarkertmle`
`adjust`	the multiple testing correction to be applied to p-values that are generated from the moderated tests. The recommended (default) method is that of Benjamini and Hochberg. See topTable for a list of appropriate methods.
`pval_type`	The reference distribution to be used for computing the p-value. Those based on the normal approximation tend to provide misleading inference when working with moderately sized (finite) samples. Use of the logistic distribution has been found to empirically improve performance in settings where multiple hypothesis testing is a concern.
`...`	Other arguments passed to `topTable`.

Value

biotmle object containing the results of applying both lmFit and topTable.

Examples

library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  dplyr::mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData[1:2, ],
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  dplyr::mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData[1:2, ],
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

Plot p-values from moderated statistical tests for class biotmle

Description

Histogram of raw or FDR-adjusted p-values from the moderated t-test.

Usage

## S3 method for class 'bioTMLE'
plot(x, ..., type = "pvals_adj")
## S3 method for class 'bioTMLE'
plot(x, ..., type = "pvals_adj")

Arguments

`x`	object of class `biotmle` as produced by an appropriate call to `biomarkertmle`
`...`	additional arguments passed `plot` as necessary
`type`	character describing whether to provide a plot of unadjusted or adjusted p-values (adjustment performed via Benjamini-Hochberg)

Value

object of class ggplot containing a histogram of the raw or Benjamini-Hochberg corrected p-values (depending on user input).

Examples

## Not run: 
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

plot(x = limmaTMLEout, type = "pvals_adj")

## End(Not run)
## Not run: 
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

plot(x = limmaTMLEout, type = "pvals_adj")

## End(Not run)

Utility for using voom transformation with TMLE for biomarker discovery

Description

This function prepares next-generation sequencing data (counts) for use with the biomarker TMLE procedure by invoking the voom transform of limma.

Usage

rnaseq_ic(biotmle, weights = TRUE, ...)
rnaseq_ic(biotmle, weights = TRUE, ...)

Arguments

`biotmle`	A `bioTMLE` object containing next-generation sequencing count data in its "assays" slot.
`weights`	A `logical` indicating whether to return quality weights of samples in the output object.
`...`	Other arguments to be passed to `voom` or `voomWithQualityWeights` as appropriate.

Value

EList object containing voom-transformed expression measures of count data (actually, the mean-variance trend) in the "E" slot, to be passed into the biomarker TMLE procedure.

Accessor for Results of Moderated Influence Function Hypothesis Testing

Description

Accessor for Results of Moderated Influence Function Hypothesis Testing

Usage

toptable(object)
toptable(object)

Arguments

object

S4 object of class bioTMLE.

Value

contents of topTable slot of object of class biotmle.

Volcano plot for class biotmle

Description

Volcano plot of the log-changes in the target causal paramter against the log raw p-values from the moderated t-test.

Usage

volcano_ic(biotmle, ate_bound = 1, pval_bound = 0.2)
volcano_ic(biotmle, ate_bound = 1, pval_bound = 0.2)

Arguments

`biotmle`	object of class `biotmle` as produced by an appropriate call to `biomarkertmle`
`ate_bound`	A `numeric` indicating the highest magnitude of the average treatment effect to be colored on the x-axis of the volcano plot; this limits the observations to be considered differentially expressed to those in a user-specified interval.
`pval_bound`	A `numeric` indicating the largest corrected p-value to be colored on the y-axis of the volcano plot; this limits observations considered as differentially expressed to those in a user-specified interval.

Value

object of class ggplot containing a standard volcano plot of the log-fold change in the causal target parameter against the raw log p-value computed from the moderated tests in modtest_ic.

Examples

## Not run: 
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

volcano_ic(biotmle = limmaTMLEout)

## End(Not run)
## Not run: 
library(dplyr)
library(biotmleData)
library(SuperLearner)
library(SummarizedExperiment)
data(illuminaData)

colData(illuminaData) <- colData(illuminaData) %>%
  data.frame() %>%
  mutate(age = as.numeric(age > median(age))) %>%
  DataFrame()
benz_idx <- which(names(colData(illuminaData)) %in% "benzene")

biomarkerTMLEout <- biomarkertmle(
  se = illuminaData,
  varInt = benz_idx,
  bppar_type = BiocParallel::SerialParam(),
  g_lib = c("SL.mean", "SL.glm"),
  Q_lib = c("SL.mean", "SL.glm")
)

limmaTMLEout <- modtest_ic(biotmle = biomarkerTMLEout)

volcano_ic(biotmle = limmaTMLEout)

## End(Not run)

Package 'biotmle'

Help Index

Biomarker Evaluation with Targeted Minimum Loss Estimation of the ATE

Description

Usage

Arguments

Value

Examples

Constructor for class bioTMLE

Description

Value

Examples

S4 class union data.frame_OR_EList

Description

Value

Accessor for Table of Raw Efficient Influence Function Values

Description

Usage

Arguments

Value

TMLE procedure using ATE for Biomarker Identication from Exposure

Description

Usage

Arguments

Value

Heatmap for class biotmle

Description

Usage

Arguments

Value

Examples

Moderated Statistical Tests for Influence Functions

Description

Usage

Arguments

Value

Examples

Plot p-values from moderated statistical tests for class biotmle

Description

Usage

Arguments

Value

Examples

Utility for using voom transformation with TMLE for biomarker discovery

Description

Usage

Arguments

Value

Accessor for Results of Moderated Influence Function Hypothesis Testing

Description

Usage

Arguments

Value

Volcano plot for class biotmle

Description

Usage

Arguments

Value

Examples