Title: | Bayesian Kernel Machine Regression |
---|---|
Description: | Implementation of a statistical approach for estimating the joint health effects of multiple concurrent exposures, as described in Bobb et al (2015) <doi:10.1093/biostatistics/kxu058>. |
Authors: | Jennifer F. Bobb [aut, cre],
Luke Duttweiler [ctb] |
Maintainer: | Jennifer F. Bobb <[email protected]> |
License: | GPL-2 |
Version: | 0.2.2.9000 |
Built: | 2025-02-13 04:12:36 UTC |
Source: | https://github.com/jenfb/bkmr |
Compute the posterior mean and variance of h at new predictor values
ComputePostmeanHnew( fit, y = NULL, Z = NULL, X = NULL, Znew = NULL, sel = NULL, method = "approx" )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
Znew |
matrix of new predictor values at which to predict new |
sel |
selects which iterations of the MCMC sampler to use for inference; see details |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
If method == "approx", the argument sel defaults to the second half of the MCMC iterations.
If method == "exact", the argument sel defaults to keeping every 10th iteration after dropping the first 50% of samples; if this results in fewer than 100 iterations, then 100 iterations are kept.
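The default rule for sel described above can be sketched in base R. The helper `default_sel()` is hypothetical (it is not a bkmr function), and the exact thinning arithmetic is an assumption based on the wording of the description; the package's internal rule may differ in detail.

```r
# Hypothetical helper (not part of bkmr): indices of the MCMC iterations kept
# by default, following the rule described in the text.
default_sel <- function(iter, method = c("approx", "exact")) {
  method <- match.arg(method)
  first_kept <- floor(iter / 2) + 1          # drop the first 50% of samples
  if (method == "approx") {
    return(seq(first_kept, iter))            # second half of the chain
  }
  sel <- seq(first_kept, iter, by = 10)      # every 10th iteration
  if (length(sel) < 100) {
    # Too few draws: spread up to 100 indices over the second half instead
    sel <- seq(first_kept, iter, length.out = 100)
  }
  unique(floor(sel))
}

length(default_sel(1000, "approx"))  # 500
length(default_sel(4000, "exact"))   # 200
```

With short chains, such as the 100 iterations used in the examples, fewer than 100 distinct indices exist in the second half, so the sketch simply returns all of them.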
For guided examples and additional information, go to https://jenfb.github.io/bkmr/overview.html
a list of length two containing the posterior mean vector and posterior variance matrix
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

med_vals <- apply(Z, 2, median)
Znew <- matrix(med_vals, nrow = 1)
h_true <- dat$HFun(Znew)
h_est1 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "approx")
h_est2 <- ComputePostmeanHnew(fitkm, Znew = Znew, method = "exact")
Obtain summary statistics of each parameter from the BKMR fit
ExtractEsts(fit, q = c(0.025, 0.25, 0.5, 0.75, 0.975), sel = NULL)
fit |
An object containing the results returned by the |
q |
vector of quantiles |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
a list where each component is a data frame containing the summary statistics of the posterior distribution of one of the parameters (or vector of parameters) being estimated
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

ests <- ExtractEsts(fitkm)
names(ests)
ests$beta
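As a rough illustration of what the summary statistics represent, the following base R sketch summarizes a vector of posterior draws for a single parameter. The draws are simulated stand-ins, not bkmr output, and the choice of mean and standard deviation alongside the quantiles is an assumption about what a typical summary includes.

```r
# Hypothetical stand-in for posterior draws of one parameter (not bkmr output)
set.seed(1)
draws <- rnorm(1000, mean = 2, sd = 0.3)

# Keep the second half of the chain, mirroring the documented default for sel
sel <- seq(floor(length(draws) / 2) + 1, length(draws))
kept <- draws[sel]

# Mean, standard deviation, and the default quantiles from the usage above
q <- c(0.025, 0.25, 0.5, 0.75, 0.975)
est <- c(mean = mean(kept), sd = sd(kept), quantile(kept, probs = q))
round(est, 3)
```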
Extract posterior inclusion probabilities (PIPs) from Bayesian Kernel Machine Regression (BKMR) model fit
ExtractPIPs(fit, sel = NULL, z.names = NULL)
fit |
An object containing the results returned by the |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
z.names |
optional argument providing the names of the variables included in the |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a data frame with the variable-specific PIPs for BKMR fit with component-wise variable selection, and with the group-specific and conditional (within-group) PIPs for BKMR fit with hierarchical variable selection.
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

ExtractPIPs(fitkm)
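Conceptually, a posterior inclusion probability is the fraction of retained MCMC iterations in which a predictor was included in the model. The sketch below shows that calculation in base R on simulated 0/1 indicators; the matrix `delta` and its layout are assumptions for illustration and are not taken from a bkmr fit.

```r
# Hypothetical 0/1 inclusion indicators: rows are MCMC iterations, columns are
# predictors (simulated here; not taken from a bkmr fit)
set.seed(2)
n_iter <- 200
delta <- cbind(
  z1 = rbinom(n_iter, 1, 0.9),
  z2 = rbinom(n_iter, 1, 0.5),
  z3 = rbinom(n_iter, 1, 0.1)
)

# Keep the second half of the iterations, as in the documented default
sel <- seq(floor(n_iter / 2) + 1, n_iter)

# The PIP of each predictor is the share of kept iterations that include it
pips <- data.frame(variable = colnames(delta), PIP = colMeans(delta[sel, ]))
pips
```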
Extract samples of each parameter from the BKMR fit
ExtractSamps(fit, sel = NULL)
fit |
An object containing the results returned by the |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
a list where each component contains the posterior samples of one of the parameters (or vector of parameters) being estimated
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

samps <- ExtractSamps(fitkm)
Investigate the impact of the r[m] parameters on the smoothness of the exposure-response function h(z[m]).
InvestigatePrior( y, Z, X, ngrid = 50, q.seq = c(2, 1, 1/2, 1/4, 1/8, 1/16), r.seq = NULL, Drange = NULL, verbose = FALSE )
y |
a vector of outcome data of length |
Z |
an |
X |
an |
ngrid |
Number of grid points over which to plot the exposure-response function |
q.seq |
Sequence of values corresponding to different degrees of smoothness in the estimated exposure-response function. A value of q corresponds to fractions of the range of the data over which there is a decay in the correlation |
r.seq |
sequence of values at which to fix |
Drange |
the range of the |
verbose |
TRUE or FALSE: flag indicating whether to print to the screen which exposure variable and q value has been completed |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a list containing the predicted values, residuals, and estimated predictor-response function for each degree of smoothness being considered
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

priorfits <- InvestigatePrior(y = y, Z = Z, X = X, q.seq = c(2, 1/2, 1/4, 1/16))
PlotPriorFits(y = y, Z = Z, X = X, fits = priorfits)
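To see why r[m] governs smoothness, it may help to look at the Gaussian kernel used in Bobb et al. (2015), under which the correlation between h at two values of a single predictor a distance d apart is exp(-r * d^2). The base R sketch below tabulates that decay. The values in `r_vals` are arbitrary illustrations and are not the values InvestigatePrior derives from q.seq.

```r
# Correlation implied by a Gaussian kernel, exp(-r * d^2), for one predictor
kernel_corr <- function(d, r) exp(-r * d^2)

d <- seq(0, 3, by = 0.5)          # distances between two predictor values
r_vals <- c(0.1, 1, 10)           # illustrative values only (an assumption)

# Rows: distances; columns: values of r. Larger r means faster decay,
# which corresponds to a less smooth (more wiggly) h
corr <- sapply(r_vals, function(r) kernel_corr(d, r))
dimnames(corr) <- list(paste0("d=", d), paste0("r=", r_vals))
round(corr, 3)
```

Reading across a row shows how quickly the correlation vanishes as r grows, which is the behavior the q.seq argument is meant to explore.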
Fits the Bayesian kernel machine regression (BKMR) model using Markov chain Monte Carlo (MCMC) methods.
kmbayes( y, Z, X = NULL, iter = 1000, family = "gaussian", id = NULL, verbose = TRUE, Znew = NULL, starting.values = NULL, control.params = NULL, varsel = FALSE, groups = NULL, knots = NULL, ztest = NULL, rmethod = "varying", est.h = FALSE )
y |
a vector of outcome data of length |
Z |
an |
X |
an |
iter |
number of iterations to run the sampler |
family |
a description of the error distribution and link function to be used in the model. Currently implemented for |
id |
optional vector (of length |
verbose |
TRUE or FALSE: flag indicating whether to print intermediate diagnostic information during the model fitting. |
Znew |
optional matrix of new predictor values at which to predict |
starting.values |
list of starting values for each parameter. If not specified default values will be chosen. |
control.params |
list of parameters specifying the prior distributions and tuning parameters for the MCMC algorithm. If not specified default values will be chosen. |
varsel |
TRUE or FALSE: indicator for whether to conduct variable selection on the Z variables in |
groups |
optional vector (of length |
knots |
optional matrix of knot locations for implementing the Gaussian predictive process of Banerjee et al. (2008). Currently only implemented for models without a random intercept. |
ztest |
optional vector indicating on which variables in Z to conduct variable selection (the remaining variables will be forced into the model). |
rmethod |
for those predictors being forced into the |
est.h |
TRUE or FALSE: indicator for whether to sample from the posterior distribution of the subject-specific effects h_i within the main sampler. This will slow down the model fitting. |
an object of class "bkmrfit" (containing the posterior samples from the model fit), which has the associated methods:
print (i.e., print.bkmrfit)
summary (i.e., summary.bkmrfit)
Bobb JF, Valeri L, Claus Henn B, Christiani DC, Wright RO, Mazumdar M, Godleski JJ, Coull BA (2015). Bayesian kernel machine regression for estimating the health effects of multi-pollutant mixtures. Biostatistics, 16(3), 493-508.
Banerjee S, Gelfand AE, Finley AO, Sang H (2008). Gaussian predictive process models for large spatial data sets. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 70(4), 825-848.
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
Compare the estimated h function when all predictors are at a particular quantile to when all are at a second fixed quantile
OverallRiskSummaries( fit, y = NULL, Z = NULL, X = NULL, qs = seq(0.25, 0.75, by = 0.05), q.fixed = 0.5, method = "approx", sel = NULL )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
qs |
vector of quantiles at which to calculate the overall risk summary |
q.fixed |
a second quantile at which to compare the estimated |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
sel |
selects which iterations of the MCMC sampler to use for inference; see details |
If method == "approx", the argument sel defaults to the second half of the MCMC iterations.
If method == "exact", the argument sel defaults to keeping every 10th iteration after dropping the first 50% of samples; if this results in fewer than 100 iterations, then 100 iterations are kept.
For guided examples and additional information, go to https://jenfb.github.io/bkmr/overview.html
a data frame containing the (posterior mean) estimate and posterior standard deviation of the overall risk measures
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

risks.overall <- OverallRiskSummaries(fit = fitkm, qs = seq(0.25, 0.75, by = 0.05),
                                      q.fixed = 0.5, method = "exact")
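The comparison behind the overall risk summary can be pictured as evaluating h at points where every predictor sits at the same quantile, then contrasting with the point at q.fixed. The base R sketch below builds those points on simulated data. The matrix `Z`, the toy function `h_toy`, and expressing the comparison as a simple difference are assumptions for illustration; the package works with the posterior of h rather than a known function.

```r
# Hypothetical exposure matrix (simulated; any numeric matrix would do)
set.seed(3)
Z <- matrix(rnorm(50 * 4), nrow = 50, dimnames = list(NULL, paste0("z", 1:4)))

qs <- seq(0.25, 0.75, by = 0.05)
q.fixed <- 0.5

# One row per quantile in qs: every predictor set to that quantile
Z_q <- t(sapply(qs, function(q) apply(Z, 2, quantile, probs = q)))
rownames(Z_q) <- qs

# Reference point: every predictor set to the q.fixed quantile
z_ref <- apply(Z, 2, quantile, probs = q.fixed)

# Contrast h at each quantile point with h at the reference point, using a
# toy h (an assumption, for illustration only)
h_toy <- function(z) sum(z[1:2]^2)
contrasts <- apply(Z_q, 1, h_toy) - h_toy(z_ref)
round(contrasts, 3)
```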
Plot the h(z[m]) estimated from frequentist KMR for r[m] fixed to specific values
PlotPriorFits( y, X, Z, fits, which.z = NULL, which.q = NULL, plot.resid = TRUE, ylim = NULL, ... )
y |
a vector of outcome data of length |
X |
an |
Z |
an |
fits |
output from |
which.z |
which predictors (columns in |
which.q |
which q.values to plot; defaults to all possible |
plot.resid |
whether to plot the data points |
ylim |
plotting limits for the y-axis |
... |
other plotting arguments |
No return value, generates plot
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

priorfits <- InvestigatePrior(y = y, Z = Z, X = X, q.seq = c(2, 1/2, 1/4, 1/16))
PlotPriorFits(y = y, Z = Z, X = X, fits = priorfits)
Predict the exposure-response function at a new grid of points
PredictorResponseBivar( fit, y = NULL, Z = NULL, X = NULL, z.pairs = NULL, method = "approx", ngrid = 50, q.fixed = 0.5, sel = NULL, min.plot.dist = 0.5, center = TRUE, z.names = colnames(Z), verbose = TRUE, ... )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
z.pairs |
data frame showing which pairs of predictors to plot |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
ngrid |
number of grid points in each dimension |
q.fixed |
vector of quantiles at which to fix the remaining predictors in |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
min.plot.dist |
specifies a minimum distance that a new grid point needs to be from an observed data point in order to compute the prediction; points further than this will not be computed |
center |
flag for whether to scale the exposure-response function to have mean zero |
z.names |
optional vector of names for the columns of |
verbose |
TRUE or FALSE: flag of whether to print intermediate output to the screen |
... |
other arguments to pass on to the prediction function |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a long data frame with the name of the first predictor, the name of the second predictor, the value of the first predictor, the value of the second predictor, the posterior mean estimate, and the posterior standard deviation of the estimated exposure response function
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

## Obtain predicted value on new grid of points for each pair of predictors
## Using only a 10-by-10 point grid to make example run quickly
pred.resp.bivar <- PredictorResponseBivar(fit = fitkm, min.plot.dist = 1, ngrid = 10)
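The roles of ngrid, q.fixed, and min.plot.dist can be illustrated with a base R sketch that builds the prediction grid for one pair of predictors. The simulated `Z` and the use of Euclidean distance in the two plotted dimensions are assumptions for illustration and may not match the package internals.

```r
# Hypothetical exposure matrix (simulated for illustration)
set.seed(4)
Z <- matrix(rnorm(50 * 4), nrow = 50, dimnames = list(NULL, paste0("z", 1:4)))
ngrid <- 10
q.fixed <- 0.5
min.plot.dist <- 1

# Grid over the observed range of the first two predictors
z1 <- seq(min(Z[, 1]), max(Z[, 1]), length.out = ngrid)
z2 <- seq(min(Z[, 2]), max(Z[, 2]), length.out = ngrid)
grid <- as.matrix(expand.grid(z1 = z1, z2 = z2))

# Remaining predictors are held at their q.fixed quantile
others <- apply(Z[, 3:4], 2, quantile, probs = q.fixed)
Znew <- cbind(grid, matrix(others, nrow = nrow(grid), ncol = 2, byrow = TRUE,
                           dimnames = list(NULL, names(others))))

# Distance from each grid point to its nearest observed (z1, z2) pair
nearest <- apply(grid, 1, function(g) {
  min(sqrt((Z[, 1] - g[1])^2 + (Z[, 2] - g[2])^2))
})

# Grid points farther than min.plot.dist from all data would be skipped
keep <- nearest <= min.plot.dist
c(total = nrow(Znew), kept = sum(keep))
```

Skipping distant grid points avoids reporting predictions in regions of the exposure space where no data were observed.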
Function to plot the h function of a particular variable at different levels (quantiles) of a second variable
PredictorResponseBivarLevels( pred.resp.df, Z = NULL, qs = c(0.25, 0.5, 0.75), both_pairs = TRUE, z.names = NULL )
pred.resp.df |
object obtained from running the function |
Z |
an |
qs |
vector of quantiles at which to fix the second variable |
both_pairs |
flag indicating whether, if |
z.names |
optional vector of names for the columns of |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a long data frame with the name of the first predictor, the name of the second predictor, the value of the first predictor, the quantile at which the second predictor is fixed, the posterior mean estimate, and the posterior standard deviation of the estimated exposure response function
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

## Obtain predicted value on new grid of points for each pair of predictors
## Using only a 10-by-10 point grid to make example run quickly
pred.resp.bivar <- PredictorResponseBivar(fit = fitkm, min.plot.dist = 1, ngrid = 10)
pred.resp.bivar.levels <- PredictorResponseBivarLevels(pred.resp.df = pred.resp.bivar,
                                                       Z = Z, qs = c(0.1, 0.5, 0.9))
Plot bivariate predictor-response function on a new grid of points
PredictorResponseBivarPair( fit, y = NULL, Z = NULL, X = NULL, whichz1 = 1, whichz2 = 2, whichz3 = NULL, method = "approx", prob = 0.5, q.fixed = 0.5, sel = NULL, ngrid = 50, min.plot.dist = 0.5, center = TRUE, ... )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
whichz1 |
vector identifying the first predictor that (column of |
whichz2 |
vector identifying the second predictor that (column of |
whichz3 |
vector identifying the third predictor that will be set to a pre-specified fixed quantile (determined by |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
prob |
pre-specified quantile to set the third predictor (determined by |
q.fixed |
vector of quantiles at which to fix the remaining predictors in |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
ngrid |
number of grid points to cover the range of each predictor (column in |
min.plot.dist |
specifies a minimum distance that a new grid point needs to be from an observed data point in order to compute the prediction; points further than this will not be computed |
center |
flag for whether to scale the exposure-response function to have mean zero |
... |
other arguments to pass on to the prediction function |
a data frame with value of the first predictor, the value of the second predictor, the posterior mean estimate, and the posterior standard deviation
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

## Obtain predicted value on new grid of points
## Using only a 10-by-10 point grid to make example run quickly
pred.resp.bivar12 <- PredictorResponseBivarPair(fit = fitkm, min.plot.dist = 1, ngrid = 10)
Plot univariate predictor-response function on a new grid of points
PredictorResponseUnivar( fit, y = NULL, Z = NULL, X = NULL, which.z = 1:ncol(Z), method = "approx", ngrid = 50, q.fixed = 0.5, sel = NULL, min.plot.dist = Inf, center = TRUE, z.names = colnames(Z), ... )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
which.z |
vector identifying which predictors (columns of |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
ngrid |
number of grid points to cover the range of each predictor (column in |
q.fixed |
vector of quantiles at which to fix the remaining predictors in |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
min.plot.dist |
specifies a minimum distance that a new grid point needs to be from an observed data point in order to compute the prediction; points further than this will not be computed |
center |
flag for whether to scale the exposure-response function to have mean zero |
z.names |
optional vector of names for the columns of |
... |
other arguments to pass on to the prediction function |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a long data frame with the predictor name, predictor value, posterior mean estimate, and posterior standard deviation
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

pred.resp.univar <- PredictorResponseUnivar(fit = fitkm)
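For a single predictor, the grid and the effect of center = TRUE can be sketched as follows in base R. The simulated `Z` and the toy response `h_toy` are assumptions; in the package the curve comes from the posterior of h, not from a known function.

```r
# Hypothetical exposure matrix (simulated for illustration)
set.seed(5)
Z <- matrix(rnorm(50 * 3), nrow = 50, dimnames = list(NULL, paste0("z", 1:3)))
ngrid <- 50
q.fixed <- 0.5
which.z <- 1

# All predictors start at their q.fixed quantile ...
Znew <- matrix(apply(Z, 2, quantile, probs = q.fixed),
               nrow = ngrid, ncol = ncol(Z), byrow = TRUE,
               dimnames = list(NULL, colnames(Z)))
# ... and the predictor of interest sweeps its observed range
Znew[, which.z] <- seq(min(Z[, which.z]), max(Z[, which.z]), length.out = ngrid)

# Toy response (an assumption) evaluated on the grid, then centered as with
# center = TRUE so that the curve has mean zero
h_toy <- function(z) z[1]^2 + 0.5 * z[2]
h_grid <- apply(Znew, 1, h_toy)
h_centered <- h_grid - mean(h_grid)
head(cbind(z = Znew[, which.z], est = h_centered))
```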
print method for class "bkmrfit"
## S3 method for class 'bkmrfit'
print(x, digits = 5, ...)
x |
an object of class "bkmrfit" |
digits |
the number of digits to show when printing |
... |
further arguments passed to or from other methods. |
No return value, prints basic summary of fit to console
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

fitkm
Obtains posterior samples of E(Y) = h(Znew) + beta*Xnew or of g^{-1}[E(y)]
SamplePred( fit, Znew = NULL, Xnew = NULL, Z = NULL, X = NULL, y = NULL, sel = NULL, type = c("link", "response"), ... )
fit |
An object containing the results returned by the |
Znew |
optional matrix of new predictor values at which to predict new |
Xnew |
optional matrix of new covariate values at which to obtain predictions. If not specified, defaults to using observed X values |
Z |
an |
X |
an |
y |
a vector of outcome data of length |
sel |
A vector selecting which iterations of the BKMR fit should be retained for inference. If not specified, will default to keeping every 10th iteration after dropping the first 50% of samples; if this results in fewer than 100 iterations, then 100 iterations are kept |
type |
whether to make predictions on the scale of the link or of the response; only relevant for the binomial outcome family |
... |
other arguments; not currently used |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
a matrix with the posterior samples at the new points
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X

## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)

med_vals <- apply(Z, 2, median)
Znew <- matrix(med_vals, nrow = 1)
h_true <- dat$HFun(Znew)
set.seed(111)
samps3 <- SamplePred(fitkm, Znew = Znew, Xnew = cbind(0))
head(samps3)
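Once a matrix of posterior samples is available, point estimates and intervals follow from column-wise summaries. The base R sketch below uses a simulated matrix in place of real output; the layout (one row per retained iteration, one column per new point) and the column names are assumptions.

```r
# Hypothetical matrix of posterior draws: rows are kept MCMC iterations and
# columns are new points (simulated; the layout is an assumption)
set.seed(6)
samps <- cbind(znew1 = rnorm(100, 1.0, 0.2), znew2 = rnorm(100, -0.5, 0.4))

# Posterior mean, standard deviation, and 95% credible interval per new point
summ <- data.frame(
  post_mean = colMeans(samps),
  post_sd   = apply(samps, 2, sd),
  lower     = apply(samps, 2, quantile, probs = 0.025),
  upper     = apply(samps, 2, quantile, probs = 0.975)
)
round(summ, 3)
```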
Simulate predictor, covariate, and continuous outcome data
SimData( n = 100, M = 5, sigsq.true = 0.5, beta.true = 2, hfun = 3, Zgen = "norm", ind = 1:2, family = "gaussian" )
n |
Number of observations |
M |
Number of predictor variables to generate |
sigsq.true |
Variance of normally distributed residual error |
beta.true |
Coefficient on the covariate |
hfun |
An integer from 1 to 3 identifying which predictor-response function to generate |
Zgen |
Method for generating the matrix Z of exposure variables, taking one of the values c("unif", "norm", "corr", "realistic") |
ind |
select which predictor(s) will be included in the |
family |
a description of the error distribution and link function to be used in the model. Currently implemented for |
hfun = 1: A nonlinear function of the first predictor
hfun = 2: A linear function of the first two predictors and their product term
hfun = 3: A nonlinear and nonadditive function of the first two predictor variables
a list containing the parameter values and generated variables of the simulated datasets
set.seed(5)
dat <- SimData()
Compare the single-predictor health risks when all of the other predictors in Z are fixed to a specific quantile to when all of the other predictors in Z are fixed to a second specific quantile.
SingVarIntSummaries( fit, y = NULL, Z = NULL, X = NULL, which.z = 1:ncol(Z), qs.diff = c(0.25, 0.75), qs.fixed = c(0.25, 0.75), method = "approx", sel = NULL, z.names = colnames(Z), ... )
fit |
An object containing the results returned by the |
y |
a vector of outcome data of length |
Z |
an |
X |
an |
which.z |
vector indicating which variables (columns of |
qs.diff |
vector indicating the two quantiles at which to compute the single-predictor risk summary |
qs.fixed |
vector indicating the two quantiles at which to fix all of the remaining exposures in |
method |
method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
sel |
logical expression indicating samples to keep; defaults to keeping the second half of all samples |
z.names |
optional vector of names for the columns of |
... |
other arguments to pass on to the prediction function |
If method == "approx", the argument sel defaults to the second half of the MCMC iterations.
If method == "exact", the argument sel defaults to keeping every 10th iteration after dropping the first 50% of samples; if this results in fewer than 100 iterations, then 100 iterations are kept.
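The default selection rule described above can be sketched in base R as follows. This is one plausible reading of the rule, written under stated assumptions rather than copied from the package source: default_sel is a hypothetical helper, and spacing the 100 fallback iterations evenly over the second half of the chain is an assumption.

```r
# Hypothetical helper: one reading of the default `sel` rule described above.
default_sel <- function(n_iter, method = c("approx", "exact")) {
  method <- match.arg(method)
  first_kept <- floor(n_iter / 2) + 1          # drop the first 50% of samples
  if (method == "approx") {
    return(seq(first_kept, n_iter))            # keep the whole second half
  }
  sel <- seq(first_kept, n_iter, by = 10)      # thin to every 10th iteration
  if (length(sel) < 100) {
    # Assumption: fall back to 100 evenly spaced iterations in the second half
    sel <- unique(floor(seq(first_kept, n_iter, length.out = 100)))
  }
  sel
}

length(default_sel(4000, "approx"))  # second half of 4000 iterations
length(default_sel(4000, "exact"))   # every 10th iteration of the second half
length(default_sel(1000, "exact"))   # thinning leaves too few, so the fallback applies
```

With a very short chain, such as the iter = 100 used in the examples, fewer than 100 post-burn-in iterations exist, so under this reading the fallback cannot return more iterations than the second half contains.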
For guided examples and additional information, go to https://jenfb.github.io/bkmr/overview.html
a data frame containing the (posterior mean) estimate and posterior standard deviation of the single-predictor risk measures
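The quantity summarized here is a difference of differences: the change in h when one predictor moves between the two qs.diff quantiles, evaluated once with the other predictors at qs.fixed[2] and once at qs.fixed[1]. The base R sketch below computes that contrast for a known surface instead of a fitted model. The surface h_fun and the helpers single_var_risk and int_summary are hypothetical names introduced for illustration.

```r
# Hypothetical known surface with an interaction between predictors 1 and 2
h_fun <- function(z) z[1] + z[2] + 0.5 * z[1] * z[2]

set.seed(111)
Z <- matrix(rnorm(50 * 4), ncol = 4)

# Change in h when predictor j moves from quantile qs.diff[1] to qs.diff[2],
# with every other predictor held at quantile q.fixed
single_var_risk <- function(h, Z, j, qs.diff, q.fixed) {
  lo <- apply(Z, 2, quantile, probs = q.fixed)
  hi <- lo
  lo[j] <- quantile(Z[, j], qs.diff[1])
  hi[j] <- quantile(Z[, j], qs.diff[2])
  unname(h(hi) - h(lo))
}

# Difference between the single-predictor risks at the two fixed quantiles
int_summary <- function(h, Z, j, qs.diff = c(0.25, 0.75), qs.fixed = c(0.25, 0.75)) {
  single_var_risk(h, Z, j, qs.diff, qs.fixed[2]) -
    single_var_risk(h, Z, j, qs.diff, qs.fixed[1])
}

ints <- sapply(1:4, function(j) int_summary(h_fun, Z, j))
ints
```

Predictors 3 and 4 do not enter h_fun, so their contrasts are zero, while predictors 1 and 2 give a nonzero value because of the product term.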
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
risks.int <- SingVarIntSummaries(fit = fitkm, method = "exact")
Compute summaries of the risks associated with a change in a single variable in Z from a single level (quantile) to a second level (quantile), for the other variables in Z fixed to a specific level (quantile)
SingVarRiskSummaries( fit, y = NULL, Z = NULL, X = NULL, which.z = 1:ncol(Z), qs.diff = c(0.25, 0.75), q.fixed = c(0.25, 0.5, 0.75), method = "approx", sel = NULL, z.names = colnames(Z), ... )
fit | An object containing the results returned by the |
y | a vector of outcome data of length |
Z | an |
X | an |
which.z | vector indicating which variables (columns of |
qs.diff | vector indicating the two quantiles |
q.fixed | vector of quantiles at which to fix the remaining predictors in |
method | method for obtaining posterior summaries at a vector of new points. Options are "approx" and "exact"; defaults to "approx", which is faster particularly for large datasets; see details |
sel | logical expression indicating samples to keep; defaults to keeping the second half of all samples |
z.names | optional vector of names for the columns of |
... | other arguments to pass on to the prediction function |
If method == "approx", the argument sel defaults to the second half of the MCMC iterations.
If method == "exact", the argument sel defaults to keeping every 10th iteration after dropping the first 50% of samples; if this results in fewer than 100 iterations, then 100 iterations are kept.
For guided examples and additional information, go to https://jenfb.github.io/bkmr/overview.html
a data frame containing the (posterior mean) estimate and posterior standard deviation of the single-predictor risk measures
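One way to read the estimate and standard deviation in the returned data frame is as the mean and standard deviation of a contrast between h at two points. The sketch below shows that arithmetic in base R, assuming a posterior mean vector and posterior variance matrix for h at the two points are already available. The numbers are made up, and the column names of the result are illustrative assumptions rather than a statement of the package's output format.

```r
# Hypothetical posterior mean and variance of h at two points:
# element 1 = predictor at the lower quantile, element 2 = at the upper quantile
postmean <- c(0.40, 1.10)
postvar  <- matrix(c(0.09, 0.02,
                     0.02, 0.16), nrow = 2, byrow = TRUE)

# Contrast "upper minus lower"
contrast <- c(-1, 1)

# Mean and standard deviation of the linear combination contrast' h
est    <- drop(crossprod(contrast, postmean))
est_sd <- sqrt(drop(t(contrast) %*% postvar %*% contrast))

# Illustrative one-row summary (column names are assumptions)
data.frame(q.fixed = 0.5, variable = "z1", est = est, sd = est_sd)
```

The off-diagonal term matters: because the two values of h are positively correlated in this made-up example, the contrast is less variable than it would be if the two points were treated as independent.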
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
risks.singvar <- SingVarRiskSummaries(fit = fitkm, method = "exact")
summary method for class "bkmrfit"
## S3 method for class 'bkmrfit'
summary(object, q = c(0.025, 0.975), digits = 5, show_ests = TRUE, show_MH = TRUE, ...)
object | an object of class "bkmrfit" |
q | quantiles of posterior distribution to show |
digits | the number of digits to show when printing |
show_ests | logical; if |
show_MH | logical; if |
... | further arguments passed to or from other methods. |
No return value, prints more detailed summary of fit to console
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
summary(fitkm)
Trace plot
TracePlot( fit, par, comp = 1, sel = NULL, main = "", xlab = "iteration", ylab = "parameter value", ... )
fit | An object containing the results returned by the |
par | which parameter to plot |
comp | which component of the parameter vector to plot |
sel | logical expression indicating samples to keep; defaults to keeping the second half of all samples |
main | title |
xlab | x axis label |
ylab | y axis label |
... | other arguments to pass on to the plotting function |
For guided examples, go to https://jenfb.github.io/bkmr/overview.html
No return value, generates plot
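For readers unfamiliar with trace plots, the base R sketch below draws one for a simulated chain, keeping only the second half of the iterations as the default for sel describes. The chain is a hypothetical autocorrelated series, not output from a fitted model, and the plotting call is a plain base graphics equivalent rather than the function's own implementation.

```r
set.seed(111)
n_iter <- 1000

# Hypothetical autocorrelated draws standing in for one sampled parameter
chain <- 2 + as.numeric(arima.sim(model = list(ar = 0.8), n = n_iter, sd = 0.1))

# Keep the second half of the iterations
sel <- seq(floor(n_iter / 2) + 1, n_iter)

# Parameter value against iteration number, using the same axis labels
# as the defaults listed in the usage above
plot(sel, chain[sel], type = "l",
     main = "", xlab = "iteration", ylab = "parameter value")
```

A chain that wanders without settling around a stable level in such a plot suggests that more iterations are needed before drawing inference.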
## First generate dataset
set.seed(111)
dat <- SimData(n = 50, M = 4)
y <- dat$y
Z <- dat$Z
X <- dat$X
## Fit model with component-wise variable selection
## Using only 100 iterations to make example run quickly
## Typically should use a large number of iterations for inference
set.seed(111)
fitkm <- kmbayes(y = y, Z = Z, X = X, iter = 100, verbose = FALSE, varsel = TRUE)
TracePlot(fit = fitkm, par = "beta")
TracePlot(fit = fitkm, par = "sigsq.eps")
TracePlot(fit = fitkm, par = "r", comp = 1)