Vignette: Bayesian Kernel Machine Regression (BKMR) for Metal Exposure Analysis with Dynamic Threshold Function
Vignette: Bayesian Kernel Machine Regression (BKMR) for Metal Exposure Analysis with Dynamic Threshold Function
Kazi Tanvir Hasan and Dr. Gabriel Odom
Introduction
In this vignette, we demonstrate how to use the bkmr package in R to analyze the relationship between metal exposures (e.g., Cadmium, Mercury, Arsenic, Lead, and Manganese) and a response variable (e.g., IQ score or QI). Additionally, we will compute a dynamic threshold for model inclusion based on the coefficient of variation and the sample size.
1. Required Libraries
Before starting, ensure the following libraries are installed and loaded:
# Load the necessary librarieslibrary(bkmr)
For guided examples, go to 'https://jenfb.github.io/bkmr/overview.html'
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
2. Data Preprocessing
First, we load the data, clean it, and perform any necessary transformations. In this case, we take the log-transformation of the metal concentrations to ensure they are on a comparable scale.
# Set the seed for reproducibilityset.seed(2025)# Load the dataset and preprocessdata("metalExposChildren_df")bkmrAnalysisData_df <- metalExposChildren_df %>%select(QI, Cadmium:Manganese) %>%# Selecting relevant columnsna.omit() %>%# Remove rows with missing valuesmutate_at(vars(Cadmium:Manganese), ~log10(. +1) %>% as.vector ) # Log-transform metal concentrations
Here, mutate_at is used to log-transform the exposure variables (Cadmium to Manganese) by applying log10(. + 1).
3. Exploratory Data Analysis (EDA)
Before fitting the model, we perform some exploratory data analysis. We visualize the distribution of metal concentrations using histograms and density plots.
# Create a histogram with density plot for each metalmetalDataPlot <- bkmrAnalysisData_df %>%select(Cadmium:Manganese) %>%pivot_longer(cols =everything(), names_to ="Metal", values_to ="Value") %>%ggplot(aes(x = Value, fill = Metal)) +geom_histogram(aes(y = ..density..), bins =30, alpha =0.5, position ="identity") +geom_density(aes(color = Metal), linewidth =1) +facet_wrap(~Metal, scales ="free") +theme_minimal() +labs(title ="Histogram with Density Plot for Metals", x ="Concentration", y ="Density") +theme(legend.position ="none")# Display the plotmetalDataPlot
Warning: The dot-dot notation (`..density..`) was deprecated in ggplot2 3.4.0.
ℹ Please use `after_stat(density)` instead.
This plot provides a visual understanding of how the metal concentrations are distributed.
4. Model Setup
We now set up the Bayesian Kernel Machine Regression (BKMR) model by creating a response variable (QI) and the exposure matrix, which includes all the metal concentrations.
# Extract the response variable (IQ score or QI)iq <- bkmrAnalysisData_df %>%pull(QI) %>%na.omit()# Convert exposure variables (metals) to a matrix for modelingexpos <-data.matrix(bkmrAnalysisData_df %>%select(Cadmium:Manganese))
5. Generating Knot Points for the Model
To fit the Bayesian Kernel Machine Regression model, we generate knot points using the cover.design function. These points will be used for the MCMC model.
# Generate knot points using a cover design for Bayesian modelingknots50 <- fields::cover.design(expos, nd =50)$design
Here, we specify 50 knots for the cover design. These knot points help in fitting the non-linear kernel in BKMR.
6. Fitting the BKMR Model
We now fit the BKMR model using the kmbayes() function. This function performs Bayesian regression using MCMC (Markov Chain Monte Carlo) to estimate the relationship between the exposures and the response variable.
# Fit the BKMR model using MCMCmodelFit <-kmbayes(y = iq, # Response variableZ = expos, # Exposure matrix (metal concentrations)X =NULL, # No additional covariatesiter =1000, # Number of MCMC iterationsfamily ="gaussian", # Gaussian responseverbose =TRUE, # Output progress for each iterationvarsel =TRUE, # Perform variable selectionknots = knots50 # Knot points generated earlier)
After the model is fitted, we extract the posterior inclusion probabilities (PIPs) to determine which exposures are most important in predicting the response variable.
# Extract posterior inclusion probabilities (PIPs) and sort thempipFit <-ExtractPIPs(modelFit) %>%arrange(desc(PIP))# Display the PIPspipFit
The ExtractPIPs() function returns the posterior inclusion probabilities, which are then sorted in descending order to identify the most important exposures.
Dynamic Threshold Calculation Next, we calculate the dynamic threshold for model inclusion. This threshold is based on the coefficient of variation (CV) of the response variable (QI) and the sample size.
# Calculate the dynamic threshold for model inclusionpipThresh_fn <-calculate_pip_threshold(absCV =sd(bkmrAnalysisData_df$QI) /mean(bkmrAnalysisData_df$QI), # Coefficient of variationsampSize =length(bkmrAnalysisData_df$QI) # Sample size)# Display the thresholdpipThresh_fn
[1] 0.1196481
This function computes the threshold for the PIPs using the coefficient of variation of the response variable and the sample size.
9. Interpretation of Results
Finally, we compare the PIPs with the threshold to identify the variables (metals) that have a significant effect on the response variable.
# Identify exposures with PIPs greater than the thresholdsignificant_exposures <- pipFit %>%filter(PIP > pipThresh_fn)# Display significant exposuressignificant_exposures
This step filters the exposures whose posterior inclusion probabilities exceed the dynamic threshold, indicating that these variables are significant predictors of the outcome.
Conclusion
This vignette demonstrates the complete workflow for using Bayesian Kernel Machine Regression (BKMR) to analyze the relationship between various metal exposures and an outcome of interest (e.g., IQ score). Additionally, we computed a dynamic threshold for model inclusion, which adjusts the threshold based on the coefficient of variation and the sample size.
Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.