The uddbart package provides tools for dynamic risk prediction from irregular longitudinal biomarker data with interval-censored outcomes.
The package is designed for studies where patients are followed over time, biomarker measurements are collected at irregular visit times, and the clinical event is known only to occur between two observation times.
A motivating example is chronic myeloid leukemia (CML), where patients are monitored using repeated BCR–ABL measurements and the event of interest is deep molecular response.
The development version can be installed from GitHub:
The package includes two example datasets:
The longitudinal dataset contains repeated biomarker measurements:
head(cml_long)
#>   patient_id  t_months log_mrd
#> 1      P0001  2.004107    -1.3
#> 2      P0001 11.498973    -0.8
#> 3      P0001 17.347023    -1.3
#> 4      P0001 22.636550    -1.2
#> 5      P0001 32.065708    -1.0
#> 6      P0001 53.059548     1.0

The event dataset contains interval-censored outcome information:
The longitudinal biomarker data should contain one row per patient visit. A typical structure is:
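A minimal sketch of such a table is shown below. It reuses the column names of cml_long, but the object name toy_long and all values are invented for illustration and are not taken from the package data.

```r
# Toy longitudinal data in long format: one row per patient visit.
# All IDs and values are invented for illustration only.
toy_long <- data.frame(
  patient_id = c("P0001", "P0001", "P0001", "P0002", "P0002"),
  t_months   = c(2.0, 11.5, 17.3, 3.1, 9.8),
  log_mrd    = c(-1.3, -0.8, -1.3, 0.4, -0.6),
  stringsAsFactors = FALSE
)

# Visit times need not be equally spaced or aligned across patients.
# Basic layout checks: no missing values, no duplicated patient/time pairs.
stopifnot(
  !anyNA(toy_long),
  !any(duplicated(toy_long[, c("patient_id", "t_months")]))
)

# Number of visits recorded for each patient
visits_per_patient <- table(toy_long$patient_id)
```

Checks like these are cheap to run before model fitting and catch the most common layout problems, such as a wide-format table or repeated rows for the same visit.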
Required columns are usually:
- patient_id: patient identifier
- t_months: visit time
- log_mrd: longitudinal biomarker value

The event data should contain one row per patient:
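As a rough sketch, an event table with the columns patient_id, L, R, C, and delta could be built as follows. The object name toy_event and all values are invented. The coding conventions in the comments (delta = 1 for an observed event, R set to NA when no event was observed) are assumptions made for this illustration and may differ from what uddbart expects, so check the package documentation before relying on them.

```r
# Toy event data: one row per patient. All values are invented.
# Assumptions (not confirmed by the package documentation):
#   delta = 1: the event is known to have occurred in the interval (L, R];
#   delta = 0: no event was observed by the censoring time C,
#              and R is coded as NA.
toy_event <- data.frame(
  patient_id = c("P0001", "P0002", "P0003"),
  L     = c(12.1, 6.0, 30.2),
  R     = c(18.4, 9.5, NA),
  C     = c(60.0, 48.0, 30.2),
  delta = c(1L, 1L, 0L),
  stringsAsFactors = FALSE
)

# One row per patient, and a well-ordered interval wherever an event was seen
observed <- toy_event$delta == 1L
stopifnot(
  !any(duplicated(toy_event$patient_id)),
  all(toy_event$L[observed] < toy_event$R[observed])
)
```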
Required columns are usually:
- patient_id: patient identifier
- L: left endpoint of the event interval
- R: right endpoint of the event interval
- C: censoring time
- delta: event indicator

The following example demonstrates the basic workflow.
For CRAN checking, the full model fit is not evaluated in this vignette because Bayesian tree fitting can take time.
After fitting a model, predicted risks can be obtained using predict().
The predicted values represent individualized probabilities of experiencing the event within the specified prediction horizon after each landmark time.
A fitted uddbart object typically contains:
Common components include:
For a landmark time \(s\) and prediction horizon \(\Delta\), uddbart estimates:
\[ P(T \le s + \Delta \mid T > s, \mathcal{H}(s)), \]
where \(T\) is the event time and \(\mathcal{H}(s)\) is the longitudinal biomarker history observed up to and including time \(s\).
In the CML example, this can be interpreted as:
the probability that a patient will achieve deep molecular response within the next prediction window, given their observed BCR–ABL monitoring history up to the landmark time.
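To make the two ingredients of this quantity concrete, the sketch below first extracts a history \(\mathcal{H}(s)\) from the six cml_long rows printed earlier, then evaluates the conditional risk with the identity \(P(T \le s + \Delta \mid T > s, \mathcal{H}(s)) = 1 - S(s + \Delta)/S(s)\), where \(S(t) = P(T > t \mid \mathcal{H}(s))\). The identity follows directly from the definition of conditional probability. Both helpers (history_up_to and conditional_risk) are hypothetical and not part of uddbart, and the probabilities 0.8 and 0.6 are made-up numbers rather than model output.

```r
# The six rows of cml_long shown by head(cml_long) earlier in this vignette
cml_head <- data.frame(
  patient_id = rep("P0001", 6),
  t_months   = c(2.004107, 11.498973, 17.347023,
                 22.636550, 32.065708, 53.059548),
  log_mrd    = c(-1.3, -0.8, -1.3, -1.2, -1.0, 1.0),
  stringsAsFactors = FALSE
)

# Hypothetical helper: biomarker history H(s), i.e. all visits at or before s
history_up_to <- function(long, s) {
  long[long$t_months <= s, , drop = FALSE]
}

h24 <- history_up_to(cml_head, s = 24)
nrow(h24)  # 4 visits at or before month 24

# Hypothetical helper: conditional risk from two event-free probabilities,
# S(s) = surv_s and S(s + Delta) = surv_s_delta
conditional_risk <- function(surv_s, surv_s_delta) {
  stopifnot(surv_s > 0, surv_s_delta >= 0, surv_s_delta <= surv_s)
  1 - surv_s_delta / surv_s
}

# Made-up inputs: S(24) = 0.8 and S(30) = 0.6
conditional_risk(0.8, 0.6)  # 0.25
```

Only visits up to the landmark enter the history, so measurements taken after month 24 (here the visits at about 32 and 53 months) play no role in a prediction made at \(s = 24\).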
The computationally intensive examples are placed in code chunks with eval=FALSE so that the vignette can be built quickly during CRAN checks.
Users can copy and run these examples interactively after installing all required dependencies.