--- title: "Survcompare_application" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Survcompare_application} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", rf_user <- rfcores_old <- options()$rf.cores, warnings_user<- options()$warn, scipen_user<- options()$scipen, digits_user<- options()$digits, options(rf.cores = 1), options(warn = -1), options(scipen = 100, digits = 4) ) ``` ## Package background The package checks whether there are considerable non-linear and interaction terms in the data, and quantifies their contributions to the models' performance. Using repeated nested cross-validation (that is, a system of random data splits into the training and testing sets to evaluate prediction accuracy), the package: - Validates Cox Proportionate Hazards model, or Cox Lasso depending on the user's input. This step employs - `survival::coxph()` (Therneau, 2023); - `glmnet::glmnet(..., family="cox")` (Simon, Friedman, Hastie & Tibshirani , 2011). - Validates Survival Random Forest ensembled with the baseline Cox model. To fit the ensemble, CoxPH's out-of-sample predictions are added to the list of orignial predictors, which are then used to fit SRF, using - `randomForestSRC::rfsrc()`(Ishwaran & Kogalur, 2023). - Validates stacked ensemble of the CoxPH and Survival Random Forest, which predicts as a linear combination of the CoxPH and SRF, (1-lambda) x CoxPH + lambda x SRF. Lambda parameter is tuned within the package and the value can indicate if taking a share of SRF predictions could improve the baseline CoxPH's performance. Lambda = 0 means only CoxPH is used, lambda = 1 means the model is the same as SRF. - GitHit only: deep learning model DeepHit 'survivalmodels::deephit()', as well as its sequential and stacked ensembles with CoxPH or Cox-Lasso. - Performs statistical testing of whether the Survival Random Forest [ensemble] outperforms the Cox model. - Does NOT handle missing data at the moment. Predictive performance is evaluated by averaging different measures across all train-test splits. The following measures are computed: - Discrimination measures: Harrell's concordance index, time-dependent AUCROC. - Calibration measures: calibration slope, calibration alpha. - Overall fit: Brier score, Scaled Brier score. Performance metrics' description and definitions can be found, for example, in Steyerberg & Vergouwe (2014). ##### References: Therneau T (2023). *A Package for Survival Analysis in R*. R package version 3.5-7, . Simon N, Friedman J, Tibshirani R, Hastie T (2011). "Regularization Paths for Cox's Proportional Hazards Model via Coordinate Descent." *Journal of Statistical Software, 39(5)*, 1--13. . Ishwaran H, Kogalur U (2023). *Fast Unified Random Forests for Survival, Regression, and Classification (RF-SRC).* R package version 3.2.2, Steyerberg EW, Vergouwe Y. (2014). Towards better clinical prediction models: seven steps for development and an ABCD for validation. *European heart journal, 35(29)*, 1925-1931 ### What can be inferred from the `survcompare` results? There are two possible outcomes: 1) "Survival Random Forest ensemble has NOT outperformed CoxPH", or 2) "Survival Random Forest ensemble has outperformed CoxPH by ... in C-index". 1) If there is *no outperformance*, the test can justify the employment of CoxPH model and indicate a negligible advantage of using a more flexible model such as Survival Random Forest. 
2) In the case of the ensemble's *outperformance* of the Cox model, a researcher can:
   - Employ a more complex or flexible model.
   - Look for interaction and non-linear terms that could be added to the CoxPH and re-run the test.
   - Consider using the CoxPH model despite its underperformance, if the difference is not large enough to sacrifice the model's interpretability, or is negligible in the context of the task at hand.

Before interpreting the results, ensure that a sufficient number of repetitions (`repeat_cv`) has been used. The `repeat_cv` parameter should be at least 5, but for heterogeneous data 20-50 repetitions may be needed. Otherwise, the results may be unreliable and subject to chance.

### Why the CoxPH-SRF ensemble and not just SRF?

First, you can use the package to compare the performances of the CoxPH and SRF themselves. Second, the ensembles aim to single out the predictive value of the non-linearities and other data relationships that could not be captured by the baseline model. In both ensembles, the final model has direct access to the predictions of the baseline CoxPH, and hence the outperformance can be fully attributed to such complex relationships.

For example, the sequential ensemble of Cox and SRF takes the predictions of the Cox model and adds them to the list of predictors used to train the SRF. This way, we make sure that linearity is captured by the SRF at least as well as by the Cox model, and hence the marginal outperformance of the ensemble over the Cox model can be fully attributed to the qualities of the SRF that Cox does not have, that is, its ability to capture complex, non-linear relationships.

### Package installation

```{r setup}
# For the main version:
# install.packages("devtools")
# devtools::install_github("dianashams/survcompare")

# DeepHit extension:
# devtools::install_github("dianashams/survcompare", ref = "DeepHit")

library(survcompare)
```

## Examples

### Example 1. Linear data

The first example takes simulated data that does not contain any complex terms, and CoxPH is expected to perform as well as the Survival Random Forest.

```{r example_simulated_1}
mydata <- simulate_linear(200)
mypredictors <- names(mydata)[1:4]
survcompare(mydata, mypredictors, randomseed = 100)
```

NOTE: the same results can be obtained by cross-validating the CoxPH and the SRF ensemble separately, and then using the `survcompare2()` function; `outer_cv`, `inner_cv`, `repeat_cv`, and `randomseed` should be the same.

```{r example_simulated_alternative, eval = FALSE}
# cross-validate CoxPH
cv1 <- survcox_cv(mydata, mypredictors, randomseed = 100)

# cross-validate the Survival Random Forest ensemble
cv2 <- survsrfens_cv(mydata, mypredictors, randomseed = 100)

# compare the models
survcompare2(cv1, cv2)
```

### Example 2. Non-linear data with interaction terms

The second example takes simulated data that contains non-linear and interaction (cross-) terms, and the tree-based model is expected to outperform. We will increase the default number of cross-validation repetitions to get a more robust estimate, choose Cox Lasso as our baseline linear model, and SRF as the alternative (not the ensemble).

```{r example_simulated_2}
mydata2 <- simulate_crossterms(200)
mypredictors2 <- names(mydata2)[1:4]

# cross-validate the baseline Cox Lasso
cv1 <- survcox_cv(mydata2, mypredictors2, randomseed = 100,
                  repeat_cv = 3, useCoxLasso = TRUE)

# cross-validate the Survival Random Forest
cv2 <- survsrf_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)

# compare the models
compare_nonl <- survcompare2(cv1, cv2)
compare_nonl
```

Detailed results can be extracted from the output object.
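Assuming the object returned by `survcompare2()` is list-like, a quick way to see which components are available is shown in the minimal base-R sketch below (the exact component names may differ between package versions):

```{r, eval = FALSE}
# list the top-level components of the comparison object
names(compare_nonl)
str(compare_nonl, max.level = 1)
```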
For example, the test performance summarised across all data splits (all cross-validation folds and repetitions):

```{r}
# Main stats
compare_nonl$main_stats_pooled

# More information, including Brier scores, calibration slopes and alphas, can be seen in
# compare_nonl$results_mean. These are the performance metrics averaged over the test sets.
compare_nonl$results_mean
```

We can also evaluate the stacked ensemble of CoxPH and SRF and see the tuned lambda and SRF model parameters:

```{r example_simulated_alternative2}
# cross-validate the stacked ensemble of CoxPH and SRF
cv3 <- survsrfstack_cv(mydata2, mypredictors2, randomseed = 100, repeat_cv = 3)
survcompare2(cv1, cv3)
```

Display all tuned model parameters. Lambda shows the relative share of the SRF in the linear meta-learner: lambda = 0 means that the model uses only CoxPH, and lambda = 1 means that only the SRF is used.

```{r example_simulated_alternative3}
# all tuned parameters
cv3$bestparams

lambdas <- unlist(cv3$bestparams$lambda)
print("Mean lambda in the stacked ensemble of the SRF and CoxPH, indicating the share of SRF predictions in the meta-learner:")
print(mean(lambdas))
plot(lambdas)
abline(h = mean(lambdas), col = "blue")
```

### Example 3. Applying `survcompare` to GBSG data

Now, let's apply the package to real-life data. We will use the GBSG data from the `survival` package.

```{r example_3_gbsg, eval = FALSE}
library(survival)

# prepare the data
mygbsg <- gbsg
mygbsg$time <- gbsg$rfstime / 365
mygbsg$event <- gbsg$status
myfactors <- c("age", "meno", "size", "grade", "nodes", "pgr", "er", "hormon")
mygbsg <- mygbsg[c("time", "event", myfactors)]
sum(is.na(mygbsg[myfactors])) # check for missing data

# run survcompare
cv1 <- survcox_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
cv2 <- survsrfens_cv(mygbsg, myfactors, randomseed = 42, repeat_cv = 3)
survcompare2(cv1, cv2)
```

We got the following results (they may differ slightly depending on the random seed):

```
Internally validated test performance of CoxPH and SRF_ensemble over 3 repeated 3 fold cross-validations (inner k = 3).

Mean performance:

Median performance:

SRF_ensemble has outperformed CoxPH by 0.0137 in C-index. The difference is statistically significant with the p-value 0.012*. The supplied data may contain non-linear or cross-term dependencies, better captured by SRF_ensemble.
Mean C-score:
    CoxPH        0.6773(95CI=0.675-0.6801;SD=0.0028)
    SRF_ensemble 0.6911(95CI=0.6902-0.6918;SD=9e-04)
Mean AUCROC:
    CoxPH        0.7224(95CI=0.7176-0.728;SD=0.0055)
    SRF_ensemble 0.7211(95CI=0.7177-0.7241;SD=0.0034)
```

This example illustrates a situation in which the outperformance of the non-linear model is statistically significant, but not large in absolute terms. The ultimate model choice will depend on how critical such an improvement is for the model stakeholders.

```{r, include = FALSE}
# reinstate the user's original settings
options(rf.cores = rfcores_old)
options(warn = warnings_user)
options(digits = digits_user)
options(scipen = scipen_user)
```
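A final practical note: as mentioned above, `survcompare` does not handle missing data at the moment, so incomplete rows need to be dealt with before cross-validation. Below is a minimal complete-case sketch for the GBSG data prepared in Example 3 (`complete.cases()` is base R; the name `mygbsg_complete` is illustrative, and whether complete-case analysis is appropriate depends on the study):

```{r, eval = FALSE}
# keep only rows with no missing values in the outcome and predictor columns
mygbsg_complete <- mygbsg[complete.cases(mygbsg[c("time", "event", myfactors)]), ]
nrow(mygbsg) - nrow(mygbsg_complete)  # number of rows that would be dropped
```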