---
title: "Unanchored Survival Analysis"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
bibliography: references.bib
csl: biomedicine.csl
vignette: >
  %\VignetteIndexEntry{Unanchored Survival Analysis}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(warning = FALSE, message = FALSE)
```

# Loading R packages

```{r}
# install.packages("maicplus")
library(maicplus)
```

Additional R packages for this vignette:

```{r}
library(dplyr)
```

# Illustration using example data

This example reads in `centered_ipd_sat` data that was created in `calculating_weights` vignette and uses `adtte_sat` and `pseudo_ipd_sat` outcome datasets to run survival analysis using the `maic_unanchored` function by specifying `endpoint_type = "tte"`.

Note that parameters `ipd` and `pseudo_ipd` in the `maic_unanchored` function for survival data analysis needs to have the following columns: USUBJID, ARM, EVENT, and TIME. USUBJID in `ipd` needs to match USUBJID in `weights_object`. USUBJID in `pseudo_ipd` is not strictly required, as if left unspecified, subject numbers are assigned using the row indexes.

Furthermore, TIME in both these datasets have to be in days. For instance, when we digitize a KM plot where time is recorded in months, we need to multiply the months by the appropriate factor (i.e. 30.4375) to get the time in days.

If you would like to run an analysis using scaled weights, set `normalize_weight` to TRUE. One clear benefit of using scaled weights is that the risk table at time = 0 in the Kaplan-Meier curve will match the IPD sample size.

```{r}
data(centered_ipd_sat)
data(adtte_sat)
data(pseudo_ipd_sat)

centered_colnames <- c("AGE", "AGE_SQUARED", "SEX_MALE", "ECOG0", "SMOKE", "N_PR_THER_MEDIAN")
centered_colnames <- paste0(centered_colnames, "_CENTERED")

weighted_data <- estimate_weights(
  data = centered_ipd_sat,
  centered_colnames = centered_colnames
)

result <- maic_unanchored(
  weights_object = weighted_data,
  ipd = adtte_sat,
  pseudo_ipd = pseudo_ipd_sat,
  trt_ipd = "A",
  trt_agd = "B",
  normalize_weight = FALSE,
  endpoint_name = "Overall Survival",
  endpoint_type = "tte",
  eff_measure = "HR",
  time_scale = "month",
  km_conf_type = "log-log"
)
```

There are two summaries available in the result: descriptive and inferential. In the descriptive section, we have summaries from fitting `survfit` function. Note that restricted mean (rmean), median, and 95% CI are summarized in the `time_scale` specified.

```{r}
result$descriptive$summary

# Not shown due to long output
# result$descriptive$survfit_before
# result$descriptive$survfit_after
```

In the inferential section, we have the fitted models stored (i.e. `survfit` and `coxph`) and the results from the `coxph` models (i.e. hazard ratios and CI). Note that the p-values summarized are from `coxph` model Wald test and not from a log-rank test. Here is the overall summary.

```{r}
result$inferential$summary
```

Here are models and results before adjustment.

```{r}
result$inferential$fit$km_before
result$inferential$fit$model_before
result$inferential$fit$res_AB_unadj
```

Here are models and results after adjustment.

```{r}
result$inferential$fit$km_after
result$inferential$fit$model_after
result$inferential$fit$res_AB
```

```{r, eval = FALSE, echo = FALSE}
# heuristic check
# merge in adtte with ipd_matched
ipd_matched <- weighted_data$data
combined_data_tte <- ipd_matched %>%
  left_join(adtte_sat, by = c("USUBJID", "ARM"))
pseudo_ipd <- pseudo_ipd_sat
pseudo_ipd$weights <- 1

combined_data_tte <- rbind(
  combined_data_tte[, colnames(pseudo_ipd)],
  pseudo_ipd
)

# Change the reference treatment to B
combined_data_tte$ARM <- stats::relevel(as.factor(combined_data_tte$ARM), ref = "B")

# Fit a Cox model with/without weights to estimate the HR
unweighted_cox <- coxph(Surv(TIME, EVENT == 1) ~ ARM, data = combined_data_tte)
weighted_cox <- coxph(Surv(TIME, EVENT == 1) ~ ARM,
  data = combined_data_tte, weights = combined_data_tte$weights, robust = TRUE
)

unweighted_cox
weighted_cox

kmobj <- survfit(Surv(TIME, EVENT) ~ ARM, combined_data_tte, conf.type = "log-log")
kmobj_adj <- survfit(Surv(TIME, EVENT) ~ ARM,
  combined_data_tte,
  weights = combined_data_tte$weights, conf.type = "log-log"
)

# Derive median survival time
medSurv <- medSurv_makeup(kmobj, legend = "before matching", time_scale = "day")
medSurv_adj <- medSurv_makeup(kmobj_adj, legend = "after matching", time_scale = "day")
medSurv_out <- rbind(medSurv, medSurv_adj)
medSurv_out
```

# Using bootstrap to calculate standard errors

If bootstrap standard errors are preferred, we need to specify the number of bootstrap iteration (`n_boot_iteration`) in `estimate_weights` function and proceed fitting `maic_unanchored` function. Then, the outputs include bootstrapped CI. Different types of bootstrap CI can be found by specifying the parameter `boot_ci_type`. See `boot.ci` in `boot` package for more details.

```{r}
weighted_data2 <- estimate_weights(
  data = centered_ipd_sat,
  centered_colnames = centered_colnames,
  n_boot_iteration = 100,
  set_seed_boot = 1234
)

result_boot <- maic_unanchored(
  weights_object = weighted_data2,
  ipd = adtte_sat,
  pseudo_ipd = pseudo_ipd_sat,
  trt_ipd = "A",
  trt_agd = "B",
  normalize_weight = FALSE,
  endpoint_name = "Overall Survival",
  endpoint_type = "tte",
  eff_measure = "HR",
  boot_ci_type = "perc",
  time_scale = "month",
  km_conf_type = "log-log"
)

result_boot$inferential$fit$boot_res_AB
```