---
title: "ComBatQC"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{ComBatFamQC_harmonization}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

# Introduction

The need for larger samples in human subjects research have led to a trend of aggregating data across multiple locations (sites). This trend is especially prevalent in neuroimaging research. However, while the larger samples promoted greater power to detect significant associations as well as better generalizability of results, multiple-site study designs also introduce heterogeneity in acquisition and processing, which might consequently impact study findings.

ComBat is used as the major harmonization technique in neuroimaging and the ComBat Family further extends the original ComBat methodology to enable flexible covariate modeling, leveraging efficient R implementations of regression models. However, it remains a challenge to evaluate potential batch effects as well as the performance of harmonization. `ComBatFamQC` provides a useful visualization tool through Rshiny for interactive batch effect diagnostics before and after harmonization. To streamline the harmonization process and improve efficiency, transparency, and reproducibility, `ComBatFamQC` also provides default harmonization methods by integrating the `ComBatFamily` package.

The `ComBatFamQC` visualization includes three key functions:

-   `visual_prep`: provides all relevant statistical test results for batch effect visualization and evaluation.
-   `combat_harm`: provides default harmonization methods from the ComBatFamily package.
-   `comfam_shiny`: generate interactive visualization through Rshiny.

The `ComBatFamQC` includes the following harmonizationmethods:

-   ComBat (Johnson et al., 2007)
-   ComBat-GAM (Pomponio et al., 2020)
-   Longitudinal ComBat (Beer et al., 2020)
-   CovBat(Chen et al., 2021)


# Set up


Install the `ComBatFamQC` package and read in the data set to be harmonized. ADNI data is used in the vignette for illustration. To be noticed, the data set should include at least the following columns:

-   **batch** column
-   **feature** columns (make sure univariate column is excluded)
-   **covariate** columns (essential for **gam** model)
-   **random effect** column (essential for **lmer** model)

```{r setup, eval=FALSE}
library(ComBatFamQC)
library(dplyr)
data(adni)
```


# Interactive Batch Effect Diagnostics

Three types of regression models can be considered for batch effect evaluation:

-   `lm`: Linear regression model, which assumes that the relationship between the variables is linear.
-   `gam`: Generalized additive model, which models the relationship between the dependent variable and certain independent variable as a smooth, non-linear function, typically using splines.
-   `lmer`: Linear mixed-effects model, which extends the linear regression model to account for both fixed effects and random effects. It is commonly used in longitudinal datasets.

(**Note**: For Windows users, make sure to set `cores = 1` in `visual_prep` function. The MDMR test can be time-consuming, especially in large datasets. Users have the option to disable the MDMR test by setting `mdmr = FALSE`.)

## Linear Regression Model

```{r, eval=FALSE}
features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
result_orig <- visual_prep(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
comfam_shiny(result_orig)
```

## Generalized additive Model

```{r, eval=FALSE}
result_gam <- visual_prep(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni)
comfam_shiny(result_gam)
```

## Linear Mixed-Effects Model

```{r, eval=FALSE}
result_lmer <- visual_prep(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
comfam_shiny(result_lmer)
```

# Export Batch Effect Diagnosis Result

There are two export options: 1) generate a Quarto report (requires Quarto to be installed), and 2) generate a combined Excel file.

-   Generate a Quarto report

```{r, eval=FALSE}
#library(quarto)
temp_dir <- tempfile()
dir.create(temp_dir)
diag_save(path = temp_dir, result = result_lmer, use_quarto = TRUE)
```

-   Generate a combined EXCEL file

```{r, eval=FALSE}
diag_save(path = temp_dir, result = result_lmer, use_quarto = FALSE)
```

# Harmonization Using default ComBatFamily Methods


There are two types of harmonization scenarios users can choose from:

-   **First-time Harmonization** (Can also do interactive harmonization through Rshiny)
-   **Out of Sample Harmonization**
    -   predict from existing ComBat model (works only for **original ComBat** and **ComBat-GAM**)
    -   harmonize new data toward existing reference data (works for all built-in ComBat harmonization methods)

## First-time Harmonization

Specify parameters carefully based on the harmonization method to be applied.

### Original ComBat
```{r, eval=FALSE}
features <- colnames(adni)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "manufac"
## Harmonize using evaluation results as the inputs
combat_model <- combat_harm(result = result_orig, type = "lm", interaction = interaction, smooth = NULL, random = NULL, df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_copy <- combat_harm(type = "lm", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = NULL, df = adni)
## Expect to get the same harmonization results
identical(combat_model$harmonized_df, combat_model_copy$harmonized_df)

# save harmonized data
write.csv(combat_model$harmonized_df, file.path(temp_dir, "harmonized.csv"))

# save combat model
saveRDS(combat_model$combat.object, file.path(temp_dir, "combat_model.rds"))
# Clean up the temporary file
unlink(temp_dir, recursive = TRUE)
```

### Longitudinal ComBat
```{r, eval=FALSE}
## Harmonize using evaluation results as the inputs
combat_model_lmer <- combat_harm(result = result_lmer, type = "lmer", interaction = interaction, smooth = NULL, random = "subid", df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_lmer_copy <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni)
## Expect to get the same harmonization results
identical(combat_model_lmer$harmonized_df, combat_model_lmer_copy$harmonized_df)
```

### ComBat-GAM
```{r, eval=FALSE}
## Harmonize using evaluation results as the inputs
combat_model_gam <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
## Harmonize through specifying features, batch, covariates and df arguments
combat_model_gam_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni)
## Expect to get the same harmonization results
identical(combat_model_gam$harmonized_df, combat_model_gam_copy$harmonized_df)
```

### CovBat
```{r, eval=FALSE}
## Harmonize using evaluation results as the inputs
covbat_model <- combat_harm(result = result_gam, type = "gam", interaction = interaction, smooth = "AGE", smooth_int_type = "linear", df = adni, family = "covfam")
## Harmonize through specifying features, batch, covariates and df arguments
covbat_model_copy <- combat_harm(type = "gam", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth_int_type = "linear", smooth = "AGE", df = adni, family = "covfam")
## Expect to get the same harmonization results
identical(covbat_model$harmonized_df, covbat_model_copy$harmonized_df)
```

## Out of Sample Harmonization

### from ComBat Model

Specify `predict` parameter to be TRUE and `object` parameter to be saved ComBat model.

```{r, eval=FALSE}
saved_model <- combat_model_gam$combat.object
harm_predict <- combat_harm(df = adni %>% head(1000), predict = TRUE, object = saved_model)
```

### from Reference Data

Specify `reference` parameter to be saved reference data. To be noticed, the reference data should have identical columns as the new data and the new data should contain reference data as its sub sample.

```{r, eval=FALSE}
# harmonize reference data
reference_site <- adni %>% group_by(site) %>% summarize(count = n()) %>% arrange(desc(count)) %>% pull(site) %>% head(30)
reference_df <- adni %>% filter(site %in% reference_site)
features <- colnames(reference_df)[c(43:104)]
covariates <- c("timedays", "AGE", "SEX", "DIAGNOSIS")
interaction <- c("timedays,DIAGNOSIS")
batch <- "site"
ref_model <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = reference_df)

# harmonize new data to the reference data
harm_new <- combat_harm(type = "lmer", features = features, batch = batch, covariates = covariates, interaction = interaction, smooth = NULL, random = "subid", df = adni, reference = ref_model$harmonized_df)
```