---
title: "Bridging across NGS-based Olink^®^ products"
output: 
  rmarkdown::html_vignette:
    toc: true
    toc_depth: 3
    fig_caption: TRUE
    includes:
      in_header: ../man/figures/logo.html
vignette: >
  %\VignetteIndexEntry{Bridging across NGS-based Olink products}
  %\VignetteEncoding{UTF-8}
  %\VignetteEngine{knitr::rmarkdown}
date: 'Compiled: `r format(Sys.Date(), "%B %d, %Y")`'
editor_options: 
  markdown: 
    wrap: 72
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  tidy = FALSE,
  tidy.opts = list(width.cutoff = 95),
  fig.width = 6,
  fig.height = 3,
  message = FALSE,
  warning = FALSE,
  time_it = TRUE,
  fig.align = "center"
)
```

## Introduction

Individual Olink^®^ NPX^TM^ projects are generally normalized using either plate
control normalization or intensity normalization. Since NPX is a relative
measurement, when a study is split across multiple projects an additional
normalization step is needed to make the data comparable across projects. This
tutorial gives an overview of the Olink bridging procedure for combining data
sets from Olink^®^ Explore 3072, Olink^®^ Explore HT, and Olink^®^ Reveal
products.

### Important Terminology

-   **Bridging samples** – Overlapping samples run on two or more projects
    that are used as references to enable normalization. These
    samples are selected as described in the 
    [Introduction to Bridging tutorial](bridging_introduction.html) to ensure 
    that they are of high quality and span the range of the data. For data
    containing LOD, samples are also filtered for high detectability. 
    Bridging samples are selected from the project that is run first and are
    then included in the second project.

-   **Project** – A set of plates that were run at the same time and
    normalized together. If two projects were not randomized together or were
    run at different times, additional normalization is required.

-   **Project effect/correction** – As NPX is a relative quantification,
    overall NPX values may shift across projects. This can result in
    separation of projects that are part of the same study, which can be
    corrected for using normalization or accounted for within a
    statistical model.
    
-   **Within-product bridging** – Normalization of two or more projects
    run on the same Olink product using bridging samples.

-   **Between-product bridging** – Normalization of two or more projects
    from different Olink products (in this tutorial, Olink Explore 3072, Olink
    Explore HT, and Olink Reveal) using bridging samples.

-   **Reference data** – The project data which is being normalized to
    is known as the reference data. When bridging from Explore 3072, the
    reference project is the Explore HT or Reveal NPX data; when bridging
    between Explore HT and Reveal, either product can serve as the reference.
    The reference data set is not altered during bridging, and the other data
    set is adjusted to it using the bridging samples.

### Within- and between-product bridging

The joint analysis of two or more NPX projects run on the same Olink
product often requires a project correction step to remove
technical variation. One such method of normalizing two projects is referred to
as bridge sample reference normalization, bridge normalization, or simply
bridging. For more information on within-product bridging, see the 
[Introduction to Bridging tutorial](bridging_introduction.html). Bridging makes
certain assumptions about the distributions of the assays, namely that the same
true biological range is measured regardless of the setting. If an assay
displays different distributions between projects, both bridging and
downstream statistical analysis will be affected. Within a product, we assume
that the variance and shape of each assay's distribution remain constant across
projects.

In the case where a study consists of separate projects run on Olink
Explore 3072 and either Olink Explore HT or Olink Reveal, an additional project
correction step is required to allow data from these two products to be analyzed
together, which is referred to as between-product bridging. Olink Explore 3072,
Olink Explore HT, and Olink Reveal all use Proximity Extension Assay (PEA) technology
combined with next generation sequencing (NGS) to calculate NPX for thousands of
proteins. However, assays may vary more between products than within a product,
and fewer assumptions can be made regarding the similarity of assay
distributions and variance between products.

Since many of the assays profiled in Olink Explore 3072 are also found
on Olink Explore HT or Olink Reveal, bridging data across products enables
increased power in studies consisting of data from multiple Olink products,
rather than limiting these studies to meta-analysis. However, differences
between products, such as the number of assays being measured and the reagents
being used, can sometimes lead to signal in one product and noise in another
product. Bridging signal to noise can have detrimental effects on downstream
statistical analysis. This means that while some assays can be bridged using the
same method as in within-product bridging, others require a different
normalization method, and some cannot be bridged at all. This
normalization strategy combines median-centering (as is used in within-product
bridging) and quantile smoothing to normalize assays across products based on
the assumption that assays can be bridged provided they have signal in both
products or noise in both products.

### Considerations for between-product bridging

For product bridging between an Olink Explore 3072 project and an Olink Explore
HT project or an Olink Reveal project, the NPX values from the Olink Explore
3072 project can be normalized and made comparable to those from Olink Explore
HT or Olink Reveal. This process is one-directional, and normalizing Olink
Explore HT or Olink Reveal NPX values to Olink Explore 3072 is not supported.
For product bridging between an Olink Explore HT project and an Olink Reveal
project, normalization is supported in both directions.

The product bridging normalization uses the assays that overlap between the two
products: ~2900 assays overlap between Olink Explore HT and Olink Explore 3072,
~850 between Olink Reveal and Olink Explore 3072, and ~1000 between Olink
Explore HT and Olink Reveal.

Each overlapping assay undergoes a series of checks that evaluate the number of
counts, correlation, and difference of NPX ranges between the two data sets. If
an assay has enough counts and comparable metrics between the two data sets, it
is determined to be suitable for bridging (referred to as a "bridgeable assay").
Assays that are not suitable for bridging can either be excluded from downstream
analysis in one or both products, or analyzed separately in each product with
the results integrated using meta-analysis. The set of bridgeable assays across products will vary from
data set to data set, based on the samples present within the studies. Depending
on the NPX distribution of each bridgeable assay in the two data sets, the assay
is normalized using either median normalization or quantile smoothing.

Bridging an Explore 3072 data set to an Explore HT data set requires 40-64
bridging samples, bridging an Olink Explore 3072 data set to an Olink Reveal
data set requires 32-48 bridging samples, and bridging between an Olink Explore
HT data set and an Olink Reveal data set requires 24-40 bridging samples.
Bridging samples are shared samples among data sets and, as such, are analyzed
in both data sets. Olink NPX data sets without shared samples cannot be combined
using the bridging approach described below. More information on bridge sample
selection can be found in the section [selecting bridging samples](bridging_introduction.html#selecting-bridging-samples) of the 
Introduction to Bridging tutorial.

## Bridge Sample Selection

Prior to running a study with Explore HT or Olink Reveal, bridging samples must
be selected from the study run with Explore 3072 and be run on the subsequent
study. These samples can be selected using the `olink_bridgeselector()` function
in Olink Analyze as detailed in the section [selecting bridging samples](bridging_introduction.html#selecting-bridging-samples) of the 
Introduction to Bridging tutorial.

In addition, for studies involving Explore HT and Reveal, bridging can
also be performed directly between Explore HT and Reveal in either direction. In
these cases, bridging samples should be selected from the run used as the
reference and included in the corresponding subsequent run. The recommended
number of bridge samples for between-product bridging is summarized in the
table below. When selecting bridge samples, the aim is to select samples
that represent the dynamic range of the assay expression in the product. As
such, quality control of the sample and, if available, proportion of data above
LOD in the sample are considered when determining if a sample is chosen as a
bridging sample. When LOD data is not available in the data export from Olink
NPX software, LOD can optionally be calculated from fixed LOD or negative
controls as detailed in the
[Calculating LOD from Olink Explore data tutorial](lod.html).

```{r brnrtab, eval = TRUE, message = FALSE, echo = FALSE}
dplyr::tibble(
  "Olink Product" = c(
    "'Olink Explore 3072' to 'Olink Explore HT'",
    "'Olink Explore 3072' to 'Olink Reveal'",
    "'Olink Explore HT' to 'Olink Reveal'",
    "'Olink Reveal' to 'Olink Explore HT'"
  ),
  "Number of Bridge Samples" = c(
    "40-64",
    "32-48",
    "24-40",
    "24-40"
  )
) |>
  kableExtra::kbl(
    booktabs = TRUE,
    digits = 2L,
    caption = paste("Recommended number of bridge samples for normalizing",
                    "between Olink products.")
  ) |>
  kableExtra::kable_styling(
    bootstrap_options = "striped",
    full_width = FALSE,
    position = "center",
    latex_options = "HOLD_position"
  )
```

## Workflow Overview

Olink Explore 3072 to Olink Explore HT bridging requires Explore 3072 data and
Explore HT data that share 40 to 64 bridge samples. Olink Explore 3072 to Olink
Reveal bridging requires Explore 3072 data and Reveal data that share 32 to 48
bridge samples. Bridging between Olink Explore HT and Olink Reveal requires
Explore HT data and Reveal data that share 24 to 40 bridge samples.
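The recommended ranges above can be turned into a quick pre-bridging check. The
sketch below is a hypothetical helper written in base R; `check_bridge_count()`,
`recommended_bridge_n`, and the bridging-case labels are illustrative names and
are not part of Olink Analyze.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical helper (not part of OlinkAnalyze): compare the number of bridge
# samples with the recommended ranges for each bridging case.
recommended_bridge_n <- list(
  "E3072->HT"     = c(40L, 64L),
  "E3072->Reveal" = c(32L, 48L),
  "HT->Reveal"    = c(24L, 40L),
  "Reveal->HT"    = c(24L, 40L)
)

check_bridge_count <- function(n_bridge, bridge_case) {
  rng <- recommended_bridge_n[[bridge_case]]
  if (is.null(rng)) {
    stop("Unknown bridging case: ", bridge_case)
  }
  if (n_bridge < rng[1L]) {
    "below recommended range"
  } else if (n_bridge > rng[2L]) {
    "above recommended range"
  } else {
    "within recommended range"
  }
}

check_bridge_count(n_bridge = 48L, bridge_case = "E3072->HT")
#> [1] "within recommended range"
```

In this workflow, `n_bridge` would correspond to `length(overlapping_samples)`,
computed when the input data sets are checked for overlapping samples.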

For studies that include multiple Explore 3072 data sets, these data sets should
first be bridged using a within-product approach, as described in the 
[Introduction to bridging tutorial](bridging_introduction.html). The same 
principle applies when bridging data between Explore HT and Olink Reveal: if 
multiple Explore HT or Reveal studies are available, it is recommended to first
perform within-product bridging to combine all Explore HT studies or all Reveal
studies, respectively, before proceeding with between-product bridging.

The assays from Explore 3072 are matched to their corresponding assays in
Explore HT or Reveal and evaluated to determine whether each assay is
bridgeable. For studies involving Explore HT and Olink Reveal, the assays
between these two products are also matched and assessed for bridgeability in
the same manner. All assays are then normalized using both quantile smoothing
and normalization based on the median of paired differences. The output is an
adjusted data set that includes five additional columns, three of which relate
specifically to bridging normalization:

-   `BridgingRecommendation`: a flag which indicates if the assay is bridgeable
    and, if so, which normalization method is recommended. One of the three
    values will be listed in this column: "NotBridgeable", "MedianCentering", or
    "QuantileSmoothing".

-   `MedianCenteredNPX`: NPX values after normalization using the median of
    paired differences.

-   `QSNormalizedNPX`: NPX values after normalization using quantile smoothing.

After bridging, data from Explore 3072 and the reference product (Explore HT or 
Reveal) are exported into a single data set. For studies involving both Explore
HT and Olink Reveal, data are exported in the same way. Two additional columns
are added to facilitate data mapping and export.

-   `Project`: the name of the project as defined in the function input
    arguments `df1_project_nr` and `df2_project_nr`.

-   `OlinkID_PRODUCT`: mapped Olink IDs from the non-reference Olink product.
    "PRODUCT" in the column name is replaced by the name of the non-reference
    product: "E3072", "HT" or "Reveal" (e.g. `OlinkID_E3072`). Olink IDs from
    the reference product are listed in the `OlinkID` column.
    
Note that regardless of the bridging recommendation, NPX values will be
available for both normalization methods. A visual representation of the
between-product bridging workflow is shown below.

```{r, fig.cap = fcap, eval = TRUE, echo = FALSE, out.width = "50%"}
knitr::include_graphics(
  normalizePath(
    path = "../man/figures/Bridging_schematic.png"
  ),
  error = FALSE
)

fcap <- "Schematic of Between-Product Bridging Workflow"
```

## Import NPX files

To normalize Explore 3072 data to Explore HT or Olink Reveal data, the data sets
should first be read into R using `read_NPX()`. If more than two data sets are
being normalized, all Explore 3072 studies should first be combined through
within-product bridging, and the resulting data set should then be used as input
for between-product bridging. In the case of multiple Explore HT studies or
multiple Reveal studies, only one study should be chosen as the reference data
set.

The same principle applies to studies involving Explore HT and Olink Reveal.
When performing bridging between these products, data sets should be read using
`read_NPX()`. If multiple Explore HT or Reveal studies are available,
within-product bridging should be performed first, and the resulting data set
should be used as the input.

The data can be loaded using the `read_NPX()` function with the default Olink
Software NPX file as input, as shown below.

```{r message = FALSE, eval = FALSE, echo = TRUE}
### Read NPX files (replace the example paths with your own files)

# Explore 3072: CSV or parquet file
data_e3072 <- OlinkAnalyze::read_NPX(
  filename = "~/NPX_Explore3072_location.parquet"
)

# Explore HT: parquet file
data_eht <- OlinkAnalyze::read_NPX(
  filename = "~/NPX_ExploreHT_location.parquet"
)

# Reveal: CSV or parquet file
data_reveal <- OlinkAnalyze::read_NPX(
  filename = "~/NPX_Reveal_location.parquet"
)
```

The imported data should be further processed using the `check_npx()` and
`clean_npx()` functions to ensure that the data is in the correct format for
bridging and to generate check logs to identify any potential issues with the
data.

```{r message = FALSE, eval = FALSE, echo = TRUE}
### NPX file preprocessing

# Generate check log
check_log_data_e3072 <- OlinkAnalyze::check_npx(
  df = data_e3072
)
check_log_data_eht <- OlinkAnalyze::check_npx(
  df = data_eht
)
check_log_data_reveal <- OlinkAnalyze::check_npx(
  df = data_reveal
)

# Clean NPX data
data_e3072_clean <- OlinkAnalyze::clean_npx(
  df = data_e3072,
  check_log = check_log_data_e3072,
  # keep internal and external controls
  remove_control_sample = FALSE,
  remove_control_assay = FALSE,
  # keep data points with sample and assay QC warnings
  remove_qc_warning = FALSE,
  remove_assay_warning = FALSE
)
data_eht_clean <- OlinkAnalyze::clean_npx(
  df = data_eht,
  check_log = check_log_data_eht,
  # keep internal and external controls
  remove_control_sample = FALSE,
  remove_control_assay = FALSE,
  # keep data points with sample and assay QC warnings
  remove_qc_warning = FALSE,
  remove_assay_warning = FALSE
)
data_reveal_clean <- OlinkAnalyze::clean_npx(
  df = data_reveal,
  check_log = check_log_data_reveal,
  # keep internal and external controls
  remove_control_sample = FALSE,
  remove_control_assay = FALSE,
  # keep data points with sample and assay QC warnings
  remove_qc_warning = FALSE,
  remove_assay_warning = FALSE
)

# Generate check log on cleaned data
check_log_data_e3072_clean <- OlinkAnalyze::check_npx(
  df = data_e3072_clean
)
check_log_data_eht_clean <- OlinkAnalyze::check_npx(
  df = data_eht_clean
)
check_log_data_reveal_clean <- OlinkAnalyze::check_npx(
  df = data_reveal_clean
)

# clean up environment
rm(
  data_e3072,
  data_eht,
  data_reveal,
  check_log_data_e3072,
  check_log_data_eht,
  check_log_data_reveal
)
```

## Checking input datasets and bridging samples

First, confirm that there are overlapping sample IDs between the two data sets.
Note that external controls should not be included in the list of bridging
samples, as detailed in the section [selecting bridging samples](bridging_introduction.html#selecting-bridging-samples) of the 
Introduction to Bridging tutorial. External control samples often share the same
naming convention across data sets but may represent different samples due to
reagent batch differences. Appending the project name to the end of the control
sample IDs ensures that they are unique. For the example below, Explore HT data is
used as the reference project; however, the same process can be performed using
Reveal as the reference data set. The equivalent workflow also applies when
performing bridging between Explore HT and Reveal, where either product may be
used as the reference.

```{r, eval = FALSE, echo = TRUE}
# Note that if `SampleType` is not in the input data, stringr::str_detect can
# be used to exclude control samples based on SampleID.

data_e3072_samples <- data_e3072_clean |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::distinct(
    .data[["SampleID"]]
  ) |>
  dplyr::pull()

data_eht_samples <- data_eht_clean |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::distinct(
    .data[["SampleID"]]
  ) |>
  dplyr::pull()

overlapping_samples <- dplyr::intersect(
  x = data_e3072_samples,
  y = data_eht_samples
) |>
  unique()
```

```{r, echo=FALSE}
try(
  readRDS(
    file = normalizePath(
      path = "../man/figures/overlapping_samples_table.rds"
    )
  ) |>
    kableExtra::kbl(
      booktabs = TRUE,
      digits = 2L,
      caption = "List of overlapping samples between the two projects."
    ) |>
    kableExtra::kable_styling(
      bootstrap_options = "striped",
      full_width = FALSE,
      position = "center",
      latex_options = "HOLD_position"
    )
)
```
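If control sample IDs collide across data sets, a project suffix can be appended
to them before computing the overlap, as described above. The base R sketch
below uses made-up toy data; `suffix_controls()` is a hypothetical helper, and
the `SampleType` values are illustrative stand-ins for the control labels in
your own data.

```{r, eval = FALSE, echo = TRUE}
# Toy data: two projects whose external controls share the same SampleIDs.
proj_e3072 <- data.frame(
  SampleID   = c("S1", "S2", "CONTROL_1", "CONTROL_2"),
  SampleType = c("SAMPLE", "SAMPLE", "SAMPLE_CONTROL", "SAMPLE_CONTROL"),
  stringsAsFactors = FALSE
)
proj_eht <- data.frame(
  SampleID   = c("S2", "S3", "CONTROL_1", "CONTROL_2"),
  SampleType = c("SAMPLE", "SAMPLE", "SAMPLE_CONTROL", "SAMPLE_CONTROL"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append a project suffix to every non-SAMPLE row.
suffix_controls <- function(df, project) {
  is_control <- df$SampleType != "SAMPLE"
  df$SampleID[is_control] <- paste0(df$SampleID[is_control], "_", project)
  df
}

proj_e3072 <- suffix_controls(proj_e3072, "E3072")
proj_eht   <- suffix_controls(proj_eht, "HT")

# Only genuine samples overlap now; the controls are no longer shared.
intersect(proj_e3072$SampleID, proj_eht$SampleID)
#> [1] "S2"
```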

PCA plots for each data set can be used to assess whether any bridge samples
are outliers in that data set.

```{r, include = FALSE}
f3 <- paste0(
  "PCA plot prior to bridging for Explore 3072 data and data from the",
  " reference product. Bridge samples are indicated by color. PCA plots can be",
  " helpful in assessing if any bridge samples were outliers in one of the",
  " platforms."
)
```

```{r, eval = FALSE}
#### Label bridging samples

# Note that if column `SampleType` is not in the input data, the function
# stringr::str_detect can be used to exclude control samples based on their
# naming convention.
data_e3072_before_br <- data_e3072_clean |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::mutate(
    Type = dplyr::if_else(
      .data[["SampleID"]] %in% .env[["overlapping_samples"]],
      "Explore 3072 Bridge",
      "Explore 3072 Sample"
    )
  )

data_eht_before_br <- data_eht_clean |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::mutate(
    Type = dplyr::if_else(
      .data[["SampleID"]] %in% .env[["overlapping_samples"]],
      "Explore HT Bridge",
      "Explore HT Sample"
    )
  )

### PCA plot
pca_e3072 <- OlinkAnalyze::olink_pca_plot(
  df = data_e3072_before_br,
  check_log = check_log_data_e3072_clean,
  color_g = "Type",
  quiet = TRUE
)
pca_eht <- OlinkAnalyze::olink_pca_plot(
  df = data_eht_before_br,
  check_log = check_log_data_eht_clean,
  color_g = "Type",
  quiet = TRUE
)
```

```{r, echo = FALSE, fig.cap = f3, fig.height = 8, fig.width = 6}
knitr::include_graphics(
  normalizePath(
    path = "../man/figures/PCA_btw_product_before.png"
  ),
  error = FALSE
)
```

## Normalization

The `olink_normalization()` function has been expanded and can be used to
determine which assays are bridgeable, to identify the advised normalization
method for each bridgeable assay, and to calculate normalized NPX values for the
non-reference project. Normalized NPX values are calculated for all assays 
across products as described in the [Workflow Overview] and in the sections
below. Within this function, the bridging recommendation for each assay is
determined and the NPX values are normalized using the two methods described
below.

The `olink_normalization()` function contains a `format` argument that is set to 
`FALSE` by default. This will export the data frame with the format shown in
Table 4 of the [Function Output] section. The values in the NPX column will
remain unchanged and median-centered NPX values and QS-normalized NPX values
will be populated in the `MedianCenteredNPX` and `QSNormalizedNPX` columns for
all datapoints, regardless of bridging recommendation. 

If the format argument is set to `TRUE`, this will export the data frame with
the NPX values replaced with the bridged NPX values corresponding to the
bridging recommendation (see Table 5 of the [Function Output] section). For more
information, see the [Downstream Analysis] section below. 
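To illustrate what selecting the recommended value means, the base R sketch
below picks, for each data point, the bridged NPX that matches its
`BridgingRecommendation`. It is not the Olink Analyze implementation of
`format = TRUE`: the toy values are made up, and keeping the original NPX for
"NotBridgeable" assays is an assumption made only for this sketch.

```{r, eval = FALSE, echo = TRUE}
# Toy output with one row per data point (values are made up).
br <- data.frame(
  OlinkID                = c("OID1", "OID2", "OID3"),
  NPX                    = c(1.0, 2.0, 3.0),
  MedianCenteredNPX      = c(1.2, 2.4, 3.1),
  QSNormalizedNPX        = c(1.1, 2.6, 2.9),
  BridgingRecommendation = c("MedianCentering", "QuantileSmoothing",
                             "NotBridgeable"),
  stringsAsFactors = FALSE
)

# Pick the bridged value that matches each row's recommendation.
# Assumption for this sketch: NotBridgeable rows keep their original NPX.
br$BridgedNPX <- ifelse(
  br$BridgingRecommendation == "MedianCentering",
  br$MedianCenteredNPX,
  ifelse(
    br$BridgingRecommendation == "QuantileSmoothing",
    br$QSNormalizedNPX,
    br$NPX
  )
)
br$BridgedNPX
#> [1] 1.2 2.6 3.0
```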

```{r, eval = FALSE, echo = TRUE}
### Perform bridge normalization

# Note:
# Project name is assigned by `df1_project_nr` and `df2_project_nr` parameters
# in `olink_normalization` function

# Perform between-product bridging without formatting for downstream analysis
npx_br_data <- OlinkAnalyze::olink_normalization(
  df1 = data_eht_clean,
  df2 = data_e3072_clean,
  overlapping_samples_df1 = overlapping_samples,
  df1_project_nr = "Explore HT",
  df2_project_nr = "Explore 3072",
  reference_project = "Explore HT",
  format = FALSE,
  df1_check_log = check_log_data_eht_clean,
  df2_check_log = check_log_data_e3072_clean
)

# Perform between-product bridging with formatting for downstream analysis
npx_br_data <- OlinkAnalyze::olink_normalization(
  df1 = data_eht_clean,
  df2 = data_e3072_clean,
  overlapping_samples_df1 = overlapping_samples,
  df1_project_nr = "Explore HT",
  df2_project_nr = "Explore 3072",
  reference_project = "Explore HT",
  format = TRUE,
  df1_check_log = check_log_data_eht_clean,
  df2_check_log = check_log_data_e3072_clean
)
```

Use `check_npx()` and `clean_npx()` to ensure that the data is in the correct
format and to clean the data after bridging.

```{r message = FALSE, eval = FALSE, echo = TRUE}
# Generate check log
check_log_br_data <- OlinkAnalyze::check_npx(
  df = npx_br_data
)

# Clean NPX data
npx_br_data_clean <- OlinkAnalyze::clean_npx(
  df = npx_br_data,
  check_log = check_log_br_data,
  # keep control samples, as they are needed for downstream QC
  remove_control_sample = FALSE
)

# Generate check log on cleaned data
check_log_br_data_clean <- OlinkAnalyze::check_npx(
  df = npx_br_data_clean
)

# clean up environment
rm(
  npx_br_data,
  check_log_br_data
)
```

### Determining bridging recommendations

For an assay to be bridgeable across products, it must either have signal in
both products or be primarily background signal in both products. Bridging noise
into signal or signal into noise can negatively impact downstream statistical
analysis. To determine if an assay is bridgeable, the bridge samples from both
products are used to assess the following criteria:

-   Is there a linear relationship between products?
    -   **Assessing linearity across products:** To determine if there
        is a linear relationship between products for an assay, the
        linear coefficient of determination (R^2^) is calculated using
        Pearson correlation. R^2^ is a measure of how much of the
        variation in the data is explained by the linear function
        compared to just using the mean. In this correlation, counts
        below 10 are excluded due to lack of signal. The R^2^ value is
        calculated and an assay is considered to have a linear
        relationship across products if the R^2^ value is above the
        cutoff. A higher R^2^ value indicates that the assay is in the
        linear range in both products. Conversely, a low R^2^ means
        that the assay is in background in one or both products. The
        default cutoff is set to R^2^ \> 0.8, indicating that at least 80% of the
        variation in the data is explained by the linear function.
-   Are the NPX ranges in the two products similar?
    -   **Assessing similarity of NPX ranges:** To determine if the NPX
        ranges are similar across products, the range of NPX values
        between the 10% and 90% quantiles is calculated for each product,
        excluding data points with counts less than 10. If the
        difference in range of NPX between products is greater than the
        cutoff, the ranges are not considered similar across
        products. Since the NPX values are measured on the same
        samples, an increase of 1 NPX in one product is expected to
        correspond to an increase of 1 NPX in the other product.
        If the ranges are not similar, this suggests that 1 NPX is not
        equivalent across products. By default, the ranges are considered
        similar if they differ by less than 1 NPX.
-   Are there sufficient counts in both products?
    -   **Assessing if there are sufficient counts:** An assay's
        absolute level of counts is important to consider as the
        instruments used to generate NPX values have an inherent noise
        level. To determine if there are sufficient counts in an assay
        for bridging, the median number of counts in both products is
        calculated, excluding data points with fewer than 10 counts. If
        the median number of counts is less than the cutoff then the
        assay does not have sufficient counts to be used for bridging.
        The default cutoff is set to 150 counts, which is based on the
        count quality control metrics for Explore products.
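The three bridgeability metrics above can be sketched in base R for a single
assay using the paired bridge samples. This is a minimal illustration on
simulated data with the default cutoffs described above; the variable names are
hypothetical, and excluding low-count data points pairwise is an assumption
rather than the exact Olink Analyze implementation.

```{r, eval = FALSE, echo = TRUE}
set.seed(1)
n <- 48L

# Simulated paired bridge samples for one assay (hypothetical values).
true_level <- stats::rnorm(n, mean = 5, sd = 1)
npx_ref    <- true_level + stats::rnorm(n, sd = 0.2)
npx_nonref <- true_level + 0.8 + stats::rnorm(n, sd = 0.2)
cnt_ref    <- round(2^(npx_ref + 3))
cnt_nonref <- round(2^(npx_nonref + 3))

# Drop data points with fewer than 10 counts (applied pairwise here).
min_count <- 10
keep <- cnt_ref >= min_count & cnt_nonref >= min_count

# 1. Linearity: R^2 as the squared Pearson correlation.
r2 <- stats::cor(npx_ref[keep], npx_nonref[keep], method = "pearson")^2

# 2. NPX range: width of the 10%-90% quantile interval in each product.
npx_span <- function(x) {
  diff(stats::quantile(x, probs = c(0.1, 0.9), names = FALSE))
}
range_diff <- abs(npx_span(npx_ref[keep]) - npx_span(npx_nonref[keep]))

# 3. Counts: both per-product medians must reach the cutoff.
min_median_counts <- min(
  stats::median(cnt_ref[keep]),
  stats::median(cnt_nonref[keep])
)

checks <- c(
  linear        = r2 > 0.8,
  similar_range = range_diff < 1,
  enough_counts = min_median_counts >= 150
)
# As described in the text, meeting any of the three criteria is sufficient.
bridgeable <- any(checks)
```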

For assays that are bridgeable, the shape of the NPX distribution is
compared between the two products:

-   **Assessing similarity of NPX distribution across products:** If any of the
    three criteria outlined above is met, the assay is considered
    bridgeable. Otherwise, bridging is not recommended for that assay.
    If an assay is bridgeable, the similarity of the NPX distribution is
    used to determine which method is recommended for bridging. The
    Kolmogorov-Smirnov test, or KS test, is used to assess the
    similarity of two distributions by calculating the KS statistic,
    which is based on the empirical cumulative distribution function
    (ECDF). Data points with counts below 10 are excluded, and the largest
    difference between the two ECDFs becomes the KS statistic. If the KS
    statistic is above the cutoff, the distributions are considered to have
    different shapes. In this case, a median shift is not sufficient to
    normalize the data, and quantile smoothing is recommended. If the KS
    statistic is below the cutoff, normalization using the median of paired
    differences is recommended. By default, this cutoff is set to 0.2.
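A minimal sketch of the KS step and the resulting recommendation is shown below,
using simulated data. Median-centering both vectors before the KS comparison is
an assumption made here so that the statistic reflects shape rather than the
overall NPX offset; `recommend_method()` is a hypothetical helper that maps the
bridgeability result and the KS statistic to a method using the 0.2 cutoff.

```{r, eval = FALSE, echo = TRUE}
set.seed(2)

# Simulated bridge-sample NPX for one assay: the non-reference values are a
# monotone but non-linear transform of the reference values, so shapes differ.
ref    <- stats::rnorm(40, mean = 4, sd = 1)
nonref <- exp(ref / 2)

# Assumption: remove the location shift so the KS statistic reflects shape.
ref_c    <- ref - stats::median(ref)
nonref_c <- nonref - stats::median(nonref)

# KS statistic: the largest vertical gap between the two ECDFs.
grid      <- sort(c(ref_c, nonref_c))
ks_manual <- max(abs(stats::ecdf(ref_c)(grid) - stats::ecdf(nonref_c)(grid)))

# The same statistic from base R's two-sample KS test.
ks_base <- unname(stats::ks.test(ref_c, nonref_c)$statistic)

# Hypothetical helper mapping bridgeability and KS statistic to a method.
recommend_method <- function(bridgeable, ks_stat, ks_cutoff = 0.2) {
  if (!bridgeable) {
    return("NotBridgeable")
  }
  if (ks_stat > ks_cutoff) "QuantileSmoothing" else "MedianCentering"
}

recommend_method(bridgeable = TRUE, ks_stat = ks_manual)
```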

An overview of these criteria is visualized below.

```{r, echo = FALSE, fig.cap = fcap, out.width = "50%"}
knitr::include_graphics(
  normalizePath(
    path = "../man/figures/assay_bridgeability.jpg"
  ),
  error = FALSE
)
fcap <- paste(
  "Criteria to determine the bridging recommendation for an assay. The",
  "assessment of linearity ensures bridging between signal in both platforms",
  "or noise in both platforms (but not between signal and noise). Similar NPX",
  "ranges and sufficient counts provide additional insight into an assay's",
  "bridgeability. Distribution shape is assessed to determine recommended",
  "bridging method."
)
```
\
\

The `olink_bridgeability_plot` function generates a series of figures on a
per-assay basis for a data set generated from between-product bridging, based
on the bridging samples used in the bridge normalization. The coloration of the
figure headers indicates whether that assay has been defined as bridgeable or not
bridgeable. Red headers indicate that an assay is not bridgeable and blue
headers indicate that an assay is bridgeable. The correlation plot, violin plot,
and bar chart figures illustrate the three criteria described above for
determining whether an assay is bridgeable. 

If an assay is determined to be bridgeable, the ECDF curve and corresponding KS
statistic are used to determine which normalization approach (median centering
or quantile smoothing) is most suitable for between-product normalization.

Prior to assessment, outlier bridging samples are excluded. A sample is
considered an outlier if its NPX value is more than 3 times the interquartile
range above or below the median in either product.
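The outlier rule above can be sketched in base R as follows. The NPX values are
made up, `is_outlier()` is a hypothetical helper, and `stats::IQR()` uses R's
default quantile definition, which may differ from the one used in Olink
Analyze.

```{r, eval = FALSE, echo = TRUE}
# Made-up paired NPX for one assay; bridge sample 5 is extreme in the
# non-reference product.
npx_ref    <- c(3.1, 3.4, 2.9, 3.8, 3.3, 3.0, 3.6, 3.2)
npx_nonref <- c(4.0, 4.3, 3.9, 4.6, 9.5, 4.1, 4.4, 4.2)

# TRUE where a value lies more than `k` IQRs above or below the median.
is_outlier <- function(x, k = 3) {
  abs(x - stats::median(x)) > k * stats::IQR(x)
}

# Exclude a bridge sample if it is an outlier in either product.
keep <- !(is_outlier(npx_ref) | is_outlier(npx_nonref))
which(!keep)
#> [1] 5
```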

After assessment, an assay is considered bridgeable if it meets any of the first
three criteria. The fourth criterion determines which normalization method is
recommended for bridging. Note that bridgeable assays will differ between
projects based on the expression of bridge samples in the studies. Here, the
Explore 3072 to Explore HT bridging case is shown as the example, although the
same principles apply to other product combinations, including Olink Explore HT
and Olink Reveal.

```{r, eval = FALSE, echo = TRUE}
### Generate olink_bridgeability_plot figures

npx_br_data_bridgeable_plt <- npx_br_data_clean |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  OlinkAnalyze::olink_bridgeability_plot(
    check_log = check_log_br_data_clean,
    # Important to note that setting `olink_id` to NULL will generate plots for
    # all assays. This can be computationally intensive if there are many
    # assays!
    # To generate plots for a subset of assays, set `olink_id` to a vector of
    # Olink IDs of interest.
    olink_id = NULL,
    median_counts_threshold = 150L,
    min_count = 10L
  )

npx_br_data_bridgeable_plt[[1L]]
```

```{r, message = FALSE, echo = FALSE, out.width = "675px", fig.cap = fcap}
knitr::include_graphics(
  normalizePath(path = "../man/figures/bridgeable_plt_MedianCenter.png"),
  error = FALSE
)

fcap <- paste("Visualization of an assay's bridgeability criteria as generated",
              "by the `olink_bridgeability_plot()` function.")
```
\
\

### Normalization using the median of paired differences

If the shape of the distribution and the variance of each assay are expected to
be the same between runs, normalization using the median of paired differences
is preferred. Normalization using the median of paired differences based on the
bridging samples is performed in the following steps:

1.  For each assay in the non-reference project (e.g. Explore 3072), the
    pairwise difference is calculated for each of the bridging samples
    relative to the reference project (e.g. Explore HT).

2.  The normalization factor is estimated for each assay by finding the
    median of the pairwise differences.

3.  The assay-specific normalization factor is used to normalize each data
    point from the non-reference to the reference project.
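The three steps above can be sketched for a single assay in base R. The values
are made up, and the sign convention (reference minus non-reference, then adding
the factor to the non-reference data) is an assumption consistent with adjusting
the non-reference project toward the reference.

```{r, eval = FALSE, echo = TRUE}
# Made-up NPX for one assay in the bridge samples (same sample order).
bridge_ref    <- c(5.1, 6.3, 4.8, 7.0, 5.9) # reference, e.g. Explore HT
bridge_nonref <- c(4.0, 5.4, 3.5, 6.2, 4.7) # non-reference, e.g. Explore 3072

# Steps 1-2: paired differences and their median (the normalization factor).
adj_factor <- stats::median(bridge_ref - bridge_nonref)

# Step 3: shift all non-reference data points for this assay by the factor.
nonref_all      <- c(bridge_nonref, 3.9, 5.0) # bridge plus other samples
nonref_adjusted <- nonref_all + adj_factor
```

Using the median rather than the mean keeps a single discordant bridge sample
from dominating the normalization factor.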

### Quantile smoothing

Since Olink NGS products use distinct workflows to generate NPX data, some assays
exist in corresponding but distinct NPX spaces.
For those assays, the median of paired differences is insufficient for bridging
because it uses only a single anchor point (the median/50% quantile). Instead,
quantile smoothing (QS) using multiple anchor points (5%, 10%, 25%, 50%, 75%,
90%, and 95% quantiles) is preferred to map data from the non-reference set to 
the distribution of the reference set.

Normalization using QS is performed with the bridging samples in the following
steps:

1.  For each assay, each data point of the bridging samples from the
    non-reference project is mapped to the equivalent position in the reference
    product using an empirical cumulative distribution function (ECDF). The ECDF
    is built from the observed NPX values of the bridging samples for the assay
    and assigns each value its cumulative probability; values falling between
    observed data points are interpolated linearly.

2.  The ECDF is then used to map the data points from the source
    (non-reference) product to the reference product space using the specified
    quantiles. At this stage, all data points from the bridging samples have NPX
    values normalized to the reference product.

3.  To normalize the remaining data, a spline regression model is fitted to the
    sorted data from the source product (prior to mapping) and the corresponding
    mapped values, with the quantiles serving as the anchor points of the spline.
    A spline regression model divides the data at these quantiles, uses them as
    anchor points (knots), and fits the values between adjacent knots.

4.  The spline regression model is used to predict all remaining data points
    from the non-reference product in the space of the reference product. The
    model consists of piecewise linear regressions within the quantile
    intervals: the NPX value from the source product is used as the x-value in
    the appropriate interval, and the predicted y-value is the corresponding NPX
    value on the reference product scale.
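
A minimal base R sketch of the four QS steps for a single assay is shown below.
It is an illustration under stated assumptions, not the OlinkAnalyze
implementation: `qs_sketch()` is a hypothetical helper, the data are simulated,
the ECDF mapping pairs values of equal rank (which assumes the same bridging
samples are available in both products, so no interpolation is needed), and the
piecewise linear spline is built with `bs(degree = 1)` from the `splines`
package that ships with R.

```{r, eval = FALSE, echo = TRUE}
library(splines)  # part of base R; provides bs()

# Hypothetical helper: quantile smoothing sketch for ONE assay
qs_sketch <- function(ref_bridge, nonref_bridge, nonref_all,
                      probs = c(0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95)) {
  n <- length(nonref_bridge)
  stopifnot(length(ref_bridge) == n)

  # Step 1: ECDF of the non-reference bridging samples gives each sorted
  # value its cumulative probability (i / n when there are no ties)
  x_sorted <- sort(nonref_bridge)
  p <- stats::ecdf(nonref_bridge)(x_sorted)

  # Step 2: map each value to the reference value with the same cumulative
  # probability, i.e. the reference bridging value of the same rank
  y_mapped <- sort(ref_bridge)[round(p * n)]

  # Step 3: piecewise linear spline regression (degree = 1) with knots at
  # the chosen quantiles of the non-reference bridging data
  knots <- stats::quantile(x_sorted, probs = probs, names = FALSE)
  fit <- stats::lm(y ~ bs(x, knots = knots, degree = 1),
                   data = data.frame(x = x_sorted, y = y_mapped))

  # Step 4: predict non-reference data points on the reference NPX scale
  unname(stats::predict(fit, newdata = data.frame(x = nonref_all)))
}

# Usage with simulated values (not real Olink data)
set.seed(1)
nonref_bridge <- rnorm(40, mean = 5)
ref_bridge <- nonref_bridge + 1  # reference scale shifted by +1 NPX
nonref_all <- seq(min(nonref_bridge), max(nonref_bridge), length.out = 10)
qs_sketch(ref_bridge, nonref_bridge, nonref_all) - nonref_all  # close to 1
```

Because the simulated reference values are a constant shift of the
non-reference values, the fitted spline reproduces that shift; with real data,
the spline can also correct differences in spread and shape between products,
which a single median shift cannot. Predicting values outside the range of the
bridging samples extrapolates the outermost linear pieces and makes `bs()` emit
a warning.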

### Function Output

The output of `olink_normalization()` when used for between-product bridging is
a data frame containing the concatenated data from the two products, with
additional columns for adjusted NPX values, bridging recommendations, mapping
information, and project names. The adjusted NPX values are stored in the
`MedianCenteredNPX` and `QSNormalizedNPX` columns. For each assay, the
`BridgingRecommendation` column indicates which method, if any, should be used
for that assay. Additional columns, including `OlinkID` and `OlinkID_PRODUCT`
[Workflow Overview], map the assays across products, and the `Project` column
lists the name of the project as set by the `df1_project_nr` and
`df2_project_nr` arguments. Because the reference data are not altered during
normalization, the normalized NPX values (`MedianCenteredNPX` and
`QSNormalizedNPX`) of the reference data are identical to the non-normalized
values in the `NPX` column.

```{r, eval = TRUE, echo = FALSE}
try(
  readRDS(
    file = normalizePath("../man/figures/bridging_results.rds")
  ) |>
    kableExtra::kbl(
      booktabs = TRUE,
      digits = 1,
      caption = paste("Table 4. First 5 rows of combined datasets after",
                      "bridging with between-product formatting argument set",
                      "to FALSE.")
    ) |>
    kableExtra::kable_styling(
      bootstrap_options = "striped",
      full_width = FALSE,
      font_size = 10,
      position = "center",
      latex_options = "HOLD_position"
    ) |>
    kableExtra::scroll_box(
      width = "100%"
    )
)
```

```{r, eval = TRUE, echo = FALSE}
try(
  readRDS(
    normalizePath(path = "../man/figures/bridging_results.rds")
  ) |>
    dplyr::mutate(
      NPX = dplyr::case_when(
        .data[["BridgingRecommendation"]] == "MedianCentering" ~
          .data[["MedianCenteredNPX"]],
        .data[["BridgingRecommendation"]] == "QuantileSmoothing" ~
          .data[["QSNormalizedNPX"]],
        .default = .data[["NPX"]]
      )
    ) |>
    dplyr::mutate(
      SampleID = paste(.data[["SampleID"]], .data[["Project"]], sep = "_")
    ) |>
    dplyr::mutate(
      OlinkID = dplyr::if_else(
        .data[["BridgingRecommendation"]] != "NotBridgeable",
        paste(.data[["OlinkID"]], .data[["OlinkID_E3072"]], sep = "_"),
        .data[["OlinkID_E3072"]]
      )
    ) |>
    dplyr::select(
      -dplyr::all_of(
        c("OlinkID_E3072", "MedianCenteredNPX", "QSNormalizedNPX")
      )
    ) |>
    kableExtra::kbl(
      booktabs = TRUE,
      digits = 1L,
      caption = paste("Table 5. First 5 rows of combined datasets after",
                      "bridging with between-product formatting argument set",
                      "to TRUE.")
    ) |>
    kableExtra::kable_styling(
      bootstrap_options = "striped",
      full_width = FALSE,
      font_size = 10,
      position = "center",
      latex_options = "HOLD_position"
    ) |>
    kableExtra::scroll_box(
      width = "100%"
    )
)
```

## Evaluating the quality of bridging

Principal component analysis (PCA) is used to assess the quality of bridging by
determining whether the sample controls (SCs) and bridging samples from the two
products lie closer together after bridging. Two PCAs can be generated, one
containing the SCs and one containing the bridging samples. Before bridging, a
noticeable separation between products is expected; this separation should
decrease after bridging.
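
The visual comparison can be complemented by a single number, for example the
distance between the two products' centroids on the first two principal
components. The sketch below is an illustrative assumption rather than an
OlinkAnalyze feature: it simulates a samples-by-assays NPX matrix for each
product, bridges with the median of paired differences, and uses base R
`stats::prcomp()` with a hypothetical `centroid_distance()` helper.

```{r, eval = FALSE, echo = TRUE}
# Hypothetical helper: distance between product centroids on PC1 and PC2
centroid_distance <- function(ref_mat, nonref_mat) {
  pca <- stats::prcomp(rbind(ref_mat, nonref_mat), center = TRUE)
  scores <- pca$x[, 1:2]
  is_ref <- seq_len(nrow(scores)) <= nrow(ref_mat)
  delta <- colMeans(scores[is_ref, ]) - colMeans(scores[!is_ref, ])
  sqrt(sum(delta^2))
}

# Simulated samples x assays NPX matrices (not real Olink data)
set.seed(42)
n_samples <- 10
n_assays <- 50
true_npx <- matrix(rnorm(n_samples * n_assays, mean = 5), nrow = n_samples)
noise <- function() {
  matrix(rnorm(n_samples * n_assays, sd = 0.2), nrow = n_samples)
}
shift <- rnorm(n_assays, sd = 1.5)  # assay-specific offset between products

ref_mat <- true_npx + noise()
nonref_mat <- sweep(true_npx, 2, shift, "+") + noise()

# Bridge each assay (column) with the median of paired differences
adj_factor <- apply(ref_mat - nonref_mat, 2, stats::median)
nonref_bridged <- sweep(nonref_mat, 2, adj_factor, "+")

d_before <- centroid_distance(ref_mat, nonref_mat)
d_after <- centroid_distance(ref_mat, nonref_bridged)
c(before = d_before, after = d_after)  # separation should shrink
```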

```{r, include = FALSE}
f8 <- paste("Combined PCA of sample controls from both platforms prior to",
            "normalization.")
f9 <- paste("Combined PCA of bridging samples from both platforms prior to",
            "normalization.")
f10 <- paste("Combined PCA of sample controls from both platforms after",
             "normalization.")
f11 <- paste("Combined PCA of bridging samples from both platforms after",
             "normalization.")
```

```{r, eval = FALSE, echo = TRUE}
# Prepare data for PCA plots - pre-bridging

npx_pre_data <- data_eht_clean |>
  dplyr::mutate(
    Project = "Explore HT"
  ) |>
  dplyr::bind_rows(
    data_e3072_clean |>
      dplyr::mutate(
        Project = "Explore 3072"
      )
  )

check_log_pre_data <- OlinkAnalyze::check_npx(
  df = npx_pre_data
)

# no need to clean data set `npx_pre_data`
```

```{r pca_pre_sc, eval = FALSE, echo = TRUE}
# Generate pre-bridging PCA using Sample Control samples

npx_pre_data |>
  dplyr::filter(.data[["SampleType"]] == "SAMPLE_CONTROL") |>
  dplyr::mutate(
    SampleID = paste(.data[["Project"]], .data[["SampleID"]], sep = "_")
  ) |>
  OlinkAnalyze::olink_pca_plot(
    check_log = check_log_pre_data,
    color_g = "Project"
  )
```

```{r pca_pre_sc_fig, eval = TRUE, echo = FALSE, fig.cap = f8, message = FALSE}
# Generate pre-bridging PCA using Sample Control samples
knitr::include_graphics(
  path = normalizePath(
    path = "../man/figures/SCs_pre_bridging.png"
  ),
  error = FALSE
)
```

```{r pca_pre_bridge, eval = FALSE, echo = TRUE}
# Generate pre-bridging PCA using bridging samples

npx_pre_data |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::filter(
    .data[["SampleID"]] %in% .env[["overlapping_samples"]]
  ) |>
  dplyr::mutate(
    SampleID = paste(.data[["Project"]], .data[["SampleID"]], sep = "_")
  ) |>
  OlinkAnalyze::olink_pca_plot(
    check_log = check_log_pre_data,
    color_g = "Project"
  )
```

```{r, eval = TRUE, echo = FALSE, fig.cap = f9}
# Generate pre-bridging PCA using bridging samples

knitr::include_graphics(
  path = normalizePath(
    path = "../man/figures/bridges_pre_bridging.png"
  ),
  error = FALSE
)
```

```{r, eval = FALSE, echo = TRUE}
### Format post-bridging data

## Keep the data following BridgingRecommendation
npx_post_br_reco <- npx_br_data_clean |>
  # Not necessary if olink_normalization() is run with format = TRUE
  dplyr::filter(
    .data[["BridgingRecommendation"]] != "NotBridgeable"
  ) |>
  dplyr::mutate(
    NPX = dplyr::case_when(
      .data[["BridgingRecommendation"]] == "MedianCentering" ~
        .data[["MedianCenteredNPX"]],
      .data[["BridgingRecommendation"]] == "QuantileSmoothing" ~
        .data[["QSNormalizedNPX"]],
      .default = .data[["NPX"]]
    )
  )
```

```{r pca_post_SC, eval = FALSE, echo = TRUE}
# Generate PCA plot of post-bridging data from Sample Controls

npx_post_br_reco |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE_CONTROL"
  ) |>
  dplyr::mutate(
    SampleID = paste(.data[["Project"]], .data[["SampleID"]], sep = "_")
  ) |>
  OlinkAnalyze::olink_pca_plot(
    color_g = "Project",
    check_log = check_log_br_data_clean
  )
```

```{r, eval = TRUE, echo = FALSE, fig.cap = f10}
# Generate PCA plot of post-bridging data from Sample Controls

knitr::include_graphics(
  path = normalizePath(
    path = "../man/figures/SCs_post_bridging.png"
  ),
  error = FALSE
)
```

```{r, eval = FALSE, echo = TRUE}
# Generate PCA plot of post-bridging data from bridging samples

npx_post_br_reco |>
  dplyr::filter(
    .data[["SampleType"]] == "SAMPLE"
  ) |>
  dplyr::filter(
    .data[["SampleID"]] %in% .env[["overlapping_samples"]]
  ) |>
  dplyr::mutate(
    SampleID = paste(.data[["Project"]], .data[["SampleID"]], sep = "_")
  ) |>
  OlinkAnalyze::olink_pca_plot(
    color_g = "Project",
    check_log = check_log_br_data_clean
  )
```

```{r, echo = FALSE, fig.cap = f11}
# Generate PCA plot of post-bridging data from bridging samples

knitr::include_graphics(
  path = normalizePath(
    path = "../man/figures/bridges_post_bridging.png"
  ),
  error = FALSE
)
```

## Exporting Normalized Data

The normalized non-reference data set (e.g. Explore 3072) can be exported with
`arrow::write_parquet()` to create a long-format Olink NGS file.

```{r, eval = FALSE, echo = TRUE}
### Export normalized data

# Here we will export the full dataset including internal and external controls
# to follow Olink Software Export File formatting, but the data can be filtered
# to include only samples and assays of interest prior to export.
df <- npx_br_data |>
  dplyr::filter(
    # Non-reference project name, as set by `df2_project_nr`
    .data[["Project"]] == "Explore 3072"
  ) |>
  arrow::as_arrow_table()

df$metadata$FileVersion <- "NA"
df$metadata$ExploreVersion <- "NA"
df$metadata$ProjectName <- "NA"
df$metadata$SampleMatrix <- "NA"
df$metadata$DataFileType <- "R Package Export File"
df$metadata$ProductType <- "Explore3072"
df$metadata$Product <- "Explore3072"

arrow::write_parquet(
  x = df,
  sink = "path_to_output.parquet"
)
```

## FAQs

### Overlapping Assays within products

Both the Explore 3072 and Explore HT products contain assays that are measured
multiple times within the product, known as overlapping assays or correlation
assays. In Explore 3072, these assays overlap across panels; in Explore HT, they
overlap across blocks. They are included for QC purposes and allow users to
evaluate data performance across panels in Explore 3072 and across blocks in
Explore HT. Within each product, each instance of an overlapping assay has a
unique OlinkID for its panel (Explore 3072) or block (Explore HT).

IL6, IL8 (CXCL8), and TNF are included in the Cardiometabolic, Oncology,
Neurology, and Inflammation panels, while IDO1, LMOD1, and SCRIB are included in
the Cardiometabolic II, Oncology II, Neurology II, and Inflammation II panels.
Each correlation assay is measured four times in an Olink Explore 3072 run. In
Explore HT, GBP1 and MAPK1 serve as overlapping assays and are measured three
times in a run.
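
Overlapping assays can be identified in long-format data as assays that map to
more than one OlinkID within a product. A minimal base R sketch, assuming
`Assay` and `OlinkID` columns as in Olink long-format data; the toy rows and
OlinkID values below are hypothetical placeholders:

```{r, eval = FALSE, echo = TRUE}
# Toy long-format data for one product; OlinkIDs are placeholders
npx_long <- data.frame(
  Assay = c("IL6", "IL6", "IL6", "IL6", "TNF", "TNF", "TNF", "TNF", "MMP9"),
  OlinkID = c("OID_A1", "OID_A2", "OID_A3", "OID_A4",
              "OID_B1", "OID_B2", "OID_B3", "OID_B4", "OID_C1")
)

# Count distinct OlinkIDs per assay; more than one means the assay is
# measured multiple times within the product
ids_per_assay <- tapply(npx_long$OlinkID, npx_long$Assay,
                        function(x) length(unique(x)))
overlapping_assays <- names(ids_per_assay)[ids_per_assay > 1]
overlapping_assays
```

In real data each OlinkID appears once per sample, which is why distinct
OlinkIDs are counted rather than rows.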

### Downstream Analysis

Olink Analyze statistical analysis functions use the data in the `NPX` column
by default. To use the recommended normalized data, set the `format` argument of
`olink_normalization()` to `TRUE` when performing bridge normalization. The
`NPX` column values will then be replaced with the normalized values
corresponding to the approach identified in the `BridgingRecommendation` column.
Data points identified as `NotBridgeable` will retain their original NPX values
and OlinkIDs. Assays that do not overlap between products will be identified as
`NotOverlapping` and will also retain their original NPX values and OlinkIDs.
External controls will be removed from the formatted data frame. Sample IDs will
be concatenated with their corresponding project names so that all samples are
analyzed individually. Additionally, so that overlapping assays within products
are analyzed individually, the `OlinkID` of bridgeable assays is temporarily set
to the concatenation of the OlinkIDs from both products. The `OlinkID_PRODUCT`,
`MedianCenteredNPX`, and `QSNormalizedNPX` columns will be removed. The
resulting data frame can then be used in any downstream analysis function within
Olink Analyze.

Alternatively, if `olink_normalization()` is run with the `format` argument set
to `FALSE`, the `NPX` column is not modified and the non-normalized NPX data
will be used by default. To use the recommended normalized data, reassign the
`NPX` column with `dplyr::mutate()`. Additionally, so that overlapping assays
within products are analyzed individually, `OlinkID` can be temporarily set to
the concatenated OlinkIDs. The resulting data frame can then be used in any
downstream analysis function within Olink Analyze.

Assays that are not recommended for bridging should be analyzed separately, and
their results can be combined using a meta-analysis. Depending on the study
design, these assays can either be excluded from downstream analysis or treated
as non-overlapping assays.

```{r, eval = FALSE, echo = TRUE}
### npx_br_data_clean generated by olink_normalization with format = TRUE

## Option 1: Exclude non-bridgeable assays from both products
npx_recommended <- npx_br_data_clean |>
  dplyr::filter(
    .data[["BridgingRecommendation"]] != "NotBridgeable"
  )

## Option 2: Analyze non-bridgeable assays separately
# No further preprocessing needed
npx_recommended <- npx_br_data_clean
```

```{r, eval = FALSE, echo = TRUE}
### npx_br_data_clean generated by olink_normalization with format = FALSE

## Option 1: Exclude non-bridgeable assays from both products
npx_recommended <- npx_br_data_clean |>
  dplyr::mutate(
    NPX_original = .data[["NPX"]]
  ) |>
  dplyr::filter(
    .data[["BridgingRecommendation"]] != "NotBridgeable"
  ) |>
  dplyr::mutate(
    NPX = dplyr::case_when(
      .data[["BridgingRecommendation"]] == "MedianCentering" ~
        .data[["MedianCenteredNPX"]],
      .data[["BridgingRecommendation"]] == "QuantileSmoothing" ~
        .data[["QSNormalizedNPX"]],
      .default = .data[["NPX"]]
    )
  ) |>
  dplyr::mutate(
    OlinkID_HT = .data[["OlinkID"]]
  ) |>
  dplyr::mutate(
    # Concatenated OlinkID so overlapping assays are analyzed individually
    OlinkID = paste0(.data[["OlinkID"]], "_", .data[["OlinkID_E3072"]])
  )

## Option 2: Analyze non-bridgeable assays separately
npx_recommended <- npx_br_data_clean |>
  dplyr::mutate(
    NPX_original = .data[["NPX"]]
  ) |>
  dplyr::mutate(
    NPX = dplyr::case_when(
      .data[["BridgingRecommendation"]] == "MedianCentering" ~
        .data[["MedianCenteredNPX"]],
      .data[["BridgingRecommendation"]] == "QuantileSmoothing" ~
        .data[["QSNormalizedNPX"]],
      .default = .data[["NPX"]]
    )
  ) |>
  dplyr::mutate(
    OlinkID_HT = .data[["OlinkID"]]
  ) |>
  dplyr::mutate(
    OlinkID = dplyr::if_else(
      .data[["BridgingRecommendation"]] != "NotBridgeable",
      # Concatenated OlinkID for bridgeable assays
      paste0(.data[["OlinkID"]], "_", .data[["OlinkID_E3072"]]),
      # Original OlinkID of each product for non-bridgeable assays;
      # replace "Explore HT" with the reference project name set in the function
      dplyr::if_else(
        .data[["Project"]] == "Explore HT",
        .data[["OlinkID"]],
        .data[["OlinkID_E3072"]]
      )
    )
  )
```

## Contact Us

We are always happy to help. Email us with any questions:

-   biostat\@olink.com for statistical services and general stats
    questions

-   support\@olink.com for Olink lab product and technical support

-   info\@olink.com for more information

## Legal Disclaimer

© `r format(Sys.Date(), "%Y")` Olink Proteomics AB, part of Thermo Fisher
Scientific.

Olink products and services are For Research Use Only. Not for use in diagnostic
procedures.

All information in this document is subject to change without notice. This
document is not intended to convey any warranties, representations and/or
recommendations of any kind, unless such warranties, representations and/or
recommendations are explicitly stated.

Olink assumes no liability arising from a prospective reader’s actions based on
this document.

OLINK, NPX, PEA, PROXIMITY EXTENSION, INSIGHT and the Olink logotype are
trademarks registered, or pending registration, by Olink Proteomics AB. All
third-party trademarks are the property of their respective owners.

Olink products and assay methods are covered by several patents and patent
applications [https://www.olink.com/patents/](https://olink.com/patents/).
