---
title: "Introduction to the UKB.COVID19 Package"
author: "Longfei Wang"
date: '`r Sys.Date()`'
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the UKB.COVID19 Package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(here)
library(tidyverse)
library(questionr)
library(data.table)
```

# Introduction

`UKB.COVID19` is an R package designed to process and analyze COVID-19 data from the UK Biobank (UKBB). It provides tools to summarize COVID-19 test results, perform association tests between COVID-19 outcomes and potential risk factors, and generate input files for genome-wide association studies (GWAS).

# Installation

To install the `UKB.COVID19` package, you can use the following commands:

```{r}
# Install from CRAN
# install.packages("UKB.COVID19")

# Load the package
library(UKB.COVID19)
```

# Data Preparation

Before using the package, ensure you have access to the UKBB COVID-19 data. You will need to download the relevant datasets and have them ready for analysis. Due to the restriction of using UKBB data, we illustrate the use cases using simulated data, which are located in the package UKB.COVID19/inst/extdata/ and can be retrieved with function `covid_example`.

```{r}
ukb.tab.file <- covid_example("sim_ukb.tab.gz")
ukb.tab <- fread((ukb.tab.file))
head(ukb.tab)
```

# Main Functions

## 1. Generating a Covariate Table with COVID-19 Risk Factors

The `risk_factor` function generates a covariate table with risk factors using UKBB main tab data. Automatically returns sex, age at birthday in 2020, SES, self-reported ethnicity, most recently reported BMI, most recently reported pack-years, whether they reside in aged care (based on hospital admissions data, and covid test data) and blood type. Function also allows user to specify fields of interest (field codes, provided by UK Biobank), and allows the users to specify more intuitive names, for selected fields.

Note: the ukb.tab file must include fields: f.eid, f.31.0.0, f.34.0.0, f.189.0.0, f.21001., f.21000., f.20161.

```{r}
covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"),
                     ABO.data=covid_example("sim_covid19_misc.txt.gz"),
                     hesin.file=covid_example("sim_hesin.txt.gz"),
                     res.eng=covid_example("sim_result_england.txt.gz"))

head(covar)
```

## 2. Summarizing COVID-19 Test Results

The `makePhenotypes` function summarizes COVID-19 test results, death register data, and hospital inpatient data. It generates phenotypes for COVID-19 susceptibility, severity, and mortality.

### COVID-19 susceptibility

```{r}
susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "susceptibility")
head (susceptibility)
```

### COVID-19 severity

```{r}
severity <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "severity")
```

### COVID-19 mortality

```{r}
mortality <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "mortality")
```

## 3. Perfroming Association Tests

The `log_cov` function performs association tests using logistic regressions. This is an example of association tests between COVID-19 susceptibility and three risk factors: sex, age and BMI.

```{r}
log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi"))
```

## 4. Generating a Comorbidity Summary File

The `comorbidity.summary` function scans all the hospitalisation records with a given time period and generates comorbidity summary table. The following example is to generate a comorbidity summary table that includes all the primary and secondary diagnoses in the hospital inpatient data after 16 March 2020.

```{r}
comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"),
                              hesin.file=covid_example("sim_hesin.txt.gz"), 
                              hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), 
                              ICD10.file=covid_example("ICD10.coding19.txt.gz"),
                              primary = FALSE,
                              Date.start = "16/03/2020")
comorb[1:6,1:10]
```

## 5. Performing Association Tests between COVID-19 Phenotype and Comorbidities

The `comorbidity.asso` function performs association tests between comorbidity categories and selected phenotype using logistic regression models.This is an example of association tests between COVID-19 susceptibility and all comorbidities. It shows NAs when fitted probabilities numerically 0 or 1 occurred in the logistic regression models.

```{r}
comorb.asso <- comorbidity_asso(pheno=susceptibility,
                                covariates=covar,
                                cormorbidity=comorb,
                                population="white",
                                cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"),
                                phe.name="pos.neg",
                                ICD10.file=covid_example("ICD10.coding19.txt.gz"))
head(comorb.asso, 4)
```

# Example Workflow

Here is an example workflow of analysing the risk factors of COVID-19 susceptibility, that combines the main functions of the UKB.COVID19 package:

```{r}
# Load the package
library(UKB.COVID19)

# Summarize COVID-19 risk factors
covar <- risk_factor(ukb.data=covid_example("sim_ukb.tab.gz"), 
                         ABO.data=covid_example("sim_covid19_misc.txt.gz"),
                         hesin.file=covid_example("sim_hesin.txt.gz"),
                         res.eng=covid_example("sim_result_england.txt.gz"))

# Summarize COVID-19 test results
susceptibility <- makePhenotypes(ukb.data=covid_example("sim_ukb.tab.gz"),
                        res.eng=covid_example("sim_result_england.txt.gz"),
                        death.file=covid_example("sim_death.txt.gz"),
                        death.cause.file=covid_example("sim_death_cause.txt.gz"),
                        hesin.file=covid_example("sim_hesin.txt.gz"),
                        hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"),
                        hesin_oper.file=covid_example("sim_hesin_oper.txt.gz"),
                        hesin_critical.file=covid_example("sim_hesin_critical.txt.gz"),
                        code.file=covid_example("coding240.txt.gz"),
                        pheno.type = "susceptibility")

# Perfrom association tests
log_cov(pheno=susceptibility, covariates=covar, phe.name="pos.neg", cov.name=c("sex", "age", "bmi"))

# Generate comorbidity table
comorb <- comorbidity_summary(ukb.data=covid_example("sim_ukb.tab.gz"),
                              hesin.file=covid_example("sim_hesin.txt.gz"), 
                              hesin_diag.file=covid_example("sim_hesin_diag.txt.gz"), 
                              ICD10.file=covid_example("ICD10.coding19.txt.gz"),
                              primary = FALSE,
                              Date.start = "16/03/2020")

# Perform association tests between COVID-19 phenotype and comorbidities
comorb.asso <- comorbidity_asso(pheno=susceptibility,
                                covariates=covar,
                                cormorbidity=comorb,
                                population="white",
                                cov.name=c("sex","age","bmi","SES","smoke","inAgedCare"),
                                phe.name="pos.neg",
                                ICD10.file=covid_example("ICD10.coding19.txt.gz"))

```

# Conclusion

The UKB.COVID19 package provides comprehensive tools for analyzing COVID-19 data from the UK Biobank. By following this vignette, users can efficiently summarize data, perform association tests, and prepare files for genetic analysis. For more detailed information, refer to the package documentation and the GitHub repository.