---
title: "Grouped Hyperframe"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{intro}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
library(knitr)
opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
options(rmarkdown.html_vignette.check_title = FALSE)
```

# Introduction

This vignette for package **`groupedHyperframe`** documents the creation of a `groupedHyperframe` object, the batch processes defined for a `groupedHyperframe`, and aggregations over a multi-level grouping structure.

## Prerequisite

Package **`groupedHyperframe`** requires the development versions of the **`spatstat`** family of packages.

```{r eval = FALSE}
devtools::install_github('spatstat/spatstat'); packageDate('spatstat')
devtools::install_github('spatstat/spatstat.data'); packageDate('spatstat.data')
devtools::install_github('spatstat/spatstat.explore'); packageDate('spatstat.explore')
devtools::install_github('spatstat/spatstat.geom'); packageDate('spatstat.geom')
devtools::install_github('spatstat/spatstat.linnet'); packageDate('spatstat.linnet')
devtools::install_github('spatstat/spatstat.model'); packageDate('spatstat.model')
devtools::install_github('spatstat/spatstat.random'); packageDate('spatstat.random')
devtools::install_github('spatstat/spatstat.sparse'); packageDate('spatstat.sparse')
devtools::install_github('spatstat/spatstat.univar'); packageDate('spatstat.univar')
devtools::install_github('spatstat/spatstat.utils'); packageDate('spatstat.utils')
```

## Note to Users

Examples in this vignette require that the `search` path has

```{r setup}
library(groupedHyperframe)
library(survival) # to help hyperframe understand Surv object
```

Users should remove parameter `mc.cores = 1L` from all examples and use the default option, which on macOS engages all CPU cores of the current host. The authors are forced to use `mc.cores = 1L` in this vignette in order to pass `CRAN`'s submission check.

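As a minimal sketch of what a platform-aware choice of `mc.cores` could look like, the chunk below uses only `parallel::detectCores()` and `parallel::mclapply()`. The variable names `n_cores` and `mc` and the toy workload are hypothetical, and whether this matches the package's own default is an assumption, not something this vignette states.

```{r}
library(parallel)

# detectCores() may return NA on some platforms, so fall back to 1 core
n_cores = detectCores()
if (is.na(n_cores)) n_cores = 1L

# forking via mclapply() is unavailable on Windows, where mc.cores must be 1
mc = if (.Platform$OS.type == 'windows') 1L else n_cores

# toy workload (hypothetical), only to show where `mc.cores` is passed
res = mclapply(1:4, FUN = function(i) i^2, mc.cores = mc)
unlist(res)
```

A value chosen this way could be supplied as `mc.cores = mc` wherever the examples in this vignette use `mc.cores = 1L`.
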
## Additional Resources

A development version of package **`groupedHyperframe`** is hosted on [GitHub](https://github.com/tingtingzhan/groupedHyperframe).

```{r eval = FALSE}
devtools::install_github('tingtingzhan/groupedHyperframe', build_vignettes = TRUE)
vignette('intro', package = 'groupedHyperframe')
```

## List of Terms and Abbreviations

```{r echo = FALSE, results = 'asis'}
c(
  '`attr`', 'Attributes', '`base::attr`; `base::attributes`',
  '`CRAN`, `R`', 'The Comprehensive R Archive Network', 'https://cran.r-project.org',
  '`data.frame`', 'Data frame', '`base::data.frame`',
  '`formula`', 'Formula', '`stats::formula`',
  '`fv`, `fv.object`', 'Function value table', '`spatstat.explore::fv.object`',
  '`groupedData`', 'Grouped data frame', '`nlme::groupedData`',
  '`hypercolumn`', 'Column of hyper data frame', '`spatstat.geom::hyperframe`',
  '`hyperframe`', 'Hyper data frame', '`spatstat.geom::hyperframe`',
  '`inherits`', 'Class inheritance', '`base::inherits`',
  '`kerndens`', 'Kernel density', '`stats::density.default()$y`',
  '`matrix`', 'Matrix', '`base::matrix`',
  '`mc.cores`', 'Number of CPU cores to use', '`parallel::mclapply`, `parallel::detectCores`',
  '`multitype`', 'Multitype object', '`spatstat.geom::is.multitype`',
  '`ppp`, `ppp.object`', '(Marked) point pattern', '`spatstat.geom::ppp.object`',
  '`~ g1/.../gm`', 'Nested grouping structure', '`nlme::groupedData`; `nlme::lme`',
  '`quantile`', 'Quantile', '`stats::quantile`',
  '`S3`', '`R`\'s simplest object oriented system', 'https://adv-r.hadley.nz/s3.html',
  '`search`', 'Search path', '`base::search`',
  '`Surv`', 'Survival object', '`survival::Surv`',
  '`trapz`, `cumtrapz`', '(Cumulative) trapezoidal integration', '`pracma::trapz`; `pracma::cumtrapz`; https://en.wikipedia.org/wiki/Trapezoidal_rule'
) |>
  matrix(nrow = 3L, dimnames = list(c('Term / Abbreviation', 'Description', 'Reference'), NULL)) |>
  t.default() |>
  as.data.frame.matrix() |>
  kable()
```

# `groupedHyperframe` Class

The `S3` class `groupedHyperframe` `inherits`
from the `hyperframe` class, in the same way that the `groupedData` class inherits from the `data.frame` class.

Compared with a `hyperframe` object, a `groupedHyperframe` object has the additional attribute(s)

- `attr(., 'group')`, a `formula` to specify the grouping structure

## `groupedHyperframe` with `ppp`-hypercolumn

Function `grouped_ppp()` creates a `groupedHyperframe` with ***one-and-only-one*** `ppp`-hypercolumn. Multiple `ppp`-hypercolumns will *not* be supported in the foreseeable future, as this would require checking for name clashes in `$marks` across the multiple `ppp`-hypercolumns.

In the following example, the argument `formula` specifies

- the marks, e.g., `numeric` mark *`hladr`* and `multitype` mark *`phenotype`*, on the left-hand-side
- additional predictors and/or endpoints, e.g., *`OS`*, *`gender`* and *`age`*, before the `|` separator on the right-hand-side
- grouping structure, e.g., *`image_id`* nested in *`patient_id`*, after the `|` separator on the right-hand-side.

```{r}
(s = grouped_ppp(formula = hladr + phenotype ~ OS + gender + age | patient_id/image_id,
                 data = wrobel_lung, mc.cores = 1L))
```

Function `grouped_ppp()` has parameter `coords`, which specifies the column names of the $x$- and $y$-coordinates in the input `data`. The default `coords = ~ x + y` indicates the use of `data$x` and `data$y` for the $x$- and $y$-coordinates, respectively. Users may use `coords = FALSE` for data without $x$- and $y$-coordinates. In this case, the coordinates are filled with randomly generated numbers, and the returned `groupedHyperframe` has a `pseudo.ppp`-hypercolumn.

```{r}
(s_a = grouped_ppp(Ki67 ~ Surv(recfreesurv_mon, recurrence) + race + age | patientID/tissueID,
                   data = Ki67, coords = FALSE, mc.cores = 1L))
```

# Batch Process on `ppp`-Hypercolumn

In this section, we outline the batch process of spatial point pattern analyses applicable to the `ppp`-hypercolumn of a `hyperframe`.
Note that these spatial point pattern analyses should **not** be applied to a `pseudo.ppp`-hypercolumn, as its $x$- and $y$-coordinates are randomly generated rather than observed.

Batch processes that add an `fv`-hypercolumn to the input `hyperframe` include

```{r echo = FALSE, results = 'asis'}
c(
  '`Emark_()`', '`spatstat.explore::Emark`', '`numeric` marks (e.g., *`hladr`*) in `ppp`-hypercolumn',
  '`Vmark_()`', '`spatstat.explore::Vmark`', '`numeric` marks',
  '`markcorr_()`', '`spatstat.explore::markcorr`', '`numeric` marks',
  '`markvario_()`', '`spatstat.explore::markvario`', '`numeric` marks',
  '`Gcross_()`', '`spatstat.explore::Gcross`', '`multitype` marks (e.g., *`phenotype`*)',
  '`Kcross_()`', '`spatstat.explore::Kcross`', '`multitype` marks',
  '`Jcross_()`', '`spatstat.explore::Jcross`', '`multitype` marks'
) |>
  matrix(nrow = 3L, dimnames = list(c('Function', 'Workhorse', 'Applicable To'), NULL)) |>
  t.default() |>
  as.data.frame.matrix() |>
  kable(caption = 'Batch process that adds an `fv`-hypercolumn')
```

Batch processes that add a `numeric`-hypercolumn to the input `hyperframe` include

```{r echo = FALSE, results = 'asis'}
c(
  '`nncross_()`', '`spatstat.geom::nncross.ppp(., what = \'dist\')`', '`multitype` marks (e.g., *`phenotype`*)'
) |>
  matrix(nrow = 3L, dimnames = list(c('Function', 'Workhorse', 'Applicable To'), NULL)) |>
  t.default() |>
  as.data.frame.matrix() |>
  kable(caption = 'Batch process that adds a `numeric`-hypercolumn')
```

The following example shows that multiple batch processes may be applied to a `hyperframe` (or `groupedHyperframe`) in a pipeline (`|>`).


```{r}
r = seq.int(from = 0, to = 250, by = 10)
out = s |>
  Emark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # Vmark_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markcorr_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  # markvario_(r = r, correction = 'best', mc.cores = 1L) |> # slow
  Gcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  # Kcross_(i = 'CK+.CD8-', j = 'CK-.CD8+', r = r, correction = 'best', mc.cores = 1L) |> # fast
  nncross_(i = 'CK+.CD8-', j = 'CK-.CD8+', correction = 'best', mc.cores = 1L) # fast
```

The returned `hyperframe` (or `groupedHyperframe`) has

- `fv`-hypercolumn *`hladr.E`*, created by function `Emark_()` on `numeric` mark *`hladr`*
- `fv`-hypercolumn *`phenotype.G`*, created by function `Gcross_()` on `multitype` mark *`phenotype`*
- `numeric`-hypercolumn *`phenotype.nncross`*, created by function `nncross_()` on `multitype` mark *`phenotype`*

```{r}
out
```

# Aggregation Over Nested Grouping Structure

When a nested grouping structure `~g1/g2/.../gm` is present, we may aggregate over the

- `fv`-hypercolumn(s)
- `numeric`-hypercolumn(s)
- `numeric` marks in the `ppp`-hypercolumn

by any one of the grouping levels `~g1`, `~g2`, ..., or `~gm`. If the lowest grouping level `~gm` is specified, then no aggregation is performed.

The object returned by the aggregation functions `aggregate_fv()`, `aggregate_quantile()` and `aggregate_kerndens()` is a `data.frame` instead of a `hyperframe`. This is because the aggregated results are stored in `matrix`-columns, which the `hyperframe` class does not support.

## Aggregation of `fv`-hypercolumn(s)

Function `aggregate_fv()` aggregates

- the **function values**, i.e., the black-solid-curve of `spatstat.explore::plot.fv`.
  In the following example, we have
    - `matrix`-column *`hladr.E.value`*, aggregated function value from `fv`-hypercolumn *`hladr.E`*
    - `matrix`-column *`phenotype.G.value`*, aggregated function value from `fv`-hypercolumn *`phenotype.G`*
- the **cumulative trapezoid area** under the black-solid-curve. In the following example, we have
    - `matrix`-column *`hladr.E.cumtrapz`*, aggregated cumulative trapezoid area from `fv`-hypercolumn *`hladr.E`*
    - `matrix`-column *`phenotype.G.cumtrapz`*, aggregated cumulative trapezoid area from `fv`-hypercolumn *`phenotype.G`*

```{r}
afv = out |>
  aggregate_fv(by = ~ patient_id, f_aggr_ = 'mean', mc.cores = 1L)
nrow(afv) # number of patients
names(afv)
dim(afv$hladr.E.cumtrapz) # N(patient) by length(r)
```

## Aggregation of `numeric`-hypercolumn(s) and `numeric` mark(s) in `ppp`-hypercolumn

Function `aggregate_quantile()` aggregates

- the quantile of the `numeric`-hypercolumn(s). In the following example, we have
    - `matrix`-column *`phenotype.nncross.quantile`*, aggregated quantile of `numeric`-hypercolumn *`phenotype.nncross`*
- the quantile of the `numeric` mark(s) in the `ppp`-hypercolumn. In the following example, we have
    - `matrix`-column *`hladr.quantile`*, aggregated quantile of `numeric` mark *`hladr`* in `ppp`-hypercolumn

```{r}
q = out |>
  aggregate_quantile(by = ~ patient_id, probs = seq.int(from = 0, to = 1, by = .1), mc.cores = 1L)
nrow(q)
names(q)
dim(q$phenotype.nncross.quantile)
dim(q$hladr.quantile)
```

Function `aggregate_kerndens()` aggregates

- the kernel density of the `numeric`-hypercolumn(s). In the following example, we have
    - `matrix`-column *`phenotype.nncross.kerndens`*, aggregated kernel density of `numeric`-hypercolumn *`phenotype.nncross`*
- the kernel density of the `numeric` mark(s) in the `ppp`-hypercolumn.
  In the following example, we have
    - `matrix`-column *`hladr.kerndens`*, aggregated kernel density of `numeric` mark *`hladr`* in `ppp`-hypercolumn

```{r}
(mdist = out$phenotype.nncross |> unlist() |> max())
d = out |>
  aggregate_kerndens(by = ~ patient_id, from = 0, to = mdist, mc.cores = 1L)
nrow(d)
names(d)
dim(d$phenotype.nncross.kerndens)
```
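
To illustrate why a common range `from = 0, to = mdist` is useful, the chunk below sketches the general idea in base `R`: kernel densities evaluated on one shared grid can be stacked and averaged point by point within a group. It relies only on `stats::density.default()$y`, as listed in the table of terms. The toy data, the object names `x`, `patient`, `dens` and `kd`, and the pointwise mean as the aggregation rule are all assumptions made for illustration; this is not the internal implementation of `aggregate_kerndens()`.

```{r}
# toy data (made up): distances observed in 4 images from 2 patients
set.seed(1)
x = list(rexp(50, rate = 1/20), rexp(60, rate = 1/25),
         rexp(40, rate = 1/15), rexp(55, rate = 1/30))
patient = c('A', 'A', 'B', 'B')

# shared evaluation range, so that all densities live on the same grid
to = x |> unlist() |> max()

# kernel density per image, evaluated at the same 512 grid points
dens = lapply(x, FUN = function(v) {
  stats::density.default(v, from = 0, to = to, n = 512L)$y
})

# pointwise mean of the densities within each patient (assumed aggregation rule)
kd = split(dens, f = patient) |>
  lapply(FUN = function(d) Reduce(`+`, d) / length(d)) |>
  do.call(what = rbind)

dim(kd) # N(patient) by number of grid points
```

Without a shared `from` and `to`, each call to `stats::density.default()` would choose its own grid, and the resulting vectors could not be averaged element by element.
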