---
title: "Usage guidance"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Usage guidance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

`DescrTab2` is the replacement of the `DescrTab` package.
It supports a variety of different customization options and can be used
in .Rmd files in conjunction with knitr.

## Preamble settings
`DescrTab2` works in your R-console, as well as in `.Rmd` documents corresponding to
output formats of the type `pdf_documument`, `html_document` and `word_document`.
It even supports YAML-headers with multiple output formats!
For example, if your YAML-header looks like the example below, `DescrTab2` should automagically detect the output format
depending on the rendering option you choose from the dropdown menue (the arrow next to the "Knit" button on the top menue bar).



````markdown
---
title: "DescrTab2 tutorial"
output:
  word_document: default
  pdf_document: default
  html_document: default
---
````

Required LaTeX packages should be loaded automatically as well
when rendering as a pdf.

## Getting started
Make sure you include the DescrTab2 library by typing 

```{r, echo=TRUE}
library(DescrTab2)
```
somewhere in the document before you use it. You are now ready to go!


We will use two `tidyverse` libraries for data manipulation and a the following
dataset for instructive purposes:
```{r}
library(dplyr, warn.conflicts = FALSE)
library(forcats)
set.seed(123)
dat <- iris[, c("Species", "Sepal.Length")] %>%
  mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor()) %>%
  mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())
head(dat)
```


Producing beautiful descriptive tables is now as easy as typing:

```{r}
descr(dat)
```


## Accessing table elements

The object returned from the `descr` function is basically just a named list. You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by `descr`:

```{r}
my_table <- descr(dat)
```

You can then access the elements of the list using the `$` operator.

```{r}
my_table$variables$Sepal.Length$results$Total$mean
```

Rstudios autocomplete suggestions are very helpful when navigating this list.

The `print` function returns a formatted version of this list, which you can also save and access using the same syntax.

```{r}
my_table <- descr(dat) %>% print(silent=TRUE)
```


## Specifying a group
Use the `group` option to specify the name of a grouping variable in your data:

```{r}
descr(dat, "Species")
```

## Assigning labels
Use the `group_labels` option to assign group labels and the `var_labels` option to assign variable labels:

```{r}
descr(dat, "Species", group_labels=list(setosa="My custom group label"), var_labels = list(Sepal.Length = "My custom variable label"))
```

## Assigning a table caption
Use the `caption` member of the `format_options` argument to assign a table caption:

```{r}
descr(dat, "Species", format_options = list(caption="Description of our example dataset."))
```


## Confidence intervals for two group comparisons
For 2-group comparisons, decrtab automatically calculates confidence intervals for differences in effect measures:

```{r}
descr(dat, "animal")
```

## Different tests
There are a lot of different tests available. Check out the test_choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/b_test_choice_tree_pdf.pdf

Here are some different tests in action:

```{r}
descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))
```


```{r}
descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))
```

## Paired observations

In situations with paired data, the `group` variable usually denotes the timing of the measurement (e.g. "before" and "after" or "time 1", "time 2", etc.). In these scenarios, you need an additional index variable that specifies which observations from the different timepoints should be paired. The `test_options =list(paired=TRUE, indices = <Character name of index variable name or vector of indices>)` option can be used to specify the pairing indices, see the example below. DescrTab2 only works with data in "long format", see e.g. `?reshape` or `?tidyr::pivot_longer` for information on how to transoform your data from wide to long format.

```{r}
descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal")) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices=rep(1:75, each=2)))

descr(dat %>% mutate(animal = fct_recode(animal, Before="Fish", After="Mammal"), idx = rep(1:75, each=2)) %>% select(-"Species"), "animal", test_options = list(paired=TRUE, indices="idx" ))
```

## Significant digits

Every summary statistic in DescrTab2 is formatted by a corresponding formatting function. You can exchange these formatting functions as you please:

```{r}
descr(dat, "Species", format_summary_stats = list(mean=function(x)formatC(x, digits = 4)) )
```


## Omitting summary statistics

Let's say you don't want to calculate quantiles for your numeric variables. You can specify the `summary_stats_cont` option to include all summary statistics but quantiles:

```{r}
descr(dat, "Species", summary_stats_cont = list(N = DescrTab2:::.N, Nmiss = DescrTab2:::.Nmiss, mean =
    DescrTab2:::.mean, sd = DescrTab2:::.sd, median = DescrTab2:::.median, min = DescrTab2:::.min, max =
    DescrTab2:::.max))
```

## Adding summary statistics
Let's say you have a categorical variable, but for some reason it's levels are numerals and you want to calculate the mean. No problem:

```{r}
# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat=list(mean=DescrTab2:::.factormean))

```

## Combining mean and sd
Use the `format_options = list(combine_mean_sd=TRUE)` option:

```{r}
descr(dat, "Species", format_options = c(combine_mean_sd=TRUE))
```


## Omitting p-values
You can declare the `format_options = list(print_p = FALSE)` option to omit p-values:

```{r}
descr(dat, "animal", format_options = list(print_p = FALSE))
```

Similarily for Confidence intervals:

```{r}
descr(dat, "animal", format_options = list(print_CI = FALSE))
```


## Controling options on a per-variable level
You can use the `var_options` list to control formatting and test options on a per-variable basis.
Let's say in the dataset `iris`, we want that only the `Sepal.Length` variable has more digits in the mean and a nonparametric test:


```{r}
descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x)
      formatC(x, digits = 4)
  ),
  test_options = c(nonparametric = TRUE)
)))
```

## Use user defined test statistics
`DescrTab2` has many predefined significance tests, but sometimes you may need to use a custom test.
In this case, you can use the test_override option in test_options (or as a part of per 
variable options, see above). To do so, test_override must be a list with at least 3 members:

 - the name of the test (e.g. "custom t-test")
 - an abbreviation for this name (e.g. "ct")
 - a function that takes the variable to be analysed and possibly a group variable and returns the p-value.
```{r}
custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "ct",
  p = function(var) {
    return(t.test(var, alternative = "greater")$p.value)
  }
)

descr(iris %>% select(-Species), test_options = list(test_override = custom_ttest))
```

## Supress the last factor level
If you have a lot of binary factors, you may want to suppress one of the factor levels to save space.
A common use case for this practise is when you analyse questionaires with a great deal of "yes" / "no" items.
You can do so by setting the `omit_factor_level` option to either `"first"` or `"last"`.

```{r}
descr(factor(c("a", "b")), format_options=list(omit_factor_level = "last"))
```

## Confidence intervals as summary statistics
Sometimes it may be a good idea to show the confidence intervals as summary statistics.
To do so, you can supply appropriate summary statistics for the confidence intervals with corresponding formatting functions.
`DescrTab2` offers the following predefined CI functions:

- `DescrTab2:::.meanCIlower`
- `DescrTab2:::.meanCIupper`
- `DescrTab2:::.factor_firstlevel_CIlower`
- `DescrTab2:::.factor_firstlevel_CIupper`
- `DescrTab2:::.factor_lastlevel_CIlower`
- `DescrTab2:::.factor_lastlevel_CIupper`
- `DescrTab2:::.HLCIlower`
- `DescrTab2:::.HLCIupper`

```{r}
summary_stats_cat <- list(
  CIL = DescrTab2:::.factor_firstlevel_CIlower,
  CIU = DescrTab2:::.factor_firstlevel_CIupper)

summary_stats_cont  <-  list(
  N = DescrTab2:::.N,
  Nmiss = DescrTab2:::.Nmiss,
  mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd,
  CILM = DescrTab2:::.meanCIlower,
  CIUM = DescrTab2:::.meanCIupper)

format_summary_stats <- list(
  CIL = scales::label_percent(),
  CIU = scales::label_percent(),
  CILM = function(x) format(x, digits = 2, scientific = 3),
  CIUM = function(x) format(x, digits = 2, scientific = 3),
  N = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  mean = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  sd = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  CI = function(x) {
    format(x, digits = 2, scientific = 3)
  }
)

reshape_rows <- list(
  `CI` = list(
    args = c("CIL", "CIU"),
    fun = function(CIL, CIU) {
      paste0("[", CIL, ", ", CIU, "]")
    }
  ),
  `CI` = list(
    args = c("CILM", "CIUM"),
    fun = function(CILM, CIUM) {
      paste0("[", CILM, ", ", CIUM, "]")
    }
  )
)

set.seed(123)
dat <- tibble(a_factor = factor(c(rep("a", 70), rep("b", 30))),
              a_numeric = rnorm(100),
              group = sample(c("Trt", "Ctrl"), 100, TRUE)
)

descr(dat, "group",
  format_options=list(omit_factor_level = "last",
  categories_first_summary_stats_second = FALSE,
  combine_mean_sd = TRUE
  ),
  summary_stats_cat = summary_stats_cat,
  summary_stats_cont = summary_stats_cont,
  reshape_rows = reshape_rows,
  format_summary_stats = format_summary_stats)
```