---
title: "Usage guidance"
date: "`r Sys.Date()`"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Usage guidance}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Introduction

`DescrTab2` is the replacement for the `DescrTab` package. It supports a variety of customization options and can be used in `.Rmd` files in conjunction with knitr.

## Preamble settings

`DescrTab2` works in your R console, as well as in `.Rmd` documents with the output formats `pdf_document`, `html_document` and `word_document`. It even supports YAML headers with multiple output formats. For example, if your YAML header looks like the example below, `DescrTab2` should automatically detect the output format depending on the rendering option you choose from the dropdown menu (the arrow next to the "Knit" button on the top menu bar).

````markdown
---
title: "DescrTab2 tutorial"
output:
  word_document: default
  pdf_document: default
  html_document: default
---
````

Required LaTeX packages should also be loaded automatically when rendering as a PDF.

## Getting started

Make sure you load the `DescrTab2` library by typing

```{r, echo=TRUE}
library(DescrTab2)
```

somewhere in the document before you use it. You are now ready to go!

We will use two `tidyverse` libraries for data manipulation and the following dataset for instructive purposes:

```{r}
library(dplyr, warn.conflicts = FALSE)
library(forcats)
set.seed(123)
dat <- iris[, c("Species", "Sepal.Length")] %>%
  mutate(animal = c("Mammal", "Fish") %>% rep(75) %>% factor()) %>%
  mutate(food = c("fries", "wedges") %>% sample(150, TRUE) %>% factor())
head(dat)
```

Producing beautiful descriptive tables is now as easy as typing:

```{r}
descr(dat)
```

## Accessing table elements

The object returned from the `descr` function is basically just a named list.
You may be interested in referencing certain summary statistics from the table in your document. To do this, you can save the list returned by `descr`:

```{r}
my_table <- descr(dat)
```

You can then access the elements of the list using the `$` operator.

```{r}
my_table$variables$Sepal.Length$results$Total$mean
```

RStudio's autocomplete suggestions are very helpful when navigating this list. The `print` function returns a formatted version of this list, which you can also save and access using the same syntax.

```{r}
my_table <- descr(dat) %>% print(silent=TRUE)
```

## Specifying a group

Use the `group` option to specify the name of a grouping variable in your data:

```{r}
descr(dat, "Species")
```

## Assigning labels

Use the `group_labels` option to assign group labels and the `var_labels` option to assign variable labels:

```{r}
descr(dat, "Species",
  group_labels = list(setosa = "My custom group label"),
  var_labels = list(Sepal.Length = "My custom variable label"))
```

## Assigning a table caption

Use the `caption` member of the `format_options` argument to assign a table caption:

```{r}
descr(dat, "Species", format_options = list(caption="Description of our example dataset."))
```

## Confidence intervals for two group comparisons

For 2-group comparisons, `DescrTab2` automatically calculates confidence intervals for differences in effect measures:

```{r}
descr(dat, "animal")
```

## Different tests

Many different tests are available. Check out the test_choice vignette for details: https://imbi-heidelberg.github.io/DescrTab2/articles/b_test_choice_tree_pdf.pdf

Here are some different tests in action:

```{r}
descr(dat %>% select(-"Species"), "animal", test_options = list(exact=TRUE, nonparametric=TRUE))
```

```{r}
descr(dat %>% select(c("Species", "Sepal.Length")), "Species", test_options = list(nonparametric=TRUE))
```

## Paired observations

In situations with paired data, the `group` variable usually denotes the timing of the measurement (e.g. "before" and "after" or "time 1", "time 2", etc.). In these scenarios, you need an additional index variable that specifies which observations from the different timepoints should be paired. The `test_options = list(paired=TRUE, indices = )` option can be used to specify the pairing indices; see the example below. `DescrTab2` only works with data in "long format"; see e.g. `?reshape` or `?tidyr::pivot_longer` for information on how to transform your data from wide to long format.

```{r}
# Pairing indices supplied directly as a vector
descr(dat %>%
        mutate(animal = fct_recode(animal, Before="Fish", After="Mammal")) %>%
        select(-"Species"),
      "animal",
      test_options = list(paired=TRUE, indices=rep(1:75, each=2)))

# Pairing indices supplied as the name of a column in the data
descr(dat %>%
        mutate(animal = fct_recode(animal, Before="Fish", After="Mammal"),
               idx = rep(1:75, each=2)) %>%
        select(-"Species"),
      "animal",
      test_options = list(paired=TRUE, indices="idx"))
```

## Significant digits

Every summary statistic in `DescrTab2` is formatted by a corresponding formatting function. You can exchange these formatting functions as you please:

```{r}
descr(dat, "Species", format_summary_stats = list(mean = function(x) formatC(x, digits = 4)))
```

## Omitting summary statistics

Let's say you don't want to calculate quantiles for your numeric variables. You can specify the `summary_stats_cont` option to include all summary statistics except quantiles:

```{r}
descr(dat, "Species",
  summary_stats_cont = list(
    N = DescrTab2:::.N,
    Nmiss = DescrTab2:::.Nmiss,
    mean = DescrTab2:::.mean,
    sd = DescrTab2:::.sd,
    median = DescrTab2:::.median,
    min = DescrTab2:::.min,
    max = DescrTab2:::.max
  ))
```

## Adding summary statistics

Let's say you have a categorical variable, but for some reason its levels are numerals and you want to calculate the mean.
No problem:

```{r}
# Create example dataset
dat2 <- iris
dat2$cat_var <- c(1,2) %>% sample(150, TRUE) %>% factor()
dat2 <- dat2[, c("Species", "cat_var")]

descr(dat2, "Species", summary_stats_cat=list(mean=DescrTab2:::.factormean))
```

## Combining mean and sd

Use the `format_options = list(combine_mean_sd=TRUE)` option:

```{r}
descr(dat, "Species", format_options = list(combine_mean_sd=TRUE))
```

## Omitting p-values

You can set the `format_options = list(print_p = FALSE)` option to omit p-values:

```{r}
descr(dat, "animal", format_options = list(print_p = FALSE))
```

Similarly for confidence intervals:

```{r}
descr(dat, "animal", format_options = list(print_CI = FALSE))
```

## Controlling options on a per-variable level

You can use the `var_options` list to control formatting and test options on a per-variable basis. Let's say that in the `iris` dataset, only the `Sepal.Length` variable should have more digits in the mean and use a nonparametric test:

```{r}
descr(iris, "Species", var_options = list(Sepal.Length = list(
  format_summary_stats = list(
    mean = function(x) formatC(x, digits = 4)
  ),
  test_options = c(nonparametric = TRUE)
)))
```

## Use user-defined test statistics

`DescrTab2` has many predefined significance tests, but sometimes you may need to use a custom test. In this case, you can use the `test_override` option in `test_options` (or as part of the per-variable options, see above). To do so, `test_override` must be a list with at least 3 members:

- the name of the test (e.g. "custom t-test")
- an abbreviation for this name (e.g. "ct")
- a function that takes the variable to be analysed and possibly a group variable and returns the p-value.
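
The three members listed above can be checked in isolation before the list is handed to `descr`. The following is a minimal sketch using only base R. The helper `check_test_override` is hypothetical and not part of `DescrTab2`, and the two-argument signature `function(var, group)` is an assumption based on the phrase "possibly a group variable".

```{r}
# Hypothetical helper (not part of DescrTab2): checks that a list has the
# three members described above.
check_test_override <- function(x) {
  stopifnot(
    is.list(x),
    is.character(x$name), length(x$name) == 1,
    is.character(x$abbreviation), length(x$abbreviation) == 1,
    is.function(x$p)
  )
  invisible(TRUE)
}

# Assumed two-argument form: Welch t-test of `var` between the levels of `group`.
custom_welch <- list(
  name = "custom Welch t-test",
  abbreviation = "cw",
  p = function(var, group) {
    t.test(var ~ group, var.equal = FALSE)$p.value
  }
)

check_test_override(custom_welch)

# Call the p-value function directly on two of the three iris species.
two_species <- droplevels(iris[iris$Species != "virginica", ])
p_value <- custom_welch$p(two_species$Sepal.Length, two_species$Species)
p_value
```

Calling the `p` member directly like this makes it easy to confirm that it returns a single number between 0 and 1 before it is used inside a table.
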
```{r}
custom_ttest <- list(
  name = "custom t-test",
  abbreviation = "ct",
  # One-sample, one-sided t-test on the analysed variable
  p = function(var) {
    return(t.test(var, alternative = "greater")$p.value)
  }
)

descr(iris %>% select(-Species), test_options = list(test_override = custom_ttest))
```

## Suppress the last factor level

If you have a lot of binary factors, you may want to suppress one of the factor levels to save space. A common use case for this practice is the analysis of questionnaires with a great deal of "yes" / "no" items. You can do so by setting the `omit_factor_level` option to either `"first"` or `"last"`.

```{r}
descr(factor(c("a", "b")), format_options=list(omit_factor_level = "last"))
```

## Confidence intervals as summary statistics

Sometimes it may be a good idea to show the confidence intervals as summary statistics. To do so, you can supply appropriate summary statistics for the confidence intervals with corresponding formatting functions. `DescrTab2` offers the following predefined CI functions:

- `DescrTab2:::.meanCIlower`
- `DescrTab2:::.meanCIupper`
- `DescrTab2:::.factor_firstlevel_CIlower`
- `DescrTab2:::.factor_firstlevel_CIupper`
- `DescrTab2:::.factor_lastlevel_CIlower`
- `DescrTab2:::.factor_lastlevel_CIupper`
- `DescrTab2:::.HLCIlower`
- `DescrTab2:::.HLCIupper`

```{r}
# Summary statistics for categorical variables: CI bounds for the first level
summary_stats_cat <- list(
  CIL = DescrTab2:::.factor_firstlevel_CIlower,
  CIU = DescrTab2:::.factor_firstlevel_CIupper)

# Summary statistics for continuous variables, including CI bounds for the mean
summary_stats_cont <- list(
  N = DescrTab2:::.N,
  Nmiss = DescrTab2:::.Nmiss,
  mean = DescrTab2:::.mean,
  sd = DescrTab2:::.sd,
  CILM = DescrTab2:::.meanCIlower,
  CIUM = DescrTab2:::.meanCIupper)

# Formatting functions for the summary statistics
format_summary_stats <- list(
  CIL = scales::label_percent(),
  CIU = scales::label_percent(),
  CILM = function(x) format(x, digits = 2, scientific = 3),
  CIUM = function(x) format(x, digits = 2, scientific = 3),
  N = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  mean = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  sd = function(x) {
    format(x, digits = 2, scientific = 3)
  },
  CI = function(x) {
    format(x, digits = 2, scientific = 3)
  }
)

# Combine the lower and upper bounds into a single "CI" row
reshape_rows <- list(
  `CI` = list(
    args = c("CIL", "CIU"),
    fun = function(CIL, CIU) {
      paste0("[", CIL, ", ", CIU, "]")
    }
  ),
  `CI` = list(
    args = c("CILM", "CIUM"),
    fun = function(CILM, CIUM) {
      paste0("[", CILM, ", ", CIUM, "]")
    }
  )
)

set.seed(123)
dat <- tibble(a_factor = factor(c(rep("a", 70), rep("b", 30))),
              a_numeric = rnorm(100),
              group = sample(c("Trt", "Ctrl"), 100, TRUE)
)

descr(dat, "group",
  format_options = list(
    omit_factor_level = "last",
    categories_first_summary_stats_second = FALSE,
    combine_mean_sd = TRUE
  ),
  summary_stats_cat = summary_stats_cat,
  summary_stats_cont = summary_stats_cont,
  reshape_rows = reshape_rows,
  format_summary_stats = format_summary_stats)
```
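
To make the role of the `fun` member in `reshape_rows` more concrete, the sketch below reproduces the combining step by hand with base R. It assumes a t-based confidence interval for the mean (whether `DescrTab2:::.meanCIlower` uses the same method is not stated here) and applies the formatting function before the combining function, which is likewise an assumption about the order used internally.

```{r}
set.seed(123)
x <- rnorm(100)

# t-based 95% confidence interval for the mean (assumed method)
ci <- t.test(x)$conf.int

# Same formatting and combining functions as in the example above
fmt <- function(x) format(x, digits = 2, scientific = 3)
combine <- function(CILM, CIUM) paste0("[", CILM, ", ", CIUM, "]")

# Format each bound, then merge both into one bracketed label
ci_label <- combine(fmt(ci[1]), fmt(ci[2]))
ci_label
```

Trying the two functions on plain numbers like this is a quick way to see what a combined row will look like before passing them to `descr`.
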