tidy

The package provides functions to tidy a summarised result into a data frame that is easier to use in subsequent calculations.

To this end, the split functions (described in split and unite functions) operate on the name-level columns.

For the estimates, there is the pivotEstimates function, and for the settings, addSettings. Finally, the tidy method combines the split and pivot functionalities in a single function.

Estimates

First, let’s load relevant libraries and create a mock summarised result table.

library(visOmopResults)
library(dplyr)
result <- mockSummarisedResult()
result |> glimpse()
#> Rows: 126
#> Columns: 13
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "9337847", "4006478", "2868369", "7818476", "9065176"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…

The function pivotEstimates adds one column of estimate values for each combination of values found in the columns listed in pivotEstimatesBy. For instance, in the following example we use the columns variable_name, variable_level, and estimate_name to pivot the estimates.

result |> 
  pivotEstimates(pivotEstimatesBy = c("variable_name", "variable_level", "estimate_name")) |>
  glimpse()
#> Rows: 18
#> Columns: 15
#> $ result_id                          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name                           <chr> "mock", "mock", "mock", "mock", "mo…
#> $ group_name                         <chr> "cohort_name", "cohort_name", "coho…
#> $ group_level                        <chr> "cohort1", "cohort1", "cohort1", "c…
#> $ strata_name                        <chr> "overall", "age_group &&& sex", "ag…
#> $ strata_level                       <chr> "overall", "<40 &&& Male", ">=40 &&…
#> $ additional_name                    <chr> "overall", "overall", "overall", "o…
#> $ additional_level                   <chr> "overall", "overall", "overall", "o…
#> $ `number subjects_NA_count`         <int> 9337847, 4006478, 2868369, 7818476,…
#> $ age_NA_mean                        <dbl> 30.49621, 27.51317, 19.64153, 84.40…
#> $ age_NA_sd                          <dbl> 3.3287556, 4.6797953, 3.8420378, 7.…
#> $ Medications_Amoxiciline_count      <int> 21944, 70846, 27309, 44353, 34557, …
#> $ Medications_Amoxiciline_percentage <dbl> 12.759029, 81.434293, 99.356778, 49…
#> $ Medications_Ibuprofen_count        <int> 2795, 1362, 94596, 12537, 66965, 25…
#> $ Medications_Ibuprofen_percentage   <dbl> 30.713166, 8.628628, 59.166925, 83.…
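
Because the pivoted estimates are typed columns rather than character strings, they can be used directly in calculations. The following is a minimal sketch, assuming the mock result keeps the age variable with the mean and sd estimates shown above; the cv column is a hypothetical derived quantity added only for illustration.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

# Keep only the "age" rows, then give each estimate its own typed column
ageWide <- result |>
  filter(variable_name == "age") |>
  pivotEstimates(pivotEstimatesBy = "estimate_name") |>
  # `cv` (coefficient of variation) is a hypothetical column, not part of the package
  mutate(cv = sd / mean)

ageWide |>
  select(group_level, strata_level, mean, sd, cv) |>
  glimpse()
```

Filtering before pivoting avoids the NA-filled columns that appear when variables with different estimates share the same wide table.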

The nameStyle argument customises the names of the new columns using the glue package syntax. For instance:

result |> 
  pivotEstimates(pivotEstimatesBy = "estimate_name",
                 nameStyle = "{toupper(estimate_name)}") |>
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ COUNT            <int> 9337847, 4006478, 2868369, 7818476, 9065176, 2211710,…
#> $ MEAN             <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ SD               <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ PERCENTAGE       <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
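
A nameStyle template can also combine several columns. The sketch below assumes that the template may refer to any column listed in pivotEstimatesBy, which is why both variable_name and estimate_name appear there; the particular template is just an example.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

# Put the estimate name first and the variable name second in each new column name
resultWide <- result |>
  pivotEstimates(
    pivotEstimatesBy = c("variable_name", "estimate_name"),
    nameStyle = "{estimate_name}_{variable_name}"
  )

# Inspect the generated column names
colnames(resultWide)
```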

Settings

The function addSettings adds a new column for each of the settings in the summarised result, if any:

mockSummarisedResult() |>
  addSettings() |>
  glimpse()
#> Rows: 126
#> Columns: 16
#> $ result_id        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
#> $ cdm_name         <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock…
#> $ group_name       <chr> "cohort_name", "cohort_name", "cohort_name", "cohort_…
#> $ group_level      <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1"…
#> $ strata_name      <chr> "overall", "age_group &&& sex", "age_group &&& sex", …
#> $ strata_level     <chr> "overall", "<40 &&& Male", ">=40 &&& Male", "<40 &&& …
#> $ variable_name    <chr> "number subjects", "number subjects", "number subject…
#> $ variable_level   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ estimate_name    <chr> "count", "count", "count", "count", "count", "count",…
#> $ estimate_type    <chr> "integer", "integer", "integer", "integer", "integer"…
#> $ estimate_value   <chr> "2703410", "3101646", "4285343", "2451643", "6496595"…
#> $ additional_name  <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ additional_level <chr> "overall", "overall", "overall", "overall", "overall"…
#> $ result_type      <chr> "mock_summarised_result", "mock_summarised_result", "…
#> $ package_name     <chr> "visOmopResults", "visOmopResults", "visOmopResults",…
#> $ package_version  <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0",…
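
Once the settings are ordinary columns, they can be used like any other column, for example to keep only the rows of one result type. This is a small sketch that relies on the result_type value shown in the output above; the object name mockOnly is illustrative.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

# Add the settings as columns, then filter on one of them
mockOnly <- result |>
  addSettings() |>
  filter(result_type == "mock_summarised_result")

# Count rows per combination of settings
mockOnly |>
  count(result_type, package_name)
```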

Tidy

Finally, the tidy method combines the splitting of name-level columns with the pivoting of estimates and settings. By default, it splits group, strata and additional, pivots estimates by the column “estimate_name”, and also pivots the settings.

result <- mockSummarisedResult()

result |> 
  tidy() |> 
  glimpse()
#> Rows: 72
#> Columns: 14
#> $ result_id       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
#> $ cdm_name        <chr> "mock", "mock", "mock", "mock", "mock", "mock", "mock"…
#> $ cohort_name     <chr> "cohort1", "cohort1", "cohort1", "cohort1", "cohort1",…
#> $ age_group       <chr> "overall", "<40", ">=40", "<40", ">=40", "overall", "o…
#> $ sex             <chr> "overall", "Male", "Male", "Female", "Female", "Male",…
#> $ variable_name   <chr> "number subjects", "number subjects", "number subjects…
#> $ variable_level  <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ count           <int> 3397666, 5378334, 1665180, 7493291, 1764428, 6818035, …
#> $ mean            <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ sd              <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ percentage      <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ result_type     <chr> "mock_summarised_result", "mock_summarised_result", "m…
#> $ package_name    <chr> "visOmopResults", "visOmopResults", "visOmopResults", …
#> $ package_version <chr> "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", "0.3.0", …
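
To make the default behaviour concrete, the following sketch chains the individual steps by hand. It is an approximation under stated assumptions, not the package's actual implementation: the function names splitGroup, splitStrata and splitAdditional are taken from the split and unite documentation as I understand it, and addSettings is called first on the assumption that it needs the original summarised result object.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

manual <- result |>
  addSettings() |>      # assumed to need the unmodified summarised result
  splitGroup() |>       # group_name / group_level -> cohort_name
  splitStrata() |>      # strata_name / strata_level -> age_group, sex
  splitAdditional() |>  # additional_name / additional_level
  pivotEstimates(pivotEstimatesBy = "estimate_name")

manual |>
  glimpse()
```

The column order may differ from what tidy returns, but the same kinds of columns should be present.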

Which column pairs to split can be customised with the split arguments, while pivotEstimatesBy and nameStyle control how estimates are pivoted. If pivotEstimatesBy is NULL or character(), estimates are not modified. Settings are always pivoted if present.
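
As an illustration of these arguments, the sketch below calls tidy twice: once leaving the estimates in long format and once with a custom naming template. It assumes the package version shown in the outputs above (0.3.0); the nameStyle template is only an example.

```r
library(visOmopResults)
library(dplyr)

result <- mockSummarisedResult()

# Leave the estimates untouched: estimate_name, estimate_type and
# estimate_value stay as they are, while splitting still takes place
long <- result |>
  tidy(pivotEstimatesBy = NULL)

# Pivot by variable and estimate, with a custom column naming template
wide <- result |>
  tidy(
    pivotEstimatesBy = c("variable_name", "estimate_name"),
    nameStyle = "{variable_name}: {estimate_name}"
  )

long |> glimpse()
wide |> glimpse()
```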