Welcome to ClientVPS Mirrors

Creating synthetic clinical tables

Creating synthetic clinical tables

The omock package provides functionality to quickly create a cdm reference containing synthetic data based on population settings specified by the user.

First, let’s load packages required for this vignette.

library(omock)
library(dplyr)
library(ggplot2)

Now, in three lines of code, we can create a cdm reference with a person and observation period table for 1000 people.

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(nPerson = 1000) |>
  mockObservationPeriod()

cdm
#> 
#> ── # OMOP CDM reference (local) of synthetic cdm ───────────────────────────────
#> • omop tables: observation_period, person
#> • cohort tables: -
#> • achilles tables: -
#> • other tables: -

cdm$person |> glimpse()
#> Rows: 1,000
#> Columns: 18
#> $ person_id                   <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,…
#> $ gender_concept_id           <int> 8532, 8507, 8507, 8507, 8507, 8507, 8532, …
#> $ year_of_birth               <int> 1952, 1999, 1992, 1966, 1954, 1980, 1997, …
#> $ month_of_birth              <int> 3, 5, 10, 5, 1, 11, 6, 1, 7, 2, 6, 11, 5, …
#> $ day_of_birth                <int> 13, 14, 5, 26, 31, 27, 21, 28, 16, 10, 28,…
#> $ race_concept_id             <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_concept_id        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ birth_datetime              <dttm> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
#> $ location_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ provider_id                 <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ care_site_id                <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ person_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_value         <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ gender_source_concept_id    <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_value           <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ race_source_concept_id      <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_value      <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
#> $ ethnicity_source_concept_id <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…

cdm$observation_period |> glimpse()
#> Rows: 1,000
#> Columns: 5
#> $ person_id                     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ observation_period_start_date <date> 1995-11-15, 2019-12-20, 2009-02-18, 201…
#> $ observation_period_end_date   <date> 2016-05-12, 2019-12-30, 2011-09-12, 201…
#> $ observation_period_id         <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1…
#> $ period_type_concept_id        <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…

We can add further requirements around the population we create. For example we can require that they were born between 1960 and 1980 like so.

cdm <- emptyCdmReference(cdmName = "synthetic cdm") |>
  mockPerson(
    nPerson = 1000,
    birthRange = as.Date(c("1960-01-01", "1980-12-31"))
  ) |>
  mockObservationPeriod()
cdm$person |>
  collect() |>
  ggplot() +
  geom_histogram(aes(as.integer(year_of_birth)),
    binwidth = 1, colour = "grey"
  ) +
  theme_minimal() +
  xlab("Year of birth")

Need a high-speed mirror for your open-source project?
Contact our mirror admin team at info@clientvps.com.

This archive is provided as a free public service to the community.
Proudly supported by infrastructure from VPSPulse , RxServers , BuyNumber , UnitVPS , OffshoreName and secure payment technology by ArionPay.