Introduction to CodelistGenerator

Creating a code list for dementia

For this example we will generate a candidate codelist for dementia, looking only for codes in the condition domain. Let’s first load some libraries.
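A minimal setup, assuming all four packages are installed, might look like the following. The set of packages is inferred from the function calls used later in this vignette (for example `tbl()`, `filter()`, and `glimpse()` from dplyr); it is not stated in the original text.

```r
# Packages inferred from the function calls used in this vignette
library(DBI)               # dbConnect()
library(dplyr)             # tbl(), sql(), filter(), select(), glimpse(), ...
library(CDMConnector)      # cdmFromCon()
library(CodelistGenerator) # getCandidateCodes(), compareCodelists(), getMappings()
```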

Connect to the OMOP CDM vocabularies

CodelistGenerator works with a cdm_reference to the vocabulary tables of the OMOP CDM, created using the CDMConnector package.

# example with postgres database connection details
db <- DBI::dbConnect(RPostgres::Postgres(),
  dbname = Sys.getenv("server"),
  port = Sys.getenv("port"),
  host = Sys.getenv("host"),
  user = Sys.getenv("user"),
  password = Sys.getenv("password")
)

# create cdm reference
cdm <- CDMConnector::cdmFromCon(
  con = db,
  cdmSchema = Sys.getenv("vocabulary_schema")
)

Check version of the vocabularies

Results from CodelistGenerator are specific to a particular version of the OMOP CDM vocabularies. We can see the version of the vocabulary being used like so:

getVocabVersion(cdm = cdm)
#> [1] "vocabVersion"

A code list from “Dementia” (4182210) and its descendants

The simplest approach to identifying potential codes is to take a high-level code and include all its descendants.

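# schema holding the OMOP vocabulary tables (same environment variable as above)
vocabularyDatabaseSchema <- Sys.getenv("vocabulary_schema")

codesFromDescendants <- tbl(
  db,
  sql(paste0(
    "SELECT * FROM ",
    vocabularyDatabaseSchema,
    ".concept_ancestor"
  ))
) |>
  # concept ids are integers, so compare against a number rather than a string
  filter(ancestor_concept_id == 4182210L) |>
  select("descendant_concept_id") |>
  rename("concept_id" = "descendant_concept_id") |>
  # add concept details; state the join key explicitly
  left_join(
    tbl(db, sql(paste0(
      "SELECT * FROM ",
      vocabularyDatabaseSchema,
      ".concept"
    ))),
    by = "concept_id"
  ) |>
  select(
    "concept_id", "concept_name",
    "domain_id", "vocabulary_id"
  ) |>
  collect()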
codesFromDescendants |> 
  glimpse()
#> Rows: 151
#> Columns: 4
#> $ concept_id    <int> 35610098, 4043241, 4139421, 37116466, 4046089, 44782559,…
#> $ concept_name  <chr> "Predominantly cortical dementia", "Familial Alzheimer's…
#> $ domain_id     <chr> "Condition", "Condition", "Condition", "Condition", "Con…
#> $ vocabulary_id <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOME…

This picks up most of the relevant codes. However, the approach misses codes that are not descendants of 4182210. For example, codes such as “Wandering due to dementia” (37312577; https://athena.ohdsi.org/search-terms/terms/37312577) and “Anxiety due to dementia” (37312031; https://athena.ohdsi.org/search-terms/terms/37312031) are not picked up.
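The descendant lookup, and the gap it leaves, can be illustrated with a small self-contained sketch that needs no database. The two tibbles below are toy stand-ins for the OMOP `concept` and `concept_ancestor` tables, holding only three concepts mentioned in this vignette; the variable names `descendants` and `missed` are introduced here for illustration.

```r
library(dplyr)

# Toy stand-in for the OMOP concept table (three concepts named in this vignette)
concept <- tibble(
  concept_id = c(4182210L, 35610098L, 37312577L),
  concept_name = c(
    "Dementia",
    "Predominantly cortical dementia",
    "Wandering due to dementia"
  )
)

# Toy stand-in for concept_ancestor: a concept is listed as its own descendant,
# and 37312577 is deliberately absent because it is not under 4182210
concept_ancestor <- tibble(
  ancestor_concept_id = c(4182210L, 4182210L),
  descendant_concept_id = c(4182210L, 35610098L)
)

# Same logic as the database query: descendants of 4182210 plus their names
descendants <- concept_ancestor |>
  filter(ancestor_concept_id == 4182210L) |>
  select(concept_id = descendant_concept_id) |>
  left_join(concept, by = "concept_id")

# Concepts that mention dementia but are not reached through the hierarchy
missed <- concept |>
  anti_join(descendants, by = "concept_id") |>
  filter(grepl("dementia", concept_name, ignore.case = TRUE))

missed
```

Here `missed` holds the single "Wandering due to dementia" row, which is the kind of code a hierarchy-only approach leaves out.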

Generating a candidate code list using CodelistGenerator

To try to capture all such terms, we can use CodelistGenerator.

First, let’s do a simple search for a single keyword of “dementia”, including descendants of the identified codes.

dementiaCodes1 <- getCandidateCodes(
  cdm = cdm,
  keywords = "dementia",
  domains = "Condition",
  includeDescendants = TRUE
)
dementiaCodes1 |>
  glimpse()
#> Rows: 187
#> Columns: 6
#> $ concept_id       <int> 374326, 374888, 375791, 376085, 376094, 376095, 37694…
#> $ found_from       <chr> "From initial search", "From initial search", "From i…
#> $ concept_name     <chr> "Arteriosclerotic dementia with depression", "Dementi…
#> $ domain_id        <chr> "Condition", "Condition", "Condition", "Condition", "…
#> $ vocabulary_id    <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ standard_concept <chr> "standard", "standard", "standard", "standard", "stan…

Comparing code lists

What is the difference between this code list and the one from 4182210 and its descendants?

codeComparison <- compareCodelists(
  codesFromDescendants,
  dementiaCodes1
)
codeComparison |>
  group_by(codelist) |>
  tally()
#> # A tibble: 2 × 2
#>   codelist            n
#>   <chr>           <int>
#> 1 Both              151
#> 2 Only codelist 2    36

What are these extra codes picked up by CodelistGenerator?

codeComparison |>
  filter(codelist == "Only codelist 2") |> 
  glimpse()
#> Rows: 36
#> Columns: 3
#> $ concept_id   <int> 4041685, 4043378, 4044415, 4046091, 4092747, 4187091, 425…
#> $ concept_name <chr> "Amyotrophic lateral sclerosis with dementia", "Frontotem…
#> $ codelist     <chr> "Only codelist 2", "Only codelist 2", "Only codelist 2", …
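The labelling used in this comparison can be reproduced conceptually with a full join. The following is only a sketch of the idea on toy data, not the implementation of `compareCodelists()`; the rows reuse concept ids and names shown in this vignette, and their membership in each toy list is assumed for illustration.

```r
library(dplyr)

# Toy code lists (membership is illustrative)
codelist1 <- tibble(
  concept_id = c(4182210L, 35610098L),
  concept_name = c("Dementia", "Predominantly cortical dementia")
)
codelist2 <- tibble(
  concept_id = c(4182210L, 35610098L, 4041685L),
  concept_name = c(
    "Dementia",
    "Predominantly cortical dementia",
    "Amyotrophic lateral sclerosis with dementia"
  )
)

# Flag membership in each list, keep every concept, then label each row
comparison <- full_join(
  codelist1 |> mutate(in1 = TRUE),
  codelist2 |> mutate(in2 = TRUE),
  by = c("concept_id", "concept_name")
) |>
  mutate(codelist = case_when(
    !is.na(in1) & !is.na(in2) ~ "Both",
    !is.na(in1) ~ "Only codelist 1",
    TRUE ~ "Only codelist 2"
  )) |>
  select(concept_id, concept_name, codelist)

# Tally rows per label, as in the vignette
comparison |>
  count(codelist)
```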

Review mappings from non-standard vocabularies

We may also want to see which ICD10CM codes map to our candidate code list. We can get these by running:

icdMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "ICD10CM"
)
icdMappings |> 
  glimpse()
#> Rows: 191
#> Columns: 7
#> $ standard_concept_id        <int> 372610, 374341, 374888, 374888, 374888, 374…
#> $ standard_concept_name      <chr> "Postconcussion syndrome", "Huntington's ch…
#> $ standard_vocabulary_id     <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id    <int> 45571706, 35207314, 1568088, 1568089, 37402…
#> $ non_standard_concept_name  <chr> "Postconcussional syndrome", "Huntington's …
#> $ non_standard_concept_code  <chr> "F07.81", "G10", "F02", "F02.8", "F02.811",…
#> $ non_standard_vocabulary_id <chr> "ICD10CM", "ICD10CM", "ICD10CM", "ICD10CM",…

We can do the same for other non-standard vocabularies, such as Read:

readMappings <- getMappings(
  cdm = cdm,
  candidateCodelist = dementiaCodes1,
  nonStandardVocabularies = "Read"
)
readMappings |> 
  glimpse()
#> Rows: 93
#> Columns: 7
#> $ standard_concept_id        <int> 372610, 372610, 372610, 372610, 372610, 372…
#> $ standard_concept_name      <chr> "Postconcussion syndrome", "Postconcussion …
#> $ standard_vocabulary_id     <chr> "SNOMED", "SNOMED", "SNOMED", "SNOMED", "SN…
#> $ non_standard_concept_id    <int> 45446542, 45446553, 45453190, 45459905, 455…
#> $ non_standard_concept_name  <chr> "Post-concussion syndrome", "[X]Post-trauma…
#> $ non_standard_concept_code  <chr> "E2A2.00", "Eu06212", "E2A2.11", "E2A2.12",…
#> $ non_standard_vocabulary_id <chr> "READ", "READ", "READ", "READ", "READ", "RE…
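
In the OMOP vocabularies, non-standard codes are linked to standard concepts through "Maps to" rows in the concept_relationship table. The sketch below shows that lookup on toy data, using a few ids and codes from the ICD10CM output above. It is an illustration of the idea under those assumptions, not the implementation of `getMappings()`, and the table and variable names are introduced here.

```r
library(dplyr)

# Standard concepts from the candidate code list (ids from the output above)
candidateCodes <- tibble(concept_id = c(372610L, 374888L))

# Toy slice of the non-standard ICD10CM concepts
nonStandard <- tibble(
  concept_id = c(45571706L, 1568088L, 1568089L),
  concept_code = c("F07.81", "F02", "F02.8"),
  vocabulary_id = "ICD10CM"
)

# Toy slice of concept_relationship: "Maps to" points from the
# non-standard concept (concept_id_1) to the standard concept (concept_id_2)
conceptRelationship <- tibble(
  concept_id_1 = c(45571706L, 1568088L, 1568089L),
  concept_id_2 = c(372610L, 374888L, 374888L),
  relationship_id = "Maps to"
)

# Keep mappings that land on a candidate code and attach the source code details
mappings <- conceptRelationship |>
  filter(relationship_id == "Maps to") |>
  inner_join(candidateCodes, by = c("concept_id_2" = "concept_id")) |>
  inner_join(nonStandard, by = c("concept_id_1" = "concept_id")) |>
  select(
    standard_concept_id = concept_id_2,
    non_standard_concept_id = concept_id_1,
    non_standard_concept_code = concept_code,
    non_standard_vocabulary_id = vocabulary_id
  )

mappings
```

Several non-standard codes can map to the same standard concept, which is why a mapping table can have more rows than the candidate code list it was built from.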