Historically {EGM} wrapped the Waveform Database
(WFDB) software package, a well documented suite of C
and C++ applications for interacting with signal data.
Recent releases of {EGM} now provide native readers,
writers, and annotation utilities implemented directly in R
and C++, so the core functionality of the package no longer
depends on an external WFDB installation. You can work with WFDB
headers, signals, and annotations on any system where {EGM}
is installed, including CRAN environments.
Because {EGM} ships its own WFDB readers and writers, no
additional system dependencies are required to begin analysing WFDB data
in R. Installing {EGM} from CRAN (or the
development version from GitHub) is sufficient for the examples in this
guide to run end-to-end on any platform.
If you would still like to interface with the traditional
C-based WFDB tools—for example to compare results or to
call PhysioNet utilities that have not yet been ported—you can follow
the instructions in the WFDB
GitHub repository. The C-based WFDB is more robust and
has a full set of tools and functionality not found in the
R version, however the key features of reading and writing
data and annotators is compatible with the R implementation
used by {EGM}.
WFDB files store signal data as integer values
(called digital units) in binary .dat
files. These integers represent the raw output from an analog-to-digital
converter (ADC). To convert these integer values to meaningful
physical units (like millivolts), the WFDB header file
(.hea) contains conversion metadata:
gain: ADC units per physical unit
(e.g., 200 ADC units per mV)baseline: The ADC value corresponding
to 0 physical unitsunits: The physical unit name (e.g.,
“mV”)The conversion formula between digital and physical units is:
\[ \text{physical} = \frac{\text{digital} - \text{baseline}}{\text{gain}} \]
And the inverse formula for writing:
\[ \text{digital} = (\text{physical} \times \text{gain}) + \text{baseline} \]
The read_wfdb() and read_signal() functions
support a units parameter that controls how signal data is
returned:
units = "digital" (default): Returns raw ADC counts as
stored in the fileunits = "physical": Returns values converted to
physical units (e.g., mV)Let’s demonstrate this concept using a WFDB file:
# Locate the WFDB files in the package
dir_path <- system.file('extdata', package = 'EGM')
# Read the same data in both digital and physical units
ecg_digital <- read_wfdb(
  record = "muse-sinus",
  record_dir = dir_path,
  units = "digital"
)
ecg_physical <- read_wfdb(
  record = "muse-sinus",
  record_dir = dir_path,
  units = "physical"
)
# Examine the header to see the conversion parameters
head(ecg_digital$header)
#> <header_table: 12 channels, 5000 samples @ 500 Hz> muse-sinus
#>         file_name storage_format number ADC_gain ADC_baseline ADC_units
#>            <char>          <int>  <int>    <num>        <int>    <char>
#> 1: muse-sinus.dat             16      1      200            0        mV
#> 2: muse-sinus.dat             16      2      200            0        mV
#> 3: muse-sinus.dat             16      3      200            0        mV
#> 4: muse-sinus.dat             16      4      200            0        mV
#> 5: muse-sinus.dat             16      5      200            0        mV
#> 6: muse-sinus.dat             16      6      200            0        mV
#>    ADC_zero ADC_resolution initial_value checksum blocksize  label   lead
#>       <int>          <int>         <int>    <int>     <int> <fctr> <char>
#> 1:        0             16           -10    -3015         0      I      I
#> 2:        0             16             5    -6357         0     II     II
#> 3:        0             16            15    -3468         0    III    III
#> 4:        0             16            10    -4851         0    AVF    AVF
#> 5:        0             16           -12      269         0    AVL    AVL
#> 6:        0             16             2     4498         0    AVR    AVR
#>    source additional_gain low_pass high_pass   color  scale
#>    <fctr>           <num>    <int>     <int>  <char> <lgcl>
#> 1:    ECG               1       NA        NA #000000     NA
#> 2:    ECG               1       NA        NA #000000     NA
#> 3:    ECG               1       NA        NA #000000     NA
#> 4:    ECG               1       NA        NA #000000     NA
#> 5:    ECG               1       NA        NA #000000     NA
#> 6:    ECG               1       NA        NA #000000     NA
#>   file_name storage_format ADC_gain ADC_baseline ADC_units ADC_resolution
#> 1  muse-sinus.dat      16      200            0        mV             16
#> 2  muse-sinus.dat      16      200            0        mV             16
#> 3  muse-sinus.dat      16      200            0        mV             16
#> 4  muse-sinus.dat      16      200            0        mV             16
#> 5  muse-sinus.dat      16      200            0        mV             16
#> 6  muse-sinus.dat      16      200            0        mV             16Notice the ADC_gain and ADC_baseline
columns in the header. In this example, all channels have a gain of 200
ADC units/mV and a baseline of 0. However, many real-world recordings
have non-zero baseline values to account for DC offsets in the recording
equipment.
Let’s compare the first few samples from lead II in both formats:
# Extract the first 10 samples from lead II
digital_values <- ecg_digital$signal$II[1:10]
physical_values <- ecg_physical$signal$II[1:10]
# Get the conversion parameters for lead II from the header
lead_ii_row <- ecg_digital$header[ecg_digital$header$label == "II", ]
gain <- lead_ii_row$ADC_gain
baseline <- lead_ii_row$ADC_baseline
# Create a comparison table
comparison <- data.frame(
  sample = 1:10,
  digital = digital_values,
  physical = round(physical_values, 3),
  calculated_physical = round((digital_values - baseline) / gain, 3)
)
print(comparison)
#>    sample digital physical calculated_physical
#> 1       1       5    0.025               0.025
#> 2       2       5    0.025               0.025
#> 3       3       5    0.025               0.025
#> 4       4      10    0.050               0.050
#> 5       5      10    0.050               0.050
#> 6       6      10    0.050               0.050
#> 7       7      10    0.050               0.050
#> 8       8      15    0.075               0.075
#> 9       9      20    0.100               0.100
#> 10     10      24    0.120               0.120
#>    Sample Digital Physical Calculated
#> 1       1       5    0.025      0.025
#> 2       2       5    0.025      0.025
#> 3       3       5    0.025      0.025
#> 4       4      10    0.050      0.050
#> 5       5      10    0.050      0.050
#> 6       6      10    0.050      0.050
#> 7       7      10    0.050      0.050
#> 8       8      15    0.075      0.075
#> 9       9      20    0.100      0.100
#> 10     10      24    0.120      0.120
# Verify the conversion formula
cat("\nLead II conversion parameters:\n")
#> 
#> Lead II conversion parameters:
cat("  Gain:", gain, "ADC units/mV\n")
#>   Gain: 200 ADC units/mV
cat("  Baseline:", baseline, "ADC units\n")
#>   Baseline: 0 ADC units
#> Lead II conversion parameters:
#>   Gain: 200 ADC units/mV
#>   Baseline: 0 ADC unitsThe calculated physical units are derived from the digital units using the ADC conversion, and thus maintains the fidelity of the original signal.
The WFDB software package utilizes a binary format to store
annotations. Annotations are essentially markers or qualifiers of
specific components of a signal, specifying both the specific time or
position in the plot, and what channel the annotation refers to.
Annotations are polymorphic, and multiple can be applied to a single
signal dataset. The credit for this work goes directly to the original
software creators, as this is just a wrapper to allow for flexible
integration into R. All of the demonstrations below rely on
the native {EGM} parsers, so they run identically whether
or not external WFDB binaries are present.
To begin, let’s take an example of a simple ECG dataset. This data is included in the package, and can be accessed as below.
fp <- system.file('extdata', 'muse-sinus.xml', package = 'EGM')
ecg <- read_muse(fp)
fig <- ggm(ecg) + theme_egm_light()
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
#> Scale for colour is already present.
#> Adding another scale for colour, which will replace the existing scale.
figWe may have customized software, manual approaches, or machine
learning models that may label signal data. We can use the
annotation_table() function to create
WFDB-compatible annotations, updating both the
egm object and the written file. To trial this, let’s label
the peaks of the QRS complex from this 12-lead ECG.
pracma::findpeaks() is quite good, but to
avoid dependencies we are writing our own.egm
object# Let x = 10-second signal dataset
# We will apply this across the dataset
# This is an oversimplified approach.
find_peaks <- function(x,
                       threshold = 
                         mean(x, na.rm = TRUE) + 2 * sd(x, na.rm = TRUE)
                       ) {
  
  # Ensure signal is "positive" for peak finding algorithm
  x <- abs(x)
  
  # Find the peaks
  peaks <- which(diff(sign(diff(x))) == -2) + 1
  
  # Filter the peaks
  peaks <- peaks[x[peaks] > threshold]
  
  # Return
  peaks
}
# Create a signal dataset
dat <- extract_signal(ecg)
# Find the peaks
sig <- dat[["I"]]
pk_loc <- find_peaks(sig)
pk_val <- sig[pk_loc]
pks <- data.frame(x = pk_loc, y = pk_val)
# Plot them
plot(sig, type = "l")
points(x = pks$x, y = pks$y, col = "orange")The result is not bad for a simple peak finder, but it lets us generate a small dataset of annotations that can then be used. Please see the additional vignettes for advanced annotation options, such as multichannel plots, and multichannel annotations. We can take a look under the hood at the annotation positions we generated. The relevant arguments, which are displayed below, include:
# Find the peaks
raw_signal <- dat[["I"]]
peak_positions <- find_peaks(raw_signal)
peak_positions
#>  [1]   96  427  759 1091 1753 2085 2417 2750 3080 3412 3744 4076 4409
# Annotations do not need to store the value at that time point however
# The annotation table function has the following arguments
args(annotation_table)
#> function (annotator = character(), time = character(), sample = integer(), 
#>     frequency = integer(), type = character(), subtype = character(), 
#>     channel = integer(), number = integer(), ...) 
#> NULL
# We can fill this in as below using additional data from the original ECG
hea <- ecg$header
start <- attributes(hea)$record_line$start_time
hz <- attributes(hea)$record_line$frequency
ann <- annotation_table(
  annotator = "our_pks",
  sample = peak_positions,
  type = "R",
  frequency = hz,
  channel = "I"
)
# Here are our annotations
ann
#> <annotation_table: 13 `our_pks` annotations>
#>             time sample   type subtype channel number
#>           <char>  <num> <char>  <char>  <char>  <int>
#>  1: 00:00:00.192     96      R               I      0
#>  2: 00:00:00.854    427      R               I      0
#>  3: 00:00:01.518    759      R               I      0
#>  4: 00:00:02.182   1091      R               I      0
#>  5: 00:00:03.506   1753      R               I      0
#>  6:  00:00:04.17   2085      R               I      0
#>  7: 00:00:04.834   2417      R               I      0
#>  8:   00:00:05.5   2750      R               I      0
#>  9:  00:00:06.16   3080      R               I      0
#> 10: 00:00:06.824   3412      R               I      0
#> 11: 00:00:07.488   3744      R               I      0
#> 12: 00:00:08.152   4076      R               I      0
#> 13: 00:00:08.818   4409      R               I      0
# Then, add this back to the original signal
ecg$annotation <- ann
ecg
#> <Electrogram>
#> -------------------
#> Recording Duration:  10 seconds
#> Recording frequency  500  hz
#> Number of channels:  12 
#> Channel Names:  I II III AVF AVL AVR V1 V2 V3 V4 V5 V6 
#> Annotation:  our_pks 
#> Type: Standard 12-lead ECG