Implementing a sepsis definition motivated development of the constellation package. There is ongoing debate in the medical literature about what constitutes sepsis, but our local team of clinical experts agreed to define sepsis as the occurrence of all 3 of the following criteria:
For each criterion, there is both a threshold that signifies dysfunction and a relevant time window in which to search for events. A major challenge of working with medical data is that the sampling rate varies across measurements. For example, vital signs are sampled more frequently than blood labs, so their information is carried forward for a shorter period of time than lab results.
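The carry-forward idea can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: `criterion_active()` is a hypothetical helper, the timestamps are made up, and treating the window boundary as inclusive is an assumption. The 6-hour and 24-hour windows mirror the vitals and lab windows used later in this vignette.

```r
# Hypothetical helper (not part of constellation): for each evaluation time,
# return 1L if any abnormal measurement falls within the preceding window.
criterion_active <- function(eval_times, abnormal_times, window_hours) {
  vapply(seq_along(eval_times), function(i) {
    gap <- as.numeric(difftime(eval_times[i], abnormal_times, units = "hours"))
    as.integer(any(gap >= 0 & gap <= window_hours))
  }, integer(1))
}

start <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")
eval_times <- start + 3600 * c(1, 8, 20, 30)

# One abnormal pulse and one abnormal WBC, both recorded at hour 0
pulse_abnormal <- start
wbc_abnormal <- start

# Vitals are carried forward 6 hours, labs 24 hours
pulse_flag <- criterion_active(eval_times, pulse_abnormal, window_hours = 6)
wbc_flag <- criterion_active(eval_times, wbc_abnormal, window_hours = 24)
```

With these inputs the pulse flag is set only at the first evaluation time, while the WBC flag stays set through hour 20, because the lab result is carried forward longer.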
There are 3 steps to identify sepsis and each step corresponds to a constellation function:
Lastly, it may be required to separate distinct sepsis episodes during a single inpatient admission. For example, you may consider that sepsis events separated by more than 72 hours are distinct episodes that require additional evaluation and treatment. Thus, instances within 72 hours of an incident event are considered to be the same episode and instances separated by a minimum of 72 hours are considered to be distinct episodes. This logic is implemented via the incidents() function.
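The episode logic described above can be sketched for a single patient in base R. This is a minimal illustration, not the code behind incidents(): `find_incident_times()` is a hypothetical helper, the timestamps are made up, and treating an event exactly 72 hours after an incident as part of the same episode is an assumption.

```r
# Hypothetical helper (not part of constellation): keep only the incident
# events from one patient's event times. An event starts a new episode when
# it falls more than `window_hours` after the most recent incident event.
find_incident_times <- function(times, window_hours = 72) {
  times <- sort(times)
  incident <- times[0]  # empty vector with the same class as `times`
  for (i in seq_along(times)) {
    if (length(incident) == 0) {
      incident <- c(incident, times[i])
      next
    }
    gap <- as.numeric(difftime(times[i], incident[length(incident)], units = "hours"))
    if (gap > window_hours) {
      incident <- c(incident, times[i])
    }
  }
  incident
}

start <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")
event_times <- start + 3600 * c(0, 10, 71, 80, 100, 160)
incident_times <- find_incident_times(event_times, window_hours = 72)
incident_hours <- as.numeric(difftime(incident_times, start, units = "hours"))
```

Here the events at hours 10 and 71 fold into the episode that starts at hour 0, the event at hour 80 starts a second episode, and the event at hour 160 starts a third.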
The .Rmd version of the vignette will not show code output. If you’d like to see code output, please do the following:
library(constellation)
vignette("identify_sepsis", package = "constellation")
First, load constellation and data.table
library(constellation)
library(data.table)
Next, prep the timestamps
## Convert each table's timestamp column to POSIXct (UTC) by reference
for (dt in list(vitals, labs, orders)) {
  date_col <- grep("TIME", names(dt), value = TRUE)
  set(dt, j = date_col, value = as.POSIXct(dt[[date_col]], format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC"))
}
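To check the format string before running it over whole tables, it can help to parse a couple of values by hand. The raw strings below are illustrative values written in the ISO 8601 style that the format string expects; they are not taken from the package data.

```r
# Illustrative raw timestamps in the expected "%Y-%m-%dT%H:%M:%SZ" layout
raw <- c("2010-02-25T15:45:29Z", "2010-02-25T20:42:35Z")
parsed <- as.POSIXct(raw, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")

# A format mismatch yields NA rather than an error, so check explicitly
stopifnot(!anyNA(parsed))

# Elapsed hours between the two measurements
elapsed <- as.numeric(difftime(parsed[2], parsed[1], units = "hours"))
```

Once the columns are POSIXct, the hour-based windows used in the rest of the vignette are plain time differences.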
Use value_change() to identify all instances of a drop in systolic blood pressure of greater than 40 within 6 hours.
systolic_bp <- vitals[VARIABLE == "SYSTOLIC_BP"]
systolic_bp_drop <- value_change(systolic_bp, value = 40, direction = "down",
window_hours = 6, join_key = "PAT_ID", time_var = "RECORDED_TIME",
value_var = "VALUE", mult = "all")
head(systolic_bp_drop)
## PAT_ID PRIOR_RECORDED_TIME PRIOR_VALUE CURRENT_RECORDED_TIME
## 1: 108546 2010-02-25 15:45:29 139.9967 2010-02-25 20:42:35
## 2: 108546 2010-03-01 15:57:24 136.8654 2010-03-01 16:07:00
## 3: 108546 2010-03-02 19:59:20 129.0167 2010-03-03 00:46:35
## 4: 108546 2010-03-02 20:49:00 110.1830 2010-03-03 00:46:35
## 5: 108546 2010-03-04 00:18:41 137.8095 2010-03-04 04:23:54
## 6: 108546 2010-03-04 02:13:39 130.3280 2010-03-04 04:23:54
## CURRENT_VALUE
## 1: 80.07446
## 2: 88.88972
## 3: 69.94551
## 4: 69.94551
## 5: 82.16874
## 6: 82.16874
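The pairing of a prior and a current measurement in this output can be reproduced on toy data with a base R self-join. This is a sketch of the idea only, not how value_change() is implemented; the data are made up, and counting a gap of exactly 6 hours as inside the window is an assumption.

```r
# Toy systolic blood pressure series for one patient (made-up values)
bp <- data.frame(
  PAT_ID = 1L,
  RECORDED_TIME = as.POSIXct("2010-01-01 00:00:00", tz = "UTC") + 3600 * c(0, 2, 5, 9),
  VALUE = c(140, 120, 95, 90)
)

# Pair every measurement with every other measurement for the same patient
pairs <- merge(bp, bp, by = "PAT_ID", suffixes = c("_PRIOR", "_CURRENT"))

gap_hours <- as.numeric(difftime(pairs$RECORDED_TIME_CURRENT,
                                 pairs$RECORDED_TIME_PRIOR, units = "hours"))
drop <- pairs$VALUE_PRIOR - pairs$VALUE_CURRENT

# Keep pairs where the later value is more than 40 lower within 6 hours
drops <- pairs[gap_hours > 0 & gap_hours <= 6 & drop > 40, ]
```

Only the pair from hour 0 (140) to hour 5 (95) qualifies. The drop from 140 to 90 is larger, but the two readings are 9 hours apart.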
First, build a separate table for each SIRS component by applying its dysfunction threshold
temp <- vitals[VARIABLE == "TEMPERATURE" & (VALUE > 100.4 | VALUE < 96.8)]
pulse <- vitals[VARIABLE == "PULSE" & VALUE > 90]
resp <- vitals[VARIABLE == "RESPIRATORY_RATE" & VALUE > 20]
wbc <- labs[VARIABLE == "WBC" & (VALUE > 12 | VALUE < 4)]
Next, identify which criteria are met at every measurement time
sirs <- constellate_criteria(temp, pulse, resp, wbc,
criteria_names = c("TEMPERATURE", "PULSE", "RESPIRATORY_RATE", "WBC"),
window_hours = c(6, 6, 6, 24), join_key = "PAT_ID",
time_var = "RECORDED_TIME")
head(sirs)
## PAT_ID RECORDED_TIME TEMPERATURE PULSE RESPIRATORY_RATE WBC
## 1: 108546 2010-02-25 05:46:27 0 1 0 0
## 2: 108546 2010-02-25 06:00:38 0 1 0 0
## 3: 108546 2010-02-25 06:37:13 0 1 1 0
## 4: 108546 2010-02-25 07:50:34 0 1 1 0
## 5: 108546 2010-02-25 08:19:12 0 1 1 0
## 6: 108546 2010-02-25 08:23:08 0 1 1 0
Sum values in the 4 columns to calculate SIRS score at every measurement time and subset to instances where SIRS >= 2
sirs[, SIRS_SCORE := TEMPERATURE + PULSE + RESPIRATORY_RATE + WBC]
sirs <- sirs[SIRS_SCORE >= 2]
head(sirs)
## PAT_ID RECORDED_TIME TEMPERATURE PULSE RESPIRATORY_RATE WBC
## 1: 108546 2010-02-25 06:37:13 0 1 1 0
## 2: 108546 2010-02-25 07:50:34 0 1 1 0
## 3: 108546 2010-02-25 08:19:12 0 1 1 0
## 4: 108546 2010-02-25 08:23:08 0 1 1 0
## 5: 108546 2010-02-25 08:30:46 0 1 1 0
## 6: 108546 2010-02-25 09:47:14 0 1 1 0
## SIRS_SCORE
## 1: 2
## 2: 2
## 3: 2
## 4: 2
## 5: 2
## 6: 2
First, build a table that combines all measurements for end organ damage:
## Subset values
end_organ <- labs[
(VARIABLE == "CREATININE" & VALUE > 2.0) |
(VARIABLE == "INR" & VALUE > 1.5) |
(VARIABLE == "BILIRUBIN" & VALUE > 2.0) |
(VARIABLE == "PLATELETS" & VALUE < 100) |
(VARIABLE == "LACTATE" & VALUE >= 2.0)
]
## normalize systolic_bp_drop
systolic_bp_drop <- systolic_bp_drop[,.(PAT_ID, CURRENT_RECORDED_TIME, CURRENT_VALUE)]
systolic_bp_drop[, VARIABLE := "SYSTOLIC_BP_DROP"]
setnames(systolic_bp_drop, c("CURRENT_RECORDED_TIME", "CURRENT_VALUE"), c("RECORDED_TIME", "VALUE"))
## subset low systolic_bp
systolic_bp <- systolic_bp[VALUE < 90]
## Combine abnormal labs, SBP < 90, and SBP drop; match columns by name
end_organ <- rbind(end_organ, systolic_bp, systolic_bp_drop, use.names = TRUE)
Normalize the blood culture order data to feed into the constellate() function. The name of the timestamp variable must be identical in all time series data frames passed to constellate().
setnames(orders, "ORDER_TIME", "RECORDED_TIME")
Calculate the first instant at which all 3 criteria are met for every patient.
## Find first sepsis events
sepsis <- constellate(sirs, orders, end_organ, window_hours = c(24, 24, 24),
join_key = "PAT_ID", time_var = "RECORDED_TIME", event_name = "SEPSIS",
mult = "first")
head(sepsis)
## PAT_ID SEPSIS_TIME
## 1: 108546 2010-02-25 22:49:06
## 2: 112374 2010-03-09 15:50:31
## 3: 244011 2010-12-05 14:05:25
## 4: 262980 2010-10-31 13:01:11
## 5: 296914 2010-06-16 07:28:30
## 6: 315107 2010-11-10 03:46:47
Calculate the last instant at which all 3 criteria are met for every patient.
## Find last sepsis events
sepsis <- constellate(sirs, orders, end_organ, window_hours = c(24, 24, 24),
join_key = "PAT_ID", time_var = "RECORDED_TIME", event_name = "SEPSIS",
mult = "last")
head(sepsis)
## PAT_ID SEPSIS_TIME
## 1: 108546 2010-06-03 16:26:05
## 2: 112374 2010-03-10 13:43:00
## 3: 244011 2010-12-09 04:47:59
## 4: 262980 2010-10-31 13:01:11
## 5: 296914 2010-06-16 13:57:53
## 6: 315107 2010-11-10 03:46:47
Calculate every instance in which all 3 criteria are met for every patient.
## Find all sepsis events
sepsis <- constellate(sirs, orders, end_organ, window_hours = c(24, 24, 24),
join_key = "PAT_ID", time_var = "RECORDED_TIME", event_name = "SEPSIS",
mult = "all")
head(sepsis)
## PAT_ID SEPSIS_TIME
## 1: 108546 2010-02-25 22:49:06
## 2: 108546 2010-02-25 23:11:45
## 3: 108546 2010-02-25 23:21:19
## 4: 108546 2010-02-25 23:46:15
## 5: 108546 2010-02-25 23:55:56
## 6: 108546 2010-02-26 00:13:26
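In the output shown in this vignette, the mult = "first" time for patient 108546 matches the earliest row of the mult = "all" result. Assuming that relationship holds in general (an assumption, not something the package documentation is quoted on here), the first and last events can also be derived from the full table with data.table. The table below is made-up toy data.

```r
library(data.table)

start <- as.POSIXct("2010-01-01 00:00:00", tz = "UTC")

# Toy stand-in for a mult = "all" result (made-up patients and times)
all_events <- data.table(
  PAT_ID = c(1L, 1L, 1L, 2L),
  SEPSIS_TIME = start + 3600 * c(5, 7, 90, 12)
)

# Earliest and latest event per patient
first_events <- all_events[, .(SEPSIS_TIME = min(SEPSIS_TIME)), by = PAT_ID]
last_events <- all_events[, .(SEPSIS_TIME = max(SEPSIS_TIME)), by = PAT_ID]
```

Deriving the summaries this way avoids a second call to constellate() when the full set of events is already in memory.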
Separate sepsis events that occur more than 72 hours apart for each patient.
## Find incident sepsis events for each patient
sepsis <- incidents(sepsis, window_hours = 72, time_var = "SEPSIS_TIME", join_key = "PAT_ID")
head(sepsis)
## PAT_ID SEPSIS_TIME
## 1: 108546 2010-02-25 22:49:06
## 2: 108546 2010-03-09 06:05:38
## 3: 108546 2010-03-14 14:01:59
## 4: 108546 2010-03-17 18:47:53
## 5: 108546 2010-03-22 17:15:46
## 6: 108546 2010-04-03 01:39:03
If you’d like to learn more about our sepsis model, please see our publications from the International Conference on Machine Learning and Machine Learning in Health Care.