The PhysioNet/Computing in Cardiology Challenge 2018 focused on using physiological signals (EEG, EOG, EMG, ECG, SaO2) collected during polysomnographic sleep studies to detect sources of arousal (non-apnea) during sleep. A total of 1,983 polysomnographic recordings were made available to the entrants. The arousal labels for 994 of the recordings were released in a public training set, while the labels for the remaining 989 were kept in a hidden test set. Entrants were asked to develop an algorithm that could label the presence of arousals within the hidden test set. Entrants were assessed on the area under the precision-recall curve. A total of twenty-two independent teams entered the Challenge, using methods that ranged from generalized linear models to deep neural networks.
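The evaluation metric can be illustrated with a generic step-wise AUPRC (average precision) computation. This is a minimal sketch, not the Challenge's official scoring code. The function name `auprc` is hypothetical, and we assume sample-wise binary arousal labels from which any unscored samples have already been removed.

```python
import numpy as np


def auprc(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 for an arousal sample, 0 otherwise (assumed: unscored samples
    already removed, and at least one positive present).
    scores: predicted arousal probabilities, higher meaning more likely.
    """
    labels = np.asarray(labels, dtype=float)
    scores = np.asarray(scores, dtype=float)
    order = np.argsort(-scores, kind="mergesort")  # sort by descending score
    y, s = labels[order], scores[order]
    tp = np.cumsum(y)          # true positives when thresholding at each rank
    fp = np.cumsum(1.0 - y)    # false positives when thresholding at each rank
    # Evaluate only at the last index of each group of tied scores
    last_of_group = np.r_[np.flatnonzero(np.diff(s)), len(s) - 1]
    tp, fp = tp[last_of_group], fp[last_of_group]
    precision = tp / (tp + fp)
    recall = tp / tp[-1]
    # Weight each precision value by the increase in recall at that threshold
    return float(np.sum(np.diff(np.r_[0.0, recall]) * precision))
```

For example, `auprc([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])` gives 5/6. Tied scores are treated as a single threshold, so a classifier gets no credit for ordering samples it scored identically.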
Objective: This study compares single entropy measures for AF detection based on analysis of the ventricular response. Approach: To improve entropy-based AF detectors, we developed normalized fuzzy entropy, a novel metric that (1) uses a fuzzy function to determine vector similarity, (2) replaces probability estimation with density estimation when approximating entropy, (3) uses a flexible distance threshold parameter, and (4) adjusts for heart rate by subtracting the natural log of the mean RR interval. An AF detector based on normalized fuzzy entropy was trained on the MIT-BIH atrial fibrillation (AF) database and tested on the MIT-BIH normal sinus rhythm (NSR) and MIT-BIH arrhythmia databases. This detector was compared with AF detectors based on three other entropy measures (sample entropy, fuzzy measure entropy and coefficient of sample entropy) over three standard window sizes. Main results: For classifying AF and non-AF rhythms, normalized fuzzy entropy achieved the highest area under the receiver operating characteristic curve (AUC), with values of 92.72%, 95.27% and 96.76% for 12-, 30- and 60-beat window lengths, respectively. This exceeded the next best technique at every window size; that technique gave AUCs of 91.12%, 91.86% and 90.55%, respectively. The remaining two measures gave AUCs below 90% at every window size. Normalized fuzzy entropy also performed best on all other tested statistics, including the Youden index, sensitivity, specificity, accuracy, positive predictivity and negative predictivity. In conclusion, we show that normalized fuzzy entropy can accurately identify AF from RR interval time series. Furthermore, longer window lengths (up to one minute) improve the performance of all the entropy-based AF detectors evaluated except the next best method, whose AUC fell from 91.86% to 90.55% at the 60-beat window. Significance: Our results show that the newly developed normalized fuzzy entropy is an accurate measure for detecting AF.
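The template-matching idea behind these entropy measures can be sketched as follows. `sample_entropy` follows the standard definition, -ln(A/B), where B and A count template pairs of length m and m + 1 within tolerance r. Here r is an absolute tolerance in the units of the RR series rather than a multiple of its standard deviation. `normalized_fuzzy_entropy_sketch` is hypothetical and is not the published formula. It only illustrates the four listed ingredients: a Gaussian-shaped fuzzy similarity (an assumed choice), a COSEn-style ln(2r) density term, a free tolerance r, and a -ln(mean RR) heart-rate adjustment. The default values of m and r are also assumptions.

```python
import numpy as np


def _similarity_sums(rr, m, r, fuzzy):
    """Pairwise template-similarity sums for template lengths m and m + 1.

    Both lengths use the same N - m starting points so the sums are comparable,
    and each pair is counted once with self-matches excluded.
    """
    x = np.asarray(rr, dtype=float)
    n_templates = len(x) - m
    iu = np.triu_indices(n_templates, k=1)
    sums = []
    for length in (m, m + 1):
        t = np.array([x[i:i + length] for i in range(n_templates)])
        # Chebyshev (maximum-coordinate) distance between every template pair
        d = np.max(np.abs(t[:, None, :] - t[None, :, :]), axis=2)[iu]
        if fuzzy:
            sim = np.exp(-((d / r) ** 2))   # graded (fuzzy) similarity in (0, 1]
        else:
            sim = (d <= r).astype(float)    # hard threshold, as in sample entropy
        sums.append(sim.sum())
    return sums[0], sums[1]


def sample_entropy(rr, m=2, r=0.02):
    """Sample entropy with an absolute tolerance r (same units as rr)."""
    b, a = _similarity_sums(rr, m, r, fuzzy=False)
    return float(-np.log(a / b)) if a > 0 and b > 0 else float("inf")


def normalized_fuzzy_entropy_sketch(rr, m=2, r=0.03):
    """Hypothetical illustration only; not the published normalized fuzzy entropy."""
    b, a = _similarity_sums(rr, m, r, fuzzy=True)
    fuzzy_en = -np.log(a / b)
    # + ln(2r): probability -> density (COSEn-style assumption)
    # - ln(mean RR): heart-rate adjustment described in the abstract
    return float(fuzzy_en + np.log(2 * r) - np.log(np.mean(rr)))
```

On synthetic data, an irregular, AF-like RR series (uniform between 0.4 and 1.2 s) scores higher than a near-constant sinus-like series, which is the separation the detectors rely on.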
Objective: This study classifies sleep stages from a single-lead electrocardiogram (ECG) using beat detection, cardiorespiratory coupling in the time-frequency domain and a deep convolutional neural network (CNN). Approach: An ECG-derived respiration (EDR) signal and a synchronous beat-to-beat heart rate variability (HRV) time series were derived from the ECG using previously described robust algorithms. A measure of cardiorespiratory coupling (CRC) was extracted by calculating the coherence and cross-spectrogram of the EDR and HRV signals in 5 min windows. A CNN was then trained to classify sleep stages (wake, rapid-eye-movement (REM) sleep, non-REM (NREM) light sleep and NREM deep sleep) from the corresponding CRC spectrograms. A support vector machine combined the output of the CNN with other features derived from the ECG, including phase-rectified signal averaging (PRSA), sample entropy, and standard spectral and temporal HRV measures. The MIT-BIH Polysomnographic Database (SLPDB), the PhysioNet/Computing in Cardiology Challenge 2018 database (CinC2018) and the Sleep Heart Health Study (SHHS) database, all expert-annotated for sleep stages, were used to train and validate the algorithm. Main results: In ten-fold cross-validation, the proposed algorithm achieved an accuracy (Acc) of 75.4% and a Cohen's kappa coefficient (κ) of 0.54 on the out-of-sample validation data when classifying wake, REM, NREM light and NREM deep sleep in SLPDB. This rose to Acc = 81.6% and κ = 0.63 for classifying wake, REM sleep and NREM sleep, and to Acc = 85.1% and κ = 0.68 for classifying NREM sleep versus REM/wakefulness in SLPDB. Significance: The proposed ECG-based sleep stage classification approach achieves the highest reported results on non-electroencephalographic data and uses datasets over ten times larger than those in previous studies.
By using a state-of-the-art QRS detector and deep learning model, the system does not require human annotation and can therefore be scaled for mass analysis.
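The CRC step (coherence and cross-spectrum of EDR and HRV in 5 min windows) can be sketched with a numpy-only Welch estimator. The following are assumptions, not details from the study: both signals are already resampled to an evenly spaced 4 Hz grid, windows advance in 30 s steps, and each window is split into 256-sample Hann-tapered segments with 50% overlap. The function names are hypothetical. Coherence needs several segments per window, because a single segment always gives a coherence of exactly 1.

```python
import numpy as np


def welch_coherence(x, y, fs, nperseg, noverlap=None):
    """Welch estimates of magnitude-squared coherence and cross-spectral density."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    noverlap = nperseg // 2 if noverlap is None else noverlap
    step = nperseg - noverlap
    window = np.hanning(nperseg)
    pxx = np.zeros(nperseg // 2 + 1)
    pyy = np.zeros(nperseg // 2 + 1)
    pxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    n_seg = 0
    for start in range(0, len(x) - nperseg + 1, step):
        xs, ys = x[start:start + nperseg], y[start:start + nperseg]
        # Remove the segment mean, taper with a Hann window, then FFT
        X = np.fft.rfft((xs - xs.mean()) * window)
        Y = np.fft.rfft((ys - ys.mean()) * window)
        pxx += np.abs(X) ** 2
        pyy += np.abs(Y) ** 2
        pxy += np.conj(X) * Y
        n_seg += 1
    freqs = np.fft.rfftfreq(nperseg, d=1.0 / fs)
    coherence = np.abs(pxy) ** 2 / np.maximum(pxx * pyy, np.finfo(float).tiny)
    return freqs, coherence, pxy / n_seg  # cross-spectrum left unscaled


def crc_spectrograms(edr, hrv, fs=4.0, window_s=300.0, step_s=30.0, nperseg=256):
    """Coherence and |cross-spectrum| per sliding window (columns = windows).

    Inputs must span at least one full window.
    """
    win, step = int(window_s * fs), int(step_s * fs)
    coh_cols, csd_cols = [], []
    for start in range(0, len(edr) - win + 1, step):
        freqs, coh, csd = welch_coherence(
            edr[start:start + win], hrv[start:start + win], fs, nperseg)
        coh_cols.append(coh)
        csd_cols.append(np.abs(csd))
    return freqs, np.array(coh_cols).T, np.array(csd_cols).T
```

The two returned frequency-by-time images are the kind of input a CNN could be trained on. When EDR and HRV share a 0.25 Hz breathing component, the coherence image shows a bright band at that frequency.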
Objective: Vast 12-lead ECG repositories provide opportunities to develop new machine learning approaches for creating accurate and automatic diagnostic systems for cardiac abnormalities. However, most 12-lead ECG classification studies are trained, tested, or developed on single, small, or relatively homogeneous datasets. In addition, most algorithms focus on identifying small numbers of cardiac arrhythmias that do not represent the complexity and difficulty of ECG interpretation. This work addresses these issues by providing a standard, multi-institutional database and a novel scoring metric through a public competition: the PhysioNet/Computing in Cardiology Challenge 2020. Approach: A total of 66 361 12-lead ECG recordings were sourced from six hospital systems in four countries across three continents; 43 101 recordings were posted publicly with a focus on 27 diagnoses. For the first time in a public competition, we required teams to publish open-source code for both training and testing their algorithms, ensuring full scientific reproducibility. Main results: A total of 217 teams from academia and industry submitted 1395 algorithms during the Challenge, representing a diversity of approaches for identifying cardiac abnormalities. As with previous Challenges, high-performing algorithms showed significant drops (10%) in performance on the hidden test data. Significance: Data from diverse institutions allowed us to assess algorithmic generalizability. A novel evaluation metric weighted misclassification errors differently for different cardiac abnormalities, capturing the outcomes and risks of different diagnoses. Requiring both trained models and the code for training them improved the generalizability of submissions, setting a new bar for reproducibility in public data science competitions.
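The idea behind the evaluation metric is that different misclassifications carry different costs. A single-label toy version illustrates it: a weight matrix rewards each (true, predicted) pair, and the total is rescaled between an "inactive" classifier that always outputs the normal class (score 0) and a perfect classifier (score 1). To our understanding the Challenge metric follows this pattern, but the real metric handles multiple labels per recording and uses the Challenge's own weight matrix. The weights, class indices and function names below are hypothetical.

```python
import numpy as np


def weighted_score(y_true, y_pred, weights):
    """Sum of weights[true, predicted] over recordings (single-label toy version)."""
    counts = np.zeros_like(weights, dtype=float)
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    return float(np.sum(weights * counts))


def normalized_score(y_true, y_pred, weights, normal_class):
    """Rescale so a perfect classifier scores 1 and an always-'normal' one scores 0."""
    observed = weighted_score(y_true, y_pred, weights)
    perfect = weighted_score(y_true, y_true, weights)
    inactive = weighted_score(y_true, [normal_class] * len(y_true), weights)
    return (observed - inactive) / (perfect - inactive)


# Hypothetical 3-class example: 0 = normal, 1 and 2 = two abnormalities.
# Confusing the two abnormalities (weight 0.25) is penalized more than
# confusing either abnormality with normal (weight 0.5).
W = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.25],
              [0.5, 0.25, 1.0]])
```

With this `W` and true labels `[0, 1, 2, 1]`, predicting `[0, 1, 1, 1]` scores 0.5. The classifier earns partial credit for its one mistake instead of simply losing a fixed fraction of accuracy.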
Objective. To develop a sleep staging method from wrist-worn accelerometry and the photoplethysmogram (PPG) by leveraging transfer learning from a large electrocardiogram (ECG) database. Approach. In previous work, we developed a deep convolutional neural network for sleep staging from ECG using the cross-spectrogram of ECG-derived respiration and instantaneous beat intervals, heart rate variability metrics, spectral characteristics, and signal quality measures derived from 5793 subjects in the Sleep Heart Health Study (SHHS). We updated the weights of this model by transfer learning using PPG data from the Empatica E4 wristwatch worn by 105 subjects in the ‘Emory Twin Study Follow-up’ (ETSF) database, for whom overnight polysomnographic (PSG) scoring was available. We assessed the relative performance of PPG and actigraphy (Act), alone and in combination, with and without transfer learning. Main results. With transfer learning, our model achieved higher accuracy (by 1–9 percentage points) and Cohen’s Kappa (by 0.01–0.13) than without transfer learning for every classification category. The McNemar test showed statistically significant, though relatively small, gains in accuracy for every classification category. The out-of-sample classification performance using features from PPG and actigraphy for four-class classification was Accuracy (Acc) = 68.62% and Kappa = 0.44. For two-class classification, the performance was Acc = 81.49% and Kappa = 0.58. Significance. We proposed a combined PPG and actigraphy-based sleep stage classification approach using transfer learning from a large ECG sleep database. The results show that transfer learning improves estimates of sleep state.
The use of automated beat detectors and quality metrics means human over-reading is not required, and the approach can be scaled for large cross-sectional or longitudinal studies using wrist-worn devices for sleep staging.
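The two statistics used above, Cohen's Kappa and the McNemar test for paired classifiers, can be sketched in plain Python. `mcnemar` uses the common continuity-corrected chi-square with 1 degree of freedom, whose p-value is `erfc(sqrt(chi2 / 2))`. Whether the study used this variant or an exact binomial test is not stated, so that choice is an assumption. Labels can be any hashable stage names, such as `"wake"` or `"REM"`.

```python
import math
from collections import Counter


def cohens_kappa(y_true, y_pred):
    """Agreement between scored and predicted stages, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[k] * pred_counts[k] for k in true_counts) / n ** 2
    return (observed - expected) / (1 - expected)


def mcnemar(y_true, pred_a, pred_b):
    """Continuity-corrected McNemar chi-square (1 df) for two paired classifiers."""
    # Discordant epochs: only model A correct (b) versus only model B correct (c)
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    if b + c == 0:
        return 0.0, 1.0
    chi2 = (abs(b - c) - 1) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square(1) survival function
    return chi2, p_value
```

Only discordant epochs enter the McNemar statistic. With many epochs per night, even small accuracy differences can therefore reach significance, consistent with the "statistically significant, though relatively small" gains reported above.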
Objective and Approach: Sepsis, a dysregulated immune-mediated host response to infection, is the leading cause of morbidity and mortality in critically ill patients. Indices of heart rate variability and complexity (such as entropy) have been proposed as surrogate markers of neuro-immune system dysregulation in diseases such as sepsis. However, these indices provide only an average, one-dimensional description of complex neuro-physiological interactions. We propose a novel multiscale network construction and analysis method for multivariate physiological time series, and demonstrate its utility for early prediction of sepsis. Main results: We show that features derived from a multiscale heart rate and blood pressure time series network improve the area under the receiver operating characteristic curve (AUROC) for four-hour advance prediction of sepsis by approximately 20% over traditional indices of heart rate entropy ( versus ). Our results indicate that this improvement is attributable both to the improved network construction method proposed here and to the information embedded in the higher-order interaction of heart rate and blood pressure time series dynamics. Our final model, which included the clinical measurements most commonly available in patients' electronic medical records, multiscale entropy features and the proposed network-based features, achieved an AUROC of . Significance: Predicting the onset of sepsis before clinical recognition will allow meaningful earlier interventions (e.g. antibiotic and fluid administration), which have the potential to decrease sepsis-related morbidity, mortality and healthcare costs.
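Multiscale entropy and multiscale analyses of this kind share a multiscale step: coarse-graining, which averages consecutive non-overlapping blocks of τ samples to give the series at scale τ. The sketch below applies it to heart rate and blood pressure. The per-scale Pearson correlation is a hypothetical stand-in for a network edge weight. It only shows how two series can be compared scale by scale and is not the network construction proposed in this work, which the abstract does not specify. Function names and the default range of scales are assumptions.

```python
import numpy as np


def coarse_grain(x, tau):
    """Average consecutive non-overlapping blocks of length tau (leftover samples dropped)."""
    x = np.asarray(x, dtype=float)
    n_blocks = len(x) // tau
    return x[:n_blocks * tau].reshape(n_blocks, tau).mean(axis=1)


def per_scale_coupling(hr, bp, scales=range(1, 6)):
    """Hypothetical HR-BP edge weight per scale: Pearson r of coarse-grained series."""
    return {
        tau: float(np.corrcoef(coarse_grain(hr, tau), coarse_grain(bp, tau))[0, 1])
        for tau in scales
    }
```

For example, `coarse_grain([1, 2, 3, 4, 5, 6, 7], 2)` returns `[1.5, 3.5, 5.5]`, dropping the unpaired final sample.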
High false alarm rates in the ICU decrease quality of care by slowing staff response times while increasing patient delirium through noise pollution. The 2015 PhysioNet/Computing in Cardiology Challenge provides a set of 1250 multi-parameter ICU data segments associated with critical arrhythmia alarms, and challenges the general research community to address false alarm suppression using all available signals. Each data segment was 5 minutes long (for real-time analysis), ending at the time of the alarm. For retrospective analysis, we provided a further 30 seconds of data after the alarm was triggered. A total of 750 data segments were made available for training and 500 were held back for testing. Each alarm was reviewed by expert annotators, at least two of whom agreed that the alarm was either true or false. Challenge participants were invited to submit a complete, working algorithm to distinguish true from false alarms, and received a score based on their program's performance on the hidden test set. This score was based on the percentage of alarms classified correctly, with a penalty that weights the suppression of true alarms five times more heavily than the acceptance of false alarms. We provided three example entries based on well-known, open-source signal processing algorithms to serve as a basis for comparison and as a starting point for participants to develop their own code. In total, 38 teams submitted 215 entries to this year's Challenge. This editorial reviews the background issues for this Challenge, the design of the Challenge itself, the key achievements, and the follow-up research generated as a result of the Challenge, published in the concurrent special issue of Physiological Measurement. Additionally, we make some recommendations, arising from the Challenge, for future changes in the field of patient monitoring.
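One score consistent with this description is (TP + TN) / (TP + TN + FP + 5·FN). Here a false negative is a true alarm that was suppressed and costs five times as much as a false positive, which is a false alarm that was kept. We believe this matches the published Challenge score, but the sketch below should be read only as an illustration of the description above. The function name and the `fn_weight` parameter are hypothetical.

```python
def false_alarm_score(is_true_alarm, alarm_raised, fn_weight=5):
    """Accuracy-style score that penalizes suppressed true alarms fn_weight times."""
    tp = tn = fp = fn = 0
    for truth, raised in zip(is_true_alarm, alarm_raised):
        if truth and raised:
            tp += 1   # true alarm correctly kept
        elif truth:
            fn += 1   # true alarm wrongly suppressed (heavily penalized)
        elif raised:
            fp += 1   # false alarm wrongly kept
        else:
            tn += 1   # false alarm correctly suppressed
    return (tp + tn) / (tp + tn + fp + fn_weight * fn)
```

For example, with three true and two false alarms, an algorithm that suppresses one true alarm and keeps one false alarm scores 3/9, or about 0.33, even though its plain accuracy is 3/5. This weighting discourages aggressive suppression.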
Background: Atrial fibrillation (AFib) is the most common cardiac arrhythmia and is associated with stroke, blood clots, heart failure, coronary artery disease, and/or death. Multiple methods with varying performance have been proposed for AFib detection, but no single approach appears to be optimal. We hypothesized that each state-of-the-art algorithm suits a different subset of patients and provides some independent information. Therefore, a set of suitably chosen algorithms, combined in a weighted voting framework, will outperform any single algorithm. Methods: We investigate and modify 38 state-of-the-art AFib classification algorithms for a single-lead ambulatory electrocardiogram (ECG) monitoring device. All algorithms are ranked using a random forest classifier and an expert-labeled training dataset of 2,532 recordings. The seven top-ranked algorithms are combined using an optimized weighting approach. Results: When validated on a separate test dataset of 4,644 recordings, the proposed fusion algorithm achieved an area under the receiver operating characteristic (ROC) curve of 0.99. Its sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1-score were 0.93, 0.97, 0.87, 0.99, and 0.90, respectively, all superior to those of any single algorithm or any previously published algorithm. Conclusion: This study demonstrates that a set of well-chosen independent algorithms, combined through a voting mechanism that fuses their outputs, outperforms any single state-of-the-art algorithm for AFib detection. The proposed framework is a case study for the general notion of crowdsourcing between open-source algorithms in healthcare applications. Extending this framework to similar applications may save significant time, effort, and resources by combining existing algorithms.
It is also a step toward the democratization of artificial intelligence and its application in healthcare.
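A weighted voting framework and the reported metrics can be sketched as follows. This is a minimal weighted soft vote: each algorithm contributes a probability of AFib, and the weighted mean is thresholded. The study's "optimized weighting approach" is not reproduced here. The weights, the 0.5 threshold and the function names are hypothetical, and in practice the weights would be tuned on training data.

```python
import numpy as np


def fuse(probabilities, weights, threshold=0.5):
    """Weighted soft vote: rows = recordings, columns = individual AFib algorithms."""
    p = np.asarray(probabilities, dtype=float)
    w = np.asarray(weights, dtype=float)
    fused = p @ w / w.sum()          # weighted mean probability per recording
    return (fused >= threshold).astype(int)


def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV, NPV and F1 from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    tn = int(np.sum((y_true == 0) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    ppv, npv = tp / (tp + fp), tn / (tn + fn)
    return {"sensitivity": sens, "specificity": spec, "ppv": ppv, "npv": npv,
            "f1": 2 * ppv * sens / (ppv + sens)}
```

Giving a stronger algorithm a larger weight lets it outvote weaker ones when they disagree. The weak algorithms still shift borderline cases where they agree with one another.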
Background Heart rate variability (HRV) metrics hold promise as indicators of autonomic function, adverse cardiovascular outcomes, psychophysiological status, and general wellness. Although HRV has been studied for several decades, academic and clinical investigators lack consensus on methods for preprocessing, windowing, and choosing appropriate parameters. Methods We present a comprehensive, open-source, modular program for calculating HRV, implemented in MATLAB with evidence-based algorithms and output formats. We compare our software with another widely used HRV toolbox written in C and available through PhysioNet.org. Results Our findings show substantially similar results when using high-quality electrocardiograms (ECGs) free from arrhythmias. Conclusions Our software performs equivalently to an established predecessor and includes validated tools for preprocessing, signal quality assessment, and arrhythmia detection. These tools help provide standardization and repeatability in the field, leading to fewer errors in the presence of noise or arrhythmias.
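For readers unfamiliar with the outputs of such a toolbox, three standard time-domain HRV metrics can be computed from a window of normal-to-normal (NN) intervals as follows. This Python sketch is not the toolbox's MATLAB API, and the function name is hypothetical. It assumes that preprocessing (beat detection, removal of ectopic beats and noisy segments) and windowing have already been done, which are exactly the steps where the abstract notes a lack of consensus.

```python
import math
import statistics


def time_domain_hrv(rr_ms):
    """Mean NN, SDNN, RMSSD and pNN50 from a list of NN intervals in ms."""
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]   # successive differences
    return {
        "mean_nn": statistics.fmean(rr_ms),
        "sdnn": statistics.stdev(rr_ms),                 # sample SD (n - 1)
        "rmssd": math.sqrt(sum(d * d for d in diffs) / len(diffs)),
        # percentage of successive differences strictly greater than 50 ms
        "pnn50": 100.0 * sum(abs(d) > 50 for d in diffs) / len(diffs),
    }
```

For NN intervals of 800, 810, 790, 850 and 800 ms, the successive differences are 10, -20, 60 and -50 ms. Only the 60 ms difference exceeds 50 ms, so pNN50 is 25%, and RMSSD is sqrt(1650), about 40.6 ms.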
Sepsis remains a leading cause of morbidity and mortality among intensive care unit (ICU) patients. For each hour that treatment initiation is delayed after diagnosis, sepsis-related mortality increases by approximately 8%. Therefore, maximizing effective care requires early recognition and early initiation of treatment protocols. Antecedent signs and symptoms of sepsis can be subtle or unrecognizable (e.g., loss of autonomic regulation of vital signs), causing treatment delays and harm to the patient. In this work, we investigated the utility of high-resolution blood pressure (BP) and heart rate (HR) time series dynamics for the early prediction of sepsis in patients at an urban academic hospital who met the third international consensus definition of sepsis (sepsis-III) during their ICU admission. Using a multivariate modeling approach, we found that HR and BP dynamics at multiple time scales are independent predictors of sepsis, even after adjusting for commonly measured clinical values, patient demographics and comorbidities. Earlier recognition and diagnosis of sepsis has the potential to decrease sepsis-related morbidity and mortality through earlier initiation of treatment protocols.