by
Edilberto Amorim;
Wei-Long Zheng;
Mohammad M. Ghassemi;
Mahsa Aghaeeaval;
Pravinkumar Kandhare;
Vishnu Karukonda;
Jong Woo Lee;
Susan T. Herman;
Adithya Sivaraju;
Nicolas Gaspard;
Jeannette Hofmeijer;
Michel J. A. M. van Putten;
Reza Sameni;
Matthew A. Reyna;
Gari D. Clifford;
M. Brandon Westover
Objective:
To develop a harmonized multicenter clinical and electroencephalography (EEG) database for acute hypoxic-ischemic brain injury research involving patients with cardiac arrest.
Design:
Multicenter cohort, partly prospective and partly retrospective.
Setting:
Seven academic or teaching hospitals from the U.S. and Europe.
Patients:
Individuals aged 16 or older who were comatose after return of spontaneous circulation following a cardiac arrest and who had continuous EEG monitoring were included.
Interventions:
Not applicable.
Measurements and Main Results:
Clinical and EEG data were harmonized and stored in a common Waveform Database (WFDB)-compatible format. Automated spike frequency, background continuity, and artifact detection on EEG were calculated with 10 second resolution and summarized hourly. Neurological outcome was determined at 3–6 months using the best Cerebral Performance Category (CPC) scale. This database includes clinical and 56,676 hours (3.9 TB) of continuous EEG data for 1,020 patients. Most patients died (N=603, 59%), 48 (5%) had severe neurological disability (CPC 3 or 4), and 369 (36%) had good functional recovery (CPC 1–2). There is significant variability in mean EEG recording duration depending on the neurological outcome (range 53–102h for CPC 1 and CPC 4, respectively). Epileptiform activity averaging 1 Hz or more in frequency for at least one hour was seen in 258 (25%) patients (19% for CPC 1–2 and 29% for CPC 3–5). Burst suppression was observed for at least one hour in 207 (56%) and 635 (97%) patients with CPC 1–2 and CPC 3–5, respectively.
Conclusions:
The International Cardiac Arrest Research (I-CARE) consortium database provides a comprehensive real-world clinical and EEG dataset for neurophysiology research of comatose patients after cardiac arrest. This dataset covers the spectrum of abnormal EEG patterns after cardiac arrest, including epileptiform patterns and those in the ictal-interictal continuum.
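The hourly summarization of 10-second features described under Measurements and Main Results can be illustrated with a minimal sketch. The abstract does not state how the hourly summaries were computed, so the averaging rule, the exclusion of artifact epochs, and the function name `summarize_hourly` are assumptions made for illustration only.

```python
from statistics import mean

EPOCH_SECONDS = 10                       # resolution stated in the abstract
EPOCHS_PER_HOUR = 3600 // EPOCH_SECONDS  # 360 epochs per hour


def summarize_hourly(epoch_values, is_artifact):
    """Average a per-epoch EEG feature over each hour (hypothetical rule).

    Epochs flagged as artifact are skipped; an hour with no clean
    epochs yields None. The last hour may be partial.
    """
    if len(epoch_values) != len(is_artifact):
        raise ValueError("feature and artifact sequences must align")
    hourly = []
    for start in range(0, len(epoch_values), EPOCHS_PER_HOUR):
        stop = start + EPOCHS_PER_HOUR
        clean = [
            value
            for value, bad in zip(epoch_values[start:stop], is_artifact[start:stop])
            if not bad
        ]
        hourly.append(mean(clean) if clean else None)
    return hourly
```

For example, passing two hours of per-epoch spike frequencies with no artifact flags returns a list of two hourly means.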
Cardiac auscultation is an accessible diagnostic screening tool that can help to identify patients with heart murmurs, who may need follow-up diagnostic screening and treatment for abnormal cardiac function. However, experts are needed to interpret the heart sounds, limiting the accessibility of cardiac auscultation in resource-constrained environments. Therefore, the George B. Moody PhysioNet Challenge 2022 invited teams to develop algorithmic approaches for detecting heart murmurs and abnormal cardiac function from phonocardiogram (PCG) recordings of heart sounds. For the Challenge, we sourced 5272 PCG recordings from 1452 primarily pediatric patients in rural Brazil, and we invited teams to implement diagnostic screening algorithms for detecting heart murmurs and abnormal cardiac function from the recordings. We required the participants to submit the complete training and inference code for their algorithms, improving the transparency, reproducibility, and utility of their work. We also devised an evaluation metric that considered the costs of screening, diagnosis, misdiagnosis, and treatment, allowing us to investigate the benefits of algorithmic diagnostic screening and facilitate the development of more clinically relevant algorithms. We received 779 algorithms from 87 teams during the Challenge, resulting in 53 working codebases for detecting heart murmurs and abnormal cardiac function from PCG recordings. These algorithms represent a diversity of approaches from both academia and industry, including methods that use more traditional machine learning techniques with engineered clinical and statistical features as well as methods that rely primarily on deep learning models to discover informative features. The use of heart sound recordings for identifying heart murmurs and abnormal cardiac function allowed us to explore the potential of algorithmic approaches for providing more accessible diagnostic screening in resource-constrained environments. 
The submission of working, open-source algorithms and the use of novel evaluation metrics supported the reproducibility, generalizability, and clinical relevance of the research from the Challenge.
Objective:
The standard twelve-lead electrocardiogram (ECG) is a widely used tool for monitoring cardiac function and diagnosing cardiac disorders. The development of smaller, lower-cost, and easier-to-use ECG devices may improve access to cardiac care in lower-resource environments, but the diagnostic potential of these devices is unclear. This work explores these issues through a public competition: the 2021 PhysioNet Challenge. In addition, we explore the potential for performance boosting through a meta-learning approach.
Approach:
We sourced 131,149 twelve-lead ECG recordings from ten international sources. We posted 88,253 annotated recordings as public training data and withheld the remaining recordings as hidden validation and test data. We challenged teams to submit containerized, open-source algorithms for diagnosing cardiac abnormalities using various ECG lead combinations, including the code for training their algorithms. We designed and scored algorithms using an evaluation metric that captures the risks of different misdiagnoses for 30 conditions. After the Challenge, we implemented a semi-consensus voting model on all working algorithms.
Main results:
A total of 68 teams submitted 1,056 algorithms during the Challenge, providing a variety of automated approaches from both academia and industry. The performance differences across the different lead combinations were smaller than the performance differences across the different test databases, showing that generalizability posed a larger challenge to the algorithms than the choice of ECG leads. A voting model improved performance by 3.5%.
Significance:
The use of different ECG lead combinations allowed us to assess the diagnostic potential of reduced-lead ECG recordings, and the use of different data sources allowed us to assess the generalizability of algorithms to diverse institutions and populations. The submission of working, open-source code for both training and testing and the use of a novel evaluation metric improved the reproducibility, generalizability, and applicability of the research conducted during the Challenge.
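The idea of combining the outputs of many algorithms by voting can be sketched as follows. The abstract does not define the semi-consensus voting model, so this is a plain threshold vote under assumed names (`vote`, `min_fraction`) and an assumed threshold, not the model used after the Challenge.

```python
from collections import Counter


def vote(predictions, min_fraction=0.5):
    """Combine multi-label predictions from several algorithms.

    predictions: one collection of predicted diagnosis labels per algorithm.
    A label is kept when at least min_fraction of the algorithms
    predicted it (hypothetical rule and threshold).
    """
    if not predictions:
        raise ValueError("at least one algorithm output is required")
    counts = Counter(label for labels in predictions for label in set(labels))
    needed = min_fraction * len(predictions)
    return {label for label, n in counts.items() if n >= needed}
```

Raising `min_fraction` trades sensitivity for specificity: fewer labels survive, but each one has broader agreement among the algorithms.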
The immune composition of the tumor microenvironment influences response and resistance to immunotherapies. While numerous studies have identified somatic correlates of immune infiltration, germline features that associate with immune infiltrates in cancers remain incompletely characterized. We analyze seven million autosomal germline variants in the TCGA cohort and test for association with established immune-related phenotypes that describe the tumor immune microenvironment. We identify one SNP associated with the amount of infiltrating follicular helper T cells; 23 candidate genes, some of which are involved in cytokine-mediated signaling and others containing cancer-risk SNPs; and networks with genes that are part of the DNA repair and transcription elongation pathways. In addition, we find a positive association between polygenic risk for rheumatoid arthritis and amount of infiltrating CD8+ T cells. Overall, we identify multiple germline genetic features associated with tumor-immune phenotypes and develop a framework for probing inherited features that contribute to differences in immune infiltration.
Background: It has been hypothesized that low access to healthy and nutritious food increases health disparities. Low-accessibility areas, called food deserts, are particularly commonplace in lower-income neighborhoods. The metrics for measuring the food environment’s health, called food desert indices, are primarily based on decadal census data, limiting their frequency and geographical resolution to that of the census. We aimed to create a food desert index with finer geographic resolution than census data and better responsiveness to environmental changes. Materials and methods: We augmented decadal census data with real-time data from platforms such as Yelp and Google Maps and with crowd-sourced questionnaire answers from Amazon Mechanical Turk workers to create a real-time, context-aware, and geographically refined food desert index. Finally, we used this refined index in a concept application that suggests alternative routes with similar ETAs between a source and destination in the Atlanta metropolitan area as an intervention to expose a traveler to better food environments. Results: We made 139,000 pull requests to Yelp, analyzing 15,000 unique food retailers in the metro Atlanta area. In addition, we performed 248,000 walking and driving route analyses on these retailers using Google Maps’ API. As a result, we discovered that the metro Atlanta food environment creates a strong bias towards eating out rather than preparing a meal at home when access to vehicles is limited. In contrast to the food desert index that we started with, which changed values only at neighborhood boundaries, the food desert index that we built on top of it captured the changing exposure of a subject as they walked or drove through the city. This model was also sensitive to the changes in the environment that occurred after the census data was collected. Conclusions: Research on the environmental components of health disparities is flourishing.
New machine learning models have the potential to augment various information sources and create fine-tuned models of the environment. This opens the way to better understanding the environment and its effects on health and suggesting better interventions.
Comprehensive sequencing of patient tumors reveals genomic mutations across tumor types that enable tumorigenesis and progression. A subset of oncogenic driver mutations results in neomorphic activity where the mutant protein mediates functions not engaged by the parental molecule. Here, we identify prevalent variant-enabled neomorph-protein-protein interactions (neoPPI) with a quantitative high-throughput differential screening (qHT-dS) platform. The coupling of highly sensitive BRET biosensors with miniaturized coexpression in an ultra-HTS format allows large-scale monitoring of the interactions of wild-type and mutant variant counterparts with a library of cancer-associated proteins in live cells. The screening of 17,792 interactions with 2,172,864 data points revealed a landscape of gain of interactions encompassing both oncogenic and tumor suppressor mutations. For example, the recurrent BRAF V600E lesion mediates KEAP1 neoPPI, rewiring a BRAFV600E/KEAP1 signaling axis and creating collateral vulnerability to NQO1 substrates, offering a combination therapeutic strategy. Thus, cancer genomic alterations can create neo-interactions, informing variant-directed therapeutic approaches for precision medicine.
Correction to: Nature Communications 10.1038/s41467-020-14367-0, published online 05 February 2020
In the published version of this paper, the members of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium were listed in the Supplementary Information; however, these members should have been included in the main paper. The original Article has been corrected to include the members and affiliations of the PCAWG Consortium in the main paper; the corrections have been made to the HTML version of the Article but not the PDF version. In the PCAWG Drivers and Functional Interpretation Group, the affiliation for Erik Larsson has also been changed from ‘Computational Biology Center, Memorial Sloan Kettering Cancer Center, New York, NY 10065, USA’ to ‘Institute of Biomedicine, Sahlgrenska Academy at University of Gothenburg, Gothenburg, Sweden’. Additional corrections to affiliations have been made to the PDF and HTML versions of the original Article for consistency of information between the PCAWG list and the main paper.
OBJECTIVES:
Sepsis is a major public health concern with significant morbidity, mortality, and healthcare expenses. Early detection and antibiotic treatment of sepsis improve outcomes. However, although professional critical care societies have proposed new clinical criteria that aid sepsis recognition, the fundamental need for early detection and treatment remains unmet. In response, researchers have proposed algorithms for early sepsis detection, but directly comparing such methods has not been possible because of different patient cohorts, clinical variables and sepsis criteria, prediction tasks, evaluation metrics, and other differences. To address these issues, the PhysioNet/Computing in Cardiology Challenge 2019 facilitated the development of automated, open-source algorithms for the early detection of sepsis from clinical data.
DESIGN:
Participants submitted containerized algorithms to a cloud-based testing environment, where we graded entries for their binary classification performance using a novel clinical utility-based evaluation metric. We designed this scoring function specifically for the Challenge to reward algorithms for early predictions and penalize them for late or missed predictions and for false alarms.
SETTING:
ICUs in three separate hospital systems. We shared data from two systems publicly and sequestered data from all three systems for scoring.
PATIENTS:
We sourced over 60,000 ICU patients with up to 40 clinical variables for each hour of a patient's ICU stay. We applied Sepsis-3 clinical criteria for sepsis onset.
INTERVENTIONS:
None.
MEASUREMENTS AND MAIN RESULTS:
A total of 104 groups from academia and industry participated, contributing 853 submissions. Furthermore, 90 abstracts based on Challenge entries were accepted for presentation at Computing in Cardiology.
CONCLUSIONS:
Diverse computational approaches predict the onset of sepsis several hours before clinical recognition, but generalizability to different hospital systems remains a challenge.
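The scoring idea described under DESIGN, which rewards early predictions and penalizes late or missed predictions and false alarms, can be illustrated with a simplified per-patient sketch. This is not the Challenge's scoring function: the window length, the reward and penalty values, and the name `patient_utility` are all hypothetical.

```python
def patient_utility(predictions, onset_hour, early_window=6,
                    reward=1.0, miss_penalty=-2.0, false_alarm_penalty=-0.05):
    """Score hourly 0/1 sepsis predictions for one patient (illustrative values).

    onset_hour is the hour index of sepsis onset, or None for a patient
    who never develops sepsis.
    """
    if onset_hour is None:
        # Every positive prediction for a non-septic patient is a false alarm.
        return false_alarm_penalty * sum(predictions)
    window_start = max(0, onset_hour - early_window)
    # Alarms raised before the early window are treated as false alarms.
    score = false_alarm_penalty * sum(predictions[:window_start])
    # An alarm inside the window (up to onset) earns the reward; a late or
    # missing alarm incurs the miss penalty.
    timely = any(predictions[window_start:onset_hour + 1])
    return score + (reward if timely else miss_penalty)
```

Making the miss penalty larger in magnitude than the false-alarm penalty encodes the assumption that a missed case of sepsis is more costly than an unnecessary alert.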
Objective: Vast 12-lead ECG repositories provide opportunities to develop new machine learning approaches for creating accurate and automatic diagnostic systems for cardiac abnormalities. However, most 12-lead ECG classification studies are trained, tested, or developed in single, small, or relatively homogeneous datasets. In addition, most algorithms focus on identifying small numbers of cardiac arrhythmias that do not represent the complexity and difficulty of ECG interpretation. This work addresses these issues by providing a standard, multi-institutional database and a novel scoring metric through a public competition: the PhysioNet/Computing in Cardiology Challenge 2020. Approach: A total of 66 361 12-lead ECG recordings were sourced from six hospital systems from four countries across three continents; 43 101 recordings were posted publicly with a focus on 27 diagnoses. For the first time in a public competition, we required teams to publish open-source code for both training and testing their algorithms, ensuring full scientific reproducibility. Main results: A total of 217 teams submitted 1395 algorithms during the Challenge, representing a diversity of approaches for identifying cardiac abnormalities from both academia and industry. As with previous Challenges, high-performing algorithms exhibited significant drops (10%) in performance on the hidden test data. Significance: Data from diverse institutions allowed us to assess algorithmic generalizability. A novel evaluation metric considered different misclassification errors for different cardiac abnormalities, capturing the outcomes and risks of different diagnoses. Requiring both trained models and code for training models improved the generalizability of submissions, setting a new bar in reproducibility for public data science competitions.
Background:
Acute respiratory failure occurs frequently in hospitalized patients, often begins outside the ICU, and is associated with increased length of stay, cost, and mortality. Delays in decompensation recognition are associated with worse outcomes.
Objectives:
The objective of this study is to predict acute respiratory failure requiring any advanced respiratory support (including noninvasive ventilation). With the advent of the coronavirus disease pandemic, concern regarding acute respiratory failure has increased.
Derivation Cohort:
All admission encounters from January 2014 to June 2017 from three hospitals in the Emory Healthcare network (82,699).
Validation Cohort:
External validation cohort: all admission encounters from January 2014 to June 2017 from a fourth hospital in the Emory Healthcare network (40,143). Temporal validation cohort: all admission encounters from February to April 2020 from four hospitals in the Emory Healthcare network coronavirus disease tested (2,564) and coronavirus disease positive (389).
Prediction Model:
All admission encounters had vital signs, laboratory, and demographic data extracted. Exclusion criteria included invasive mechanical ventilation started within the operating room or advanced respiratory support within the first 8 hours of admission. Encounters were discretized into hour intervals from 8 hours after admission to discharge or advanced respiratory support initiation and binary labeled for advanced respiratory support. Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment, our eXtreme Gradient Boosting-based algorithm, was compared against Modified Early Warning Score.
Results:
Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment had significantly better discrimination than Modified Early Warning Score (area under the receiver operating characteristic curve 0.85 vs 0.57 [test], 0.84 vs 0.61 [external validation]). Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment maintained a positive predictive value (0.31–0.21) similar to that of Modified Early Warning Score greater than 4 (0.29–0.25) while identifying 6.62 (validation) to 9.58 (test) times more true positives. Furthermore, Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment performed more effectively in temporal validation (area under the receiver operating characteristic curve 0.86 [coronavirus disease tested], 0.93 [coronavirus disease positive]), while identifying 4.25–4.51× more true positives.
Conclusions:
Prediction of Acute Respiratory Failure requiring advanced respiratory support in Advance of Interventions and Treatment is more effective than Modified Early Warning Score in predicting respiratory failure requiring advanced respiratory support at external validation and in coronavirus disease 2019 patients. Silent prospective validation is necessary before local deployment.
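The discretization described under Prediction Model, with hourly intervals from 8 hours after admission until discharge or initiation of advanced respiratory support, can be sketched as below. The abstract does not specify how each interval was labeled, so the prediction horizon `horizon_hours`, the use of whole-hour timestamps, and the function name `hourly_labels` are assumptions for illustration.

```python
START_HOUR = 8  # from the abstract: intervals begin 8 hours after admission


def hourly_labels(discharge_hour, support_hour=None, horizon_hours=24):
    """Build (hour, label) rows for one admission encounter.

    discharge_hour and support_hour are whole hours since admission.
    label is 1 when advanced respiratory support starts within
    horizon_hours of the interval (hypothetical labeling rule).
    """
    if support_hour is not None and support_hour < START_HOUR:
        return []  # excluded: support began within the first 8 hours
    end_hour = support_hour if support_hour is not None else discharge_hour
    rows = []
    for hour in range(START_HOUR, end_hour):
        imminent = support_hour is not None and support_hour - hour <= horizon_hours
        rows.append((hour, int(imminent)))
    return rows
```

Rows of this shape, joined with the vital sign, laboratory, and demographic features available at each hour, are the kind of input a gradient-boosted classifier could be trained on.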