

Author Notes:

Imon Banerjee, Email: imon.banerjee@emory.edu

J.S., I.B., A.K. and D.R. conceived the project and led the studies. A.K. provided the study data. J.S. and I.B. carried out the design of the machine learning models and performed the experiments. J.S., A.T. and I.B. wrote the manuscript, and J.S., A.T., A.K., D.R. and I.B. edited the manuscript.

This project was supported by a grant from GE BlueSky (DR, IB).

The authors declare no competing interests.

Subjects:

Keywords:

  • Science & Technology
  • Multidisciplinary Sciences
  • Science & Technology - Other Topics
  • MEDICARE CLAIMS
  • DIAGNOSIS

Weakly supervised temporal model for prediction of breast cancer distant recurrence


Journal Title:

SCIENTIFIC REPORTS

Volume:

Volume 11, Number 1

Pages:

9461

Type of Work:

Article | Final Publisher PDF

Abstract:

Early prediction of cancer recurrence may help recruit high-risk breast cancer patients into clinical trials in time and can guide appropriate treatment planning. Several machine learning approaches to recurrence prediction have been developed in previous studies, but most used only structured electronic health records and small training datasets, and had limited success in clinical application. Although free-text clinic notes may offer the greatest nuance and detail about a patient's clinical status, they have largely been excluded from previous predictive models because of the added processing complexity and the need for a complex modeling framework. In this study, we developed a weak-supervision framework for breast cancer recurrence prediction, training a deep learning model on a large sample of free-text clinic notes using a combination of manually curated labels and imperfect recurrence labels generated by natural language processing (NLP). The model was trained jointly on manually curated data from 670 patients and NLP-curated data from 8062 patients. It was validated on manually annotated data from 224 patients with recurrence and achieved an area under the receiver operating characteristic curve (AUROC) of 0.94. This weak-supervision approach allowed us to learn from a larger dataset with imperfect labels and ultimately yielded greater accuracy than training on a smaller hand-curated dataset, with less manual effort invested in curation.
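The abstract does not say how the manually curated and NLP-generated labels are combined during joint training. One common way to express the idea, sketched below purely as an assumption, is a single binary cross-entropy objective in which weakly labeled (NLP-curated) examples are down-weighted relative to gold (manually curated) ones. The function names `weighted_bce` and `auroc` and the `weak_weight` parameter are hypothetical illustrations, not the authors' method; `auroc` computes the reported metric as the probability that a randomly chosen positive case scores higher than a randomly chosen negative one.

```python
import math
from typing import Sequence


def weighted_bce(probs: Sequence[float], labels: Sequence[int],
                 is_gold: Sequence[bool], weak_weight: float = 0.5,
                 eps: float = 1e-7) -> float:
    """Weighted mean binary cross-entropy over a mixed batch.

    Hypothetical scheme: gold (manually curated) examples get weight 1.0,
    weak (NLP-labeled) examples get weight `weak_weight`.
    """
    total, weight_sum = 0.0, 0.0
    for p, y, gold in zip(probs, labels, is_gold):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() never sees 0
        loss = -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
        w = 1.0 if gold else weak_weight
        total += w * loss
        weight_sum += w
    if weight_sum == 0.0:
        raise ValueError("all example weights are zero")
    return total / weight_sum


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score of random positive > score of random negative).

    Ties between a positive and a negative count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy batch: two gold-labeled patients, two weak (NLP-labeled) patients.
    probs = [0.9, 0.2, 0.7, 0.4]
    labels = [1, 0, 1, 0]
    is_gold = [True, True, False, False]
    print(weighted_bce(probs, labels, is_gold, weak_weight=0.5))
    print(auroc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))
```

Setting `weak_weight` to 1.0 treats NLP labels as if they were gold, while 0.0 ignores them entirely; values in between encode how much one trusts the imperfect labels. The pairwise `auroc` loop is quadratic in the number of cases and is meant only to make the definition concrete.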

Copyright information:

© The Author(s) 2021

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/rdf).