About this item:

Author Notes:

Evan A. Clayton, Email: eclayton3@gatech.edu

EAC and TAP performed the analysis and prepared the manuscript. JFM designed the project and reviewed the manuscript. PQ designed and supervised the project and reviewed the manuscript. The authors read and approved the final version of the manuscript.

Evan A. Clayton and Toyya A. Pujol are Co-first authors.

The authors declare that they have no competing interests.

Research Funding:

This work was supported by the National Institutes of Health (T32, GM105490, CRP:10–2012-03), the National Science Foundation (CCF1552784), and the Giglio Family Breast Cancer Fund. PQ is an ISAC Marylou Ingram Scholar and a Carol Ann and David D. Flanagan Faculty Fellow. Publication costs are funded by PQ’s Faculty Fellowship. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Keywords:

  • Drug response
  • Machine learning
  • Personalized oncology
  • Predictive models
  • Antineoplastic Agents
  • Area Under Curve
  • Cluster Analysis
  • Databases, Genetic
  • Deoxycytidine
  • Fluorouracil
  • Gene Expression Regulation, Neoplastic
  • Humans
  • Machine Learning
  • Neoplasms
  • ROC Curve

Leveraging TCGA gene expression data to build predictive models for cancer drug response

Journal Title:

BMC Bioinformatics

Volume:

Volume 21, Number Suppl 14, Pages 364-364

Type of Work:

Article | Final Publisher PDF

Abstract:

Background: Machine learning has been used to predict cancer drug response from multi-omics data generated from the sensitivities of cancer cell lines to different therapeutic compounds. Here, we build machine learning models using gene expression data from patients' primary tumor tissues to predict whether a patient will respond positively or negatively to two chemotherapeutics: 5-Fluorouracil and Gemcitabine.

Results: We focused on 5-Fluorouracil and Gemcitabine because, under our exclusion criteria, they provide the largest numbers of patients within TCGA. Normalized gene expression data were clustered and used as the input features for the study. We used matching clinical trial data to ascertain the response of these patients. Multiple clustering and classification methods were compared for prediction accuracy of drug response. Clara and random forest were found to be the best clustering and classification methods, respectively. The results show that our models predict with up to 86% accuracy, despite the study's limited sample size. We also found that the genes most informative for predicting drug response were enriched in well-known cancer signaling pathways, highlighting their potential significance in chemotherapy prognosis.

Conclusions: Primary tumor gene expression is a good predictor of cancer drug response. Investment in larger datasets containing both patient gene expression and drug response is needed to support future work on machine learning models. Ultimately, such predictive models may aid oncologists in making critical treatment decisions.
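The clustering step named in the abstract can be illustrated with a small sketch. This is not the authors' code. It assumes that genes (rather than patients) are clustered and that each cluster is summarized per patient by its mean expression, neither of which the abstract specifies. The toy data and all function names (`clara`, `k_medoids`, `cluster_features`) are hypothetical. The inner k-medoids uses a simple alternating update rather than the PAM swap search, and the random forest step is omitted.

```python
import math
import random


def assign(points, medoids):
    """Assign each point to its nearest medoid; return (labels, total cost)."""
    labels, cost = [], 0.0
    for p in points:
        dists = [math.dist(p, m) for m in medoids]
        j = dists.index(min(dists))
        labels.append(j)
        cost += dists[j]
    return labels, cost


def k_medoids(sample, k, rng, max_iter=20):
    """Simplified k-medoids (alternating update, not the full PAM swap)."""
    medoids = rng.sample(sample, k)
    for _ in range(max_iter):
        labels, _ = assign(sample, medoids)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(sample, labels) if lab == j]
            if not members:
                new.append(medoids[j])  # keep the old medoid for an empty cluster
                continue
            # The new medoid is the member with the smallest total distance
            # to the other members of its cluster.
            new.append(min(members,
                           key=lambda c: sum(math.dist(c, p) for p in members)))
        if new == medoids:
            break
        medoids = new
    return medoids


def clara(points, k, n_draws=5, sample_size=None, seed=0):
    """CLARA idea: cluster several random subsamples, then keep the medoids
    that give the lowest cost on the full data set."""
    rng = random.Random(seed)
    size = min(len(points), sample_size or 40 + 2 * k)  # illustrative default
    best = None
    for _ in range(n_draws):
        sample = rng.sample(points, size)
        medoids = k_medoids(sample, k, rng)
        labels, cost = assign(points, medoids)
        if best is None or cost < best[2]:
            best = (medoids, labels, cost)
    return best


def cluster_features(genes, labels, k, n_patients):
    """Per patient, average the expression of the genes in each cluster."""
    feats = []
    for i in range(n_patients):
        row = []
        for j in range(k):
            vals = [g[i] for g, lab in zip(genes, labels) if lab == j]
            row.append(sum(vals) / len(vals) if vals else 0.0)
        feats.append(row)
    return feats


# Synthetic data (assumption): 60 genes x 12 patients built from 3 noisy profiles.
data_rng = random.Random(1)
n_patients, k = 12, 3
profiles = [[data_rng.gauss(0, 1) for _ in range(n_patients)] for _ in range(k)]
genes = [tuple(v + data_rng.gauss(0, 0.1) for v in profiles[g % k])
         for g in range(60)]

medoids, labels, cost = clara(genes, k)
features = cluster_features(genes, labels, k, n_patients)
print(len(features), "patients x", len(features[0]), "cluster features")
```

In a full pipeline, the rows of `features` would be paired with each patient's recorded drug response and passed to a classifier such as a random forest.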

Copyright information:

© The Author(s) 2020

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).