About this item:

Author Notes:

Correspondence and requests for materials should be addressed to T.Y. (email: tianwei.yu@emory.edu)

Author Contributions: T.Y. conceived the study. Y.K. programmed the algorithm and conducted the simulation and real-data experiments. T.Y. and Y.K. analyzed the results. Y.K. and T.Y. drafted the manuscript. Both authors reviewed the manuscript.

The authors thank Dr. Hao Wu for helpful discussions.

The authors declare no competing interests.

Research Funding:

This study was partially funded by NIH grants R01GM124061 and R37AI051231.

Keywords:

  • Science & Technology
  • Multidisciplinary Sciences
  • Science & Technology - Other Topics
  • SELECTION
  • MACHINE
  • REGRESSION

A Deep Neural Network Model using Random Forest to Extract Feature Representation for Gene Expression Data Classification

Journal Title:

Scientific Reports

Volume:

Volume 8, Number 1, Pages 16477-16477

Type of Work:

Article | Final Publisher PDF

Abstract:

In predictive model development, gene expression data pose the unique challenge that the number of samples (n) is much smaller than the number of features (p). This “n ≪ p” property has kept the classification of gene expression data from benefiting from deep learning techniques, which have proved powerful under “n > p” scenarios in other application fields, such as image classification. Further, the sparsity of effective features with unknown correlation structures in gene expression profiles brings more challenges for classification tasks. To tackle these problems, we propose a new classifier, the Forest Deep Neural Network (fDNN), which integrates the deep neural network architecture with a supervised forest feature detector. Using this built-in feature detector, the method learns sparse feature representations and feeds them into a neural network to mitigate the overfitting problem. Simulation experiments and real data analyses using two RNA-seq expression datasets are conducted to evaluate fDNN’s capability. The method is shown to be a useful addition to current predictive models, with better classification performance and more meaningful selected features than ordinary random forests and deep neural networks.
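
The two-stage idea in the abstract (a supervised forest that turns many raw features into a compact representation, followed by a network trained on that representation) can be illustrated with a small toy sketch. This is not the authors' implementation. It assumes depth-1 trees (decision stumps) in place of a full random forest, a single logistic layer in place of a deep neural network, and synthetic data in place of RNA-seq profiles. All function names and parameter values are hypothetical.

```python
import math
import random


def fit_stump(X, y, features):
    """Pick the (feature, threshold, polarity) with the best training accuracy."""
    best = None
    for j in features:
        for t in sorted(set(row[j] for row in X)):
            # hits = correct predictions when the stump predicts 1 for row[j] > t
            hits = sum((row[j] > t) == bool(label) for row, label in zip(X, y))
            for polarity, score in ((1, hits), (0, len(y) - hits)):
                if best is None or score > best[0]:
                    best = (score, j, t, polarity)
    return best[1:]


def stump_predict(stump, row):
    j, t, polarity = stump
    return int((row[j] > t) == bool(polarity))


def fit_forest(X, y, n_trees, n_sub, rng):
    """Each stump sees a bootstrap sample and a random feature subset."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        feats = rng.sample(range(len(X[0])), n_sub)
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feats))
    return forest


def forest_features(forest, X):
    """New representation: one binary prediction per tree for every sample."""
    return [[stump_predict(s, row) for s in forest] for row in X]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(F, y, epochs=200, lr=0.1):
    """Stand-in for the network: one logistic unit trained by SGD."""
    w, b = [0.0] * len(F[0]), 0.0
    for _ in range(epochs):
        for f, label in zip(F, y):
            err = sigmoid(b + sum(wi * fi for wi, fi in zip(w, f))) - label
            w = [wi - lr * err * fi for wi, fi in zip(w, f)]
            b -= lr * err
    return w, b


def predict(forest, w, b, X):
    F = forest_features(forest, X)
    return [int(b + sum(wi * fi for wi, fi in zip(w, f)) > 0) for f in F]


def make_data(n, p, rng):
    """Synthetic n << p data: only the first 5 of p features carry signal."""
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        row = [rng.gauss(0, 1) for _ in range(p)]
        for j in range(5):
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(60, 200, rng)
X_test, y_test = make_data(40, 200, rng)
forest = fit_forest(X_train, y_train, n_trees=30, n_sub=20, rng=rng)
F_train = forest_features(forest, X_train)
w, b = fit_logistic(F_train, y_train)
pred = predict(forest, w, b, X_test)
accuracy = sum(a == t for a, t in zip(pred, y_test)) / len(y_test)
print(f"trees: {len(forest)}, test accuracy: {accuracy:.2f}")
```

In this sketch the second stage sees 30 binary inputs instead of 200 raw features, which is the sense in which the forest acts as a feature detector. The accuracy printed on this toy data says nothing about the results reported in the article.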

Copyright information:

© 2018, The Author(s).

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).