About this item:

Author Notes:

R Mitchell Parry: parry@bme.gatech.edu; John H Phan: jhpahn@gatech.edu; May D Wang: may.wang@emory.edu

RMP conceived of win percentage as a way to compare classifiers, designed the study, and drafted the document. JHP helped implement the classifiers, revise the document, and test significance.

MDW initiated the microarray quality control and high-throughput bio-molecular data mining investigation from which the idea for win percentage arose, acquired funding to sponsor this effort, and directed the win percentage project and publication.

All authors read and approved the final manuscript.

The authors would like to thank Dr. Richard Moffitt and Dr. Todd Stokes for their insightful feedback and discussion about this work.


Research Funding:

This work was supported in part by grants from the National Institutes of Health (Bioengineering Research Partnership R01CA108468, Center for Cancer Nanotechnology Excellence U54CA119338, 1RC2CA148265), the Georgia Cancer Coalition (Distinguished Cancer Scholar Award to Professor MD Wang), Microsoft Research, and Hewlett Packard.


  • Science & Technology
  • Life Sciences & Biomedicine
  • Biochemical Research Methods
  • Biotechnology & Applied Microbiology
  • Mathematical & Computational Biology
  • Biochemistry & Molecular Biology

Win percentage: a novel measure for assessing the suitability of machine classifiers for biological problems


Proceedings Title:

BMC Bioinformatics

Conference Name:

ACM Conference on Bioinformatics, Computational Biology and Biomedicine (ACM-BCB)


Conference Place:

Chicago, IL


Volume 13 | Issue Suppl. 3

Publication Date:

Type of Work:

Conference | Final Publisher PDF


Background: Selecting an appropriate classifier for a particular biological application poses a difficult problem for researchers and practitioners alike. In particular, choosing a classifier depends heavily on the features selected. For high-throughput biomedical datasets, feature selection is often a preprocessing step that gives an unfair advantage to the classifiers built with the same modeling assumptions. In this paper, we seek classifiers that are suitable to a particular problem independent of feature selection. We propose a novel measure, called "win percentage", for assessing the suitability of machine classifiers to a particular problem. We define win percentage as the probability a classifier will perform better than its peers on a finite random sample of feature sets, giving each classifier equal opportunity to find suitable features.

Results: First, we illustrate the difficulty in evaluating classifiers after feature selection. We show that several classifiers can each perform statistically significantly better than their peers given the right feature set among the top 0.001% of all feature sets. We illustrate the utility of win percentage using synthetic data, and evaluate six classifiers in analyzing eight microarray datasets representing three diseases: breast cancer, multiple myeloma, and neuroblastoma. After initially using all Gaussian gene-pairs, we show that precise estimates of win percentage (within 1%) can be achieved using a smaller random sample of all feature pairs. We show that for these data no single classifier can be considered the best without knowing the feature set. Instead, win percentage captures the non-zero probability that each classifier will outperform its peers based on an empirical estimate of performance.

Conclusions: Fundamentally, we illustrate that the selection of the most suitable classifier (i.e., one that is more likely to perform better than its peers) not only depends on the dataset and application but also on the thoroughness of feature selection. In particular, win percentage provides a single measurement that could assist users in eliminating or selecting classifiers for their particular application.
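
The definition of win percentage given in the Background can be illustrated with a small Monte Carlo sketch. This is not the authors' implementation: the function name `win_percentage`, the toy `perf` table, the choice to score each classifier by its best result within the sampled feature sets, and the equal splitting of ties are all assumptions made for illustration. Each trial draws a random sample of feature sets, lets every classifier pick its best feature set from that sample, and credits the classifier with the highest score.

```python
import random


def win_percentage(perf, sample_size, trials=10000, seed=0):
    """Estimate, per classifier, the probability of beating its peers.

    perf[f][c] is a (hypothetical) performance score, e.g. accuracy,
    of classifier c on feature set f. Ties are split equally, which
    is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n_sets = len(perf)
    n_clf = len(perf[0])
    wins = [0.0] * n_clf
    for _ in range(trials):
        # Draw a finite random sample of feature sets without replacement.
        sample = rng.sample(range(n_sets), sample_size)
        # Each classifier gets equal opportunity: its best score in the sample.
        best = [max(perf[f][c] for f in sample) for c in range(n_clf)]
        top = max(best)
        winners = [c for c in range(n_clf) if best[c] == top]
        for c in winners:
            wins[c] += 1.0 / len(winners)
    return [w / trials for w in wins]


# Toy data (made up): 3 feature sets (rows) x 2 classifiers (columns).
perf = [
    [0.9, 0.6],
    [0.5, 0.8],
    [0.5, 0.7],
]

one_set = win_percentage(perf, sample_size=1)
all_sets = win_percentage(perf, sample_size=3, trials=100)
print("sample of 1 feature set: ", one_set)
print("sample of 3 feature sets:", all_sets)
```

In this toy table, classifier 1 wins on two of the three individual feature sets, so it is favored when only one feature set is sampled, while classifier 0 always wins once all three are examined. That mirrors the point made in the Conclusions: which classifier is most suitable depends on the thoroughness of feature selection.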

Copyright information:

© Parry et al.; licensee BioMed Central Ltd. 2012

This is an Open Access work distributed under the terms of the Creative Commons Attribution 2.0 Generic License (http://creativecommons.org/licenses/by/2.0/).
