About this item:

Author Notes:

Email Address: Tianwei Yu: tianwei.yu@emory.edu

The authors declare that there is no conflict of interests regarding the publication of this paper.

Research Funding:

This work was partially supported by NIH Grants P20HL113451 and U19AI090023, 973 Program (no. 2013CB967101) of the Ministry of Science and Technology of China, and Shanghai Science Committee Foundation (13PJ1433200).

K-Profiles: A Nonlinear Clustering Method for Pattern Detection in High Dimensional Data

Journal Title:

BioMed Research International

Volume:

Volume 2015, Pages 1-10

Type of Work:

Article | Final Publisher PDF

Abstract:

With modern technologies such as microarray, deep sequencing, and liquid chromatography-mass spectrometry (LC-MS), it is possible to measure the expression levels of thousands of genes/proteins simultaneously to unravel important biological processes. A first step towards elucidating hidden patterns and understanding such massive data is the application of clustering techniques. Nonlinear relations, which have been largely underutilized compared with linear correlations, are prevalent in high-throughput data. In many cases, nonlinear relations can model the biological relationship more precisely and reflect critical patterns in the biological systems. Using the general dependency measure Distance Based on Conditional Ordered List (DCOL), which we introduced previously, we designed the nonlinear K-profiles clustering method, which can be seen as the nonlinear counterpart of the K-means clustering algorithm. The method has a built-in statistical testing procedure that ensures that genes not belonging to any cluster do not affect the estimation of cluster profiles. Results from extensive simulation studies showed that K-profiles clustering not only outperformed the traditional linear K-means algorithm but also performed significantly better than our previous General Dependency Hierarchical Clustering (GDHC) algorithm. We further analyzed a gene expression dataset, on which K-profiles clustering generated biologically meaningful results.

Copyright information:

© 2015 Kai Wang et al.

This is an Open Access article distributed under the terms of the Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/), which permits making multiple copies, distribution, distribution of derivative works, public display, and public performance, provided the original work is properly cited. This license requires that credit be given to the copyright holder and/or author and that copyright and license notices be kept intact.
