Publication

Collective effects of long-range DNA methylations predict gene expressions and estimate phenotypes in cancer

Last modified
  • 05/21/2025
Authors
  • Soyeon Kim, University of Pittsburgh
  • Hyun Jung Park, University of Pittsburgh
  • Xiangqin Cui, Emory University
  • Degui Zhi, University of Texas Houston
Language
  • English
Date
  • 2020-03-03
Publisher
  • Nature Publishing Group
Copyright Statement
  • © 2020 Springer Nature Limited.
Volume
  • 10
Issue
  • 1
Start Page
  • 3920
End Page
  • 3920
Grant/Funding Information
  • This work was partially supported by the US National Institutes of Health [T32 H.L. 129949 to S.K.] and the Cancer Prevention and Research Institute of Texas [RP170668 to D.Z.].
Abstract
  • DNA methylation of various genomic regions has been found to be associated with gene expression in diverse biological contexts. However, most genome-wide studies have focused on the effect on gene expression of (1) methylation in cis, not in trans, and (2) a single CpG, not the collective effects of multiple CpGs. In this study, we developed a statistical machine learning model, geneEXPLORE (gene expression prediction by long-range epigenetics), that quantifies the collective effects of both cis- and trans-methylation on gene expression. By applying geneEXPLORE to The Cancer Genome Atlas (TCGA) data from breast cancer and 10 other cancer types, we found that most genes are associated with methylation at sites as far as 10 Mb or more from their promoters, and that long-range methylation explains, on average, 50% of the variation in gene expression, far more than cis-methylation. geneEXPLORE outperforms competing methods such as BioMethyl and MethylXcan. Further, the predicted gene expression levels could predict clinical phenotypes such as breast tumor status and estrogen receptor status (AUC = 0.999 and 0.94, respectively) as accurately as the measured gene expression levels. These results suggest that geneEXPLORE provides a means for accurate imputation of gene expression, which can be further used to predict clinical phenotypes.
Research Categories
  • Biology, Bioinformatics
  • Biology, Biostatistics
  • Biology, Genetics
  • Health Sciences, Oncology
