Publication

PDEGEM: Modeling non-uniform read distribution in RNA-Seq data

Persistent URL
Last modified
  • 02/20/2025
Type of Material
Authors
    Yuchao Xia, Peking University
    Fugui Wang, Peking University
    Minping Qian, Peking University
    Zhaohui Qin, Emory University
    Minghua Deng, Peking University
Language
  • English
Date
  • 2015-01-01
Publisher
  • BioMed Central
Publication Version
Copyright Statement
  • © 2015 Xia et al.; licensee BioMed Central Ltd.
License
Final Published Version (URL)
Title of Journal or Parent Work
ISSN
  • 1755-8794
Volume
  • 8
Issue
  • 2
Start Page
  • S14
End Page
  • S14
Grant/Funding Information
  • Publication of this article has been funded by the National Natural Science Foundation of China (Nos. 31171262, 31428012, 31471246) and the National Key Basic Research Project of China (No. 2015CB910303).
Abstract
  • Background: RNA-Seq is a powerful new technology for comprehensively analyzing the transcriptome of any given set of cells. An important task in RNA-Seq data analysis is quantifying the expression levels of all transcripts. Although many methods have been introduced and much progress has been made, a satisfactory solution remains elusive.
  • Results: In this article, we borrow the idea behind the Positional Dependent Nearest Neighborhood (PDNN) model, originally developed for analyzing microarray data, to model the non-uniformity of read distribution in RNA-Seq data. We propose a robust nonlinear regression model named PDEGEM (Positional Dependent Energy Guided Expression Model) to estimate the abundance of transcripts. We find that PDEGEM fits the data better than mseq in all three real datasets we tested. We also find that the expression measure obtained using PDEGEM shows higher correlation with measurements from alternative assays for quantifying gene and isoform expression.
  • Conclusions: Based on these results, we believe that PDEGEM can improve the accuracy of modeling and estimating transcript abundance and isoform expression in RNA-Seq data. Additionally, although the stacking energy and positional weight in PDEGEM depend to some extent on sequencing platform and species, they share some common trends, which indicates that PDEGEM could partly reflect the mechanism of DNA binding between the template strand and the newly synthesized read. PDEGEM can be freely downloaded at: http://www.math.pku.edu.cn/teachers/dengmh/PDEGEM.
Author Notes
Keywords
Research Categories
  • Biology, Genetics
  • Biology, Biostatistics
  • Mathematics
