Publication

Gene representation in scRNA-seq is correlated with common motifs at the 3' end of transcripts.

Persistent URL
Last modified
  • 06/25/2025
Type of Material
Authors
  • Xinling Li, Georgia Institute of Technology and Emory University
  • Greg Gibson, Georgia Institute of Technology, Atlanta
  • Peng Qiu, Georgia Institute of Technology and Emory University
Language
  • English
Date
  • 2023
Publisher
  • Frontiers
Publication Version
Copyright Statement
  • © 2023 Li, Gibson and Qiu.
License
Final Published Version (URL)
Title of Journal or Parent Work
Volume
  • 3
Start Page
  • 1120290
End Page
  • 1120290
Supplemental Material (URL)
Abstract
  • One important characteristic of single-cell RNA sequencing (scRNA-seq) data is its high sparsity: the gene-cell count matrix contains a high proportion of zeros. This sparsity has motivated widespread discussion of dropouts and missing data, as well as imputation algorithms for scRNA-seq analysis. Here, we investigate whether some genes are more prone to under-detection in scRNA-seq and, if so, what those genes have in common. From public data sources, we gathered paired bulk RNA-seq and scRNA-seq data from 53 human samples generated in diverse biological contexts. We derived pseudo-bulk gene expression by averaging the scRNA-seq data across cells. Comparing the paired bulk and pseudo-bulk gene expression profiles revealed a collection of genes that are frequently under-detected in scRNA-seq relative to bulk RNA-seq. This result was robust to randomization, in which unpaired bulk and pseudo-bulk gene expression profiles were compared. We performed a motif search on the last 350 bp of the identified genes and observed enrichment of a poly(T) motif. The poly(T) motif toward the tails of those genes may be able to form hairpin structures with the poly(A) tails of their mRNA transcripts, making those transcripts difficult to capture during scRNA-seq library preparation. We offer this as a mechanistic conjecture for why certain genes may be more prone to under-detection in scRNA-seq.
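The two computational steps named in the abstract, averaging scRNA-seq counts across cells into a pseudo-bulk profile and inspecting the last 350 bp of a gene for poly(T) stretches, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names `pseudo_bulk` and `longest_t_run` are hypothetical, the count matrix is assumed to be cells by genes, and counting the longest run of T is a simple stand-in for the motif search the paper actually performed.

```python
import numpy as np


def pseudo_bulk(counts):
    """Average a cells-by-genes count matrix over cells.

    Assumption: rows are cells and columns are genes.
    Returns one mean expression value per gene.
    """
    counts = np.asarray(counts, dtype=float)
    return counts.mean(axis=0)


def longest_t_run(seq, tail_len=350):
    """Length of the longest consecutive run of T in the last `tail_len` bases.

    A crude proxy for a poly(T) motif, not a real motif-discovery method.
    """
    tail = seq[-tail_len:].upper()
    best = run = 0
    for base in tail:
        run = run + 1 if base == "T" else 0
        best = max(best, run)
    return best
```

For example, `pseudo_bulk([[0, 2], [4, 0]])` averages two cells over two genes, and `longest_t_run` only looks at the final 350 bases, so T stretches further upstream are ignored.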
Author Notes
Keywords
Research Categories
  • Engineering, Biomedical
