About this item:

Author Notes:

Correspondence: Yi-Juan Hu, yijuan.hu@emory.edu

Author contributions: Y.J.H. conceived this work, developed methodologies, conducted numerical studies, and wrote the manuscript; Y.Y. developed methodologies and conducted numerical studies; T.D.R., V.F., and G.A.S. interpreted results and revised the manuscript.

Competing interests: The authors declare that they have no competing interests.

Research Funding:

This research was supported by the National Institutes of Health award R01GM141074 (Hu, Satten), and the Cancer Prevention and Research Institute of Texas (CPRIT) Rising Stars Award RR200056 (Fedirko).

Keywords:

  • microbiome
  • experimental bias
  • LOCOM
  • differential abundance
  • partial overlap
  • global test

Integrative analysis of microbial 16S gene and shotgun metagenomic sequencing data improves statistical efficiency

Journal Title:

Research Square

Publisher:

Pages 3376801

Type of Work:

Article | Preprint: Prior to Peer Review

Abstract:

Background: The most widely used technologies for profiling microbial communities are 16S marker-gene sequencing and shotgun metagenomic sequencing. Interestingly, many microbiome studies have performed both sequencing experiments on the same cohort of samples. The two sequencing datasets often reveal consistent patterns of microbial signatures, highlighting the potential for an integrative analysis to improve the power to test these signatures. However, differential experimental biases, partially overlapping samples, and differential library sizes pose tremendous challenges when combining the two datasets. Currently, researchers either discard one dataset entirely or use different datasets for different objectives.

Methods: In this article, we introduce Com-2seq, the first method of its kind, which combines the two sequencing datasets for testing differential abundance at the genus and community levels while overcoming these difficulties. The new method is based on our LOCOM model (Hu et al., 2022), which employs logistic regression for testing taxon differential abundance while remaining robust to experimental bias. To benchmark the performance of Com-2seq, we introduce two ad hoc approaches: applying LOCOM to pooled taxa count data and combining LOCOM p-values from analyzing each dataset separately.

Results: Our simulation studies indicate that Com-2seq substantially improves statistical efficiency over analysis of either dataset alone and outperforms the two ad hoc approaches. An application of Com-2seq to two real microbiome studies uncovered scientifically plausible findings that would have been missed by analyzing the individual datasets.

Conclusions: Com-2seq performs integrative analysis of 16S and metagenomic sequencing data, which improves statistical efficiency and has the potential to accelerate the search for microbial communities and taxa that are involved in human health and diseases.

Copyright information:

2023 NIH

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).