About this item:

Author Notes:

To whom correspondence may be addressed: Email: swarren@emory.edu, mzwick@emory.edu, or djcutle@emory.edu.

See publication for full list of author contributions.

We thank the members of the laboratories of M.E.Z. and D.J.C. for comments on the manuscript; Cheryl T. Strauss for editing; and the Emory–Georgia Research Alliance Genome Center, supported in part by Public Health Service Grant UL1 RR025008 from the Clinical and Translational Science Award Program, the NIH, and the National Center for Research Resources, for performing the Illumina sequencing runs.

The TARDIS Emory High Performance Computing Cluster was used for this project.

The authors declare no conflict of interest.

Subjects:

Research Funding:

This work was supported by NIH/National Institute of Mental Health Grants U54 HD082015 and U01 MH101720, which are part of the International Consortium on Brain and Behavior in 22q11.2 Deletion Syndrome, and the Simons Foundation Autism Research Initiative (M.E.Z.).

Keywords:

  • Science & Technology
  • Multidisciplinary Sciences
  • Science & Technology - Other Topics
  • genome sequencing
  • GATK
  • sequence mapping
  • SNP calling
  • software
  • 22Q11.2 DELETION SYNDROME
  • EXOME VARIANTS
  • GENERATION
  • SCHIZOPHRENIA
  • FRAMEWORK
  • ALIGNMENT
  • HUMANS
  • NUMBER
  • TOOLS
  • RISK

PEMapper and PECaller provide a simplified approach to whole-genome sequencing

Journal Title:

Proceedings of the National Academy of Sciences

Volume:

Volume 114, Number 10, Pages E1923-E1932

Publisher:

Type of Work:

Article | Final Publisher PDF

Abstract:

The analysis of human whole-genome sequencing data presents significant computational challenges. The sheer size of datasets places an enormous burden on computational, disk array, and network resources. Here, we present an integrated computational package, PEMapper/PECaller, that was designed specifically to minimize the burden on networks and disk arrays, create output files that are minimal in size, and run in a highly computationally efficient way, with the single goal of enabling whole-genome sequencing at scale. In addition to improved computational efficiency, we implement a statistical framework that allows for a base-by-base error model, allowing this package to perform as well as or better than the widely used Genome Analysis Toolkit (GATK) in all key measures of performance on human whole-genome sequences.

Copyright information:

© 2017, National Academy of Sciences.