Publication

Staphylococcus aureus viewed from the perspective of 40,000+ genomes

Last modified
  • 05/15/2025
Authors
  • Robert A. Petit, Emory University
  • Timothy D. Read, Emory University
Language
  • English
Date
  • 2018-07-12
Publisher
  • PeerJ
Copyright Statement
  • © 2018 Petit and Read
Title of Journal or Parent Work
  • PeerJ
ISSN
  • 2167-8359
Volume
  • 6
Start Page
  • e5261
End Page
  • e5261
Grant/Funding Information
  • The Seven Bridges NCI Cancer Genomics Cloud pilot was supported in part by funds from the National Cancer Institute, National Institutes of Health, Department of Health and Human Services, under Contract No. HHSN261201400008C.
  • Funding was from Emory University, Amazon AWS in Education Grant Program, and NIH grants AI091827 and AI121860.
Abstract
  • Low-cost Illumina sequencing of clinically important bacterial pathogens has generated thousands of publicly available genomic datasets. Analyzing these genomes and extracting relevant information for each pathogen and the associated clinical phenotypes requires not only resources and bioinformatic skills but also organism-specific knowledge. In light of these issues, we created Staphopia, an analysis pipeline, database and application programming interface focused on Staphylococcus aureus, a common colonizer of humans and a major antibiotic-resistant pathogen responsible for a wide spectrum of hospital- and community-associated infections. Written in Python, Staphopia's analysis pipeline consists of submodules running open-source tools. It accepts raw FASTQ reads as input, which undergo quality-control filtration, error correction and reduction to a maximum of approximately 100× chromosome coverage. This reduction significantly reduces total runtime without detrimentally affecting the results. The pipeline performs both de novo assembly-based and mapping-based analyses. Automated gene calling and annotation are performed on the assembled contigs. Read mapping is used to call variants (single nucleotide polymorphisms and insertions/deletions) against a reference S. aureus chromosome (N315, ST5). In November 2017, we ran the analysis pipeline on more than 43,000 S. aureus shotgun Illumina genome projects in the public European Nucleotide Archive database. We found that only a quarter of known multi-locus sequence types (STs) were represented, but the top 10 STs made up 70% of all genomes. Methicillin-resistant S. aureus (MRSA) accounted for 64% of all genomes. Using the Staphopia database, we selected 380 high-quality genomes deposited with good metadata, each from a different multi-locus ST, as a non-redundant diversity set for studying S. aureus evolution. In addition to answering basic science questions, Staphopia could serve as a platform for rapid clinical diagnostics of S. aureus isolates in the future. The system could also be adapted as a template for other organism-specific databases.
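The coverage cap mentioned in the abstract rests on simple arithmetic: coverage is the total number of sequenced bases divided by the chromosome length, so limiting a sample to about 100× means keeping a random subset of reads whose bases add up to roughly 100 times the genome size. The sketch below illustrates that idea only. It assumes a chromosome length of about 2.8 Mb, models reads as single-end (id, sequence) tuples, and uses hypothetical function names (`estimated_coverage`, `downsample_to_coverage`); it is not Staphopia's own code.

```python
import random
from typing import Iterable, List, Tuple

# A read is modelled as (read_id, sequence); quality strings are omitted (simplifying assumption).
Read = Tuple[str, str]

ASSUMED_GENOME_SIZE = 2_800_000  # approximate S. aureus chromosome length in bp (assumption)
TARGET_COVERAGE = 100            # the ~100x cap described in the abstract


def estimated_coverage(reads: Iterable[Read], genome_size: int) -> float:
    """Coverage = total sequenced bases / genome length."""
    total_bases = sum(len(seq) for _, seq in reads)
    return total_bases / genome_size


def downsample_to_coverage(reads: List[Read], genome_size: int = ASSUMED_GENOME_SIZE,
                           target: float = TARGET_COVERAGE, seed: int = 42) -> List[Read]:
    """Hypothetical helper: randomly keep reads until the base budget for `target` coverage is used."""
    budget = target * genome_size
    total_bases = sum(len(seq) for _, seq in reads)
    if total_bases <= budget:
        return list(reads)  # already at or below the cap; keep everything
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    shuffled = reads[:]
    rng.shuffle(shuffled)
    kept: List[Read] = []
    kept_bases = 0
    for read in shuffled:
        if kept_bases + len(read[1]) > budget:
            break  # adding this read would exceed the coverage budget
        kept.append(read)
        kept_bases += len(read[1])
    return kept


# Toy usage: 50 reads of 100 bp against a 1 kb "genome" is 5x coverage.
reads = [(f"r{i}", "A" * 100) for i in range(50)]
print(estimated_coverage(reads, genome_size=1_000))  # 5.0
subset = downsample_to_coverage(reads, genome_size=1_000, target=2)
print(len(subset), estimated_coverage(subset, genome_size=1_000))  # 20 2.0
```

Random selection, rather than taking the first reads in a file, avoids biasing the subset toward whatever order the sequencer wrote reads in. A production tool would also need to keep paired-end mates together and would typically stream FASTQ files instead of holding all reads in memory.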
Research Categories
  • Health Sciences, Immunology
  • Health Sciences, Epidemiology
