About this item:

Author Notes:

See publication for full list of authors

Correspondence: Rebecca A. Gladstone, rg9@sanger.ac.uk; Stephen D. Bentley, sdb@sanger.ac.uk

Author contributions: Rebecca A. Gladstone: conceptualization, methodology, project administration, data curation, investigation, formal analysis, visualization, writing – original draft preparation.

Stephen D. Bentley: conceptualization, funding, project administration, writing – review and editing.

Stephanie W. Lo: methodology, project administration, data curation, formal analysis, visualization, writing – review and editing.

Richard Goater: software, visualization, writing – review and editing.

Paulina A. Hawkins: resources, project administration, data curation.

Keith P. Klugman: conceptualization, funding.

Anne von Gottberg: conceptualization, resources.

Robert F. Breiman: conceptualization.

See publication for full list of contributions.

Acknowledgements: We would like to thank all members of the GPS consortium for their collaborative spirit and determination during the monumental task of sampling, extracting and sequencing this dataset, and all contributions to experimental design and input into this manuscript.

We also would like to thank members of teams 284 and 81 at the Wellcome Sanger Institute (WSI) for their advice and critique and the pathogen informatics team at the WSI for the pipelines and expertise that made genomic analysis at this scale possible.

See publication for a full list of acknowledgements.

Conflicts of interest: Dr Gladstone reports PhD studentship from Pfizer, outside the submitted work; Dr Lees reports grants from Pfizer, outside the submitted work; Dr Madhi reports grants from BMGF, during the conduct of the study; grants and personal fees from BMGF, grants from Pfizer, grants from GSK, grants from Sanofi, grants from BIOVAC, outside the submitted work; Dr Dagan reports grants and personal fees from Pfizer, during the conduct of the study; grants and personal fees from MSD, personal fees from MeMed, outside the submitted work; Dr von Gottberg reports grants and other from Pfizer, during the conduct of the study; grants and other from Sanofi, outside the submitted work; Dr Bentley reports personal fees from Pfizer, personal fees from Merck, outside the submitted work.

Ethical statement: Isolates for this study were selected from retrospective bacterial collections in each country participating in GPS.

Appropriate approvals for use of isolates were obtained from each institution contributing isolates. No tissue material or other biological material was obtained from humans. All information regarding these isolates was anonymized.


Research Funding:

This study was co-funded by the Bill and Melinda Gates Foundation (grant code OPP1034556), the Wellcome Sanger Institute (core Wellcome grants 098051 and 206194) and the U.S. Centers for Disease Control and Prevention.

The funding sources had no role in isolate selection, analysis, or data interpretation. The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention.

See publication for full list of funding.


  • Science & Technology
  • Life Sciences & Biomedicine
  • Genetics & Heredity
  • Microbiology
  • Streptococcus pneumoniae
  • pneumococcal
  • whole genome sequencing
  • population structure
  • recombination
  • antibiotic resistance
  • pangenome
  • phylogenetic dating

Visualizing variation within Global Pneumococcal Sequence Clusters (GPSCs) and country population snapshots to contextualize pneumococcal isolates


Journal Title:

Microbial Genomics


Volume 6, Number 5, Pages 1-13

Type of Work:

Article | Final Publisher PDF


Knowledge of pneumococcal lineages, their geographic distribution and antibiotic resistance patterns, can give insights into global pneumococcal disease. We provide interactive bioinformatic outputs to explore such topics, aiming to increase dissemination of genomic insights to the wider community, without the need for specialist training. We prepared 12 country-specific phylogenetic snapshots, and international phylogenetic snapshots of 73 common Global Pneumococcal Sequence Clusters (GPSCs) previously defined using PopPUNK, and present them in Microreact. Gene presence and absence defined using Roary, and recombination profiles derived from Gubbins are presented in Phandango for each GPSC. Temporal phylogenetic signal was assessed for each GPSC using BactDating. We provide examples of how such resources can be used. In our example use of a country-specific phylogenetic snapshot we determined that serotype 14 was observed in nine unrelated genetic backgrounds in South Africa. The international phylogenetic snapshot of GPSC9, in which most serotype 14 isolates from South Africa were observed, highlights that there were three independent sub-clusters represented by South African serotype 14 isolates. We estimated from the GPSC9-dated tree that the sub-clusters were each established in South Africa during the 1980s. We show how recombination plots allowed the identification of a 20 kb recombination spanning the capsular polysaccharide locus within GPSC97. This was consistent with a switch from serotype 6A to 19A estimated to have occurred in the 1990s from the GPSC97-dated tree. Plots of gene presence/absence of resistance genes (tet, erm, cat) across the GPSC23 phylogeny were consistent with acquisition of a composite transposon. We estimated from the GPSC23-dated tree that the acquisition occurred between 1953 and 1975. Finally, we demonstrate the assignment of GPSC31 to 17 externally generated pneumococcal serotype 1 assemblies from Utah via Pathogenwatch. Most of the Utah isolates clustered within GPSC31 in a USA-specific clade with the most recent common ancestor estimated between 1958 and 1981. The resources we have provided can be used to explore data, test hypotheses and generate new hypotheses. The accessible assignment of GPSCs allows others to contextualize their own collections beyond the data presented here.

Copyright information:

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).