by
Kelly A. Shaw;
David Cutler;
David Okou;
Anne Dodd;
Bruce J. Aronow;
Yael Haberman;
Christine Stevens;
Thomas D. Walters;
Anne Griffiths;
Robert N. Baldassano;
Joshua D. Noe;
Jeffrey S. Hyams;
Wallace V. Crandall;
Barbara S. Kirschner;
Melvin B. Heyman;
Scott Snapper;
Stephen Guthery;
Marla C. Dubinsky;
Jason M. Shapiro;
Anthony R. Otley;
Mark Daly;
Lee A. Denson;
Subramaniam Kugathasan;
Michael Zwick
In the United States, approximately 5% of individuals with inflammatory bowel disease (IBD) are younger than 20 years old. Studies of pediatric cohorts can provide unique insights into the genetic architecture of IBD, which includes Crohn’s disease (CD) and ulcerative colitis (UC). Large genome-wide association studies have found more than 200 IBD-associated loci, but these explain only a minority of disease variance for CD and UC. We sought to characterize the contribution of rare variants to disease development by comparing exome sequencing of 368 pediatric IBD patients to publicly available exome sequencing (dbGaP) and aggregate frequency data (ExAC). Using dbGaP data, we performed logistic regression for common variants and optimal unified association tests (SKAT-O) for rare, likely-deleterious variants. We further compared rare variants to ExAC counts with Fisher’s exact tests. We then performed pathway enrichment analysis on the most significant genes from each comparison. Many variants overlapped with known IBD-associated genes (e.g. NOD2). Rare variants were enriched in CD-associated loci (p = 0.009) and showed suggestive enrichment in neutrophil function genes (p = 0.05). Pathway enrichment implicated immune-related pathways, especially cell killing and apoptosis. Variants in extracellular matrix genes also emerged as an important theme in our analysis.
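As an illustration of the kind of comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table of allele counts (cases versus an aggregate reference such as ExAC). It is not the study's actual pipeline: the function name `fisher_exact_two_sided` and all counts are hypothetical, chosen only to show the mechanics of the test.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. cases, reference); columns are allele classes
    (e.g. alternate, reference). Returns the p-value.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total_tables = comb(row1 + row2, col1)

    def table_prob(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total_tables

    p_observed = table_prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is as likely as, or less
    # likely than, the observed one (small tolerance guards float ties).
    p_value = sum(
        p for p in map(table_prob, range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts for one rare variant (illustrative, not study data):
# 368 patients contribute 736 alleles; the reference counts are made up.
case_alt, case_ref = 6, 730
ref_alt, ref_ref = 20, 9980

p = fisher_exact_two_sided(case_alt, case_ref, ref_alt, ref_ref)
print(f"p = {p:.3g}")
```

In an exome-wide setting, a test like this would be run once per variant or gene, so the resulting p-values would need correction for multiple testing before interpretation.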
Accurately selecting relevant alleles in large sequencing experiments remains technically challenging. Bystro (https://bystro.io/) is the first online, cloud-based application that makes variant annotation and filtering accessible to all researchers for terabyte-sized whole-genome experiments containing thousands of samples. Its key innovation is a general-purpose, natural-language search engine that enables users to identify and export alleles and samples of interest in milliseconds. The search engine dramatically simplifies complex filtering tasks that previously required programming experience or specialty command-line programs. Critically, Bystro's annotation and filtering capabilities are orders of magnitude faster than previous solutions, saving weeks of processing time for large experiments.
Determining the genetic architecture of liability for complex neuropsychiatric disorders like autism spectrum disorders and schizophrenia poses a tremendous challenge for contemporary biomedical research. Here we discuss how genetic studies first tested, and rejected, the hypothesis that common variants with large effects account for the prevalence of these disorders. We then explore how the discovery of structural variation has contributed to our understanding of the etiology of these disorders. Next, we focus on the rise of fast and inexpensive oligonucleotide sequencing and methods of targeted enrichment, and on their influence on the search for rare genetic variation contributing to complex neuropsychiatric disorders. Finally, we consider the technical challenges and future prospects for the use of next-generation sequencing to reveal the genetic architecture of complex neuropsychiatric disorders in both research and clinical settings.