by
Courtney Astore;
Shivam Sharma;
Sini Nagpal;
David J Cutler;
John D. Rioux;
Judy H. Cho;
Dermot P.B. McGovern;
Steven R. Brant;
Subra Kugathasan;
I. King Jordan;
Greg Gibson
Background
Identification of rare variants involved in complex, polygenic diseases like Crohn’s disease (CD) has accelerated with the introduction of whole exome/genome sequencing association studies. Rare variants can be used in both diagnostic and therapeutic assessments; however, since they are likely to be restricted to specific ancestry groups, their contributions to risk assessment need to be evaluated outside the discovery population. Prior studies implied that the three known rare variants in NOD2 are absent in West African and Asian populations and only contribute in African Americans via admixture.
Methods
Whole genome sequencing (WGS) data from 3418 African American individuals (1774 inflammatory bowel disease (IBD) cases and 1644 controls) were used to assess odds ratios and allele frequencies (AF), as well as haplotype-specific ancestral origins, of European-derived CD variants discovered in a large exome-wide association study. Local and global ancestry inference was performed to assess the contribution of admixture to IBD, contrasting European and African American cohorts.
Results
Twenty-five rare variants associated with CD in European discovery cohorts are typically at five-fold lower frequency in African Americans. Correspondingly, where comparisons could be made, the rare variants were found to have a predicted four-fold reduced burden for IBD in African Americans compared to European individuals. Almost all of the rare CD European variants were found on European haplotypes in the African American cohort, implying that they contribute to disease risk in African Americans primarily due to recent admixture. In addition, the proportion of European ancestry correlates with the number of rare CD European variants each African American individual carries, as well as with their polygenic risk of disease. Similar findings were observed for 23 mutations affecting 10 other common complex diseases for which the rare variants were discovered in European cohorts.
Conclusions
European-derived Crohn’s disease rare variants are even more rare in African Americans and contribute to disease risk mainly due to admixture, which needs to be accounted for when performing cross-ancestry genetic assessments.
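The two summary quantities named in the Methods, allele frequency and odds ratio, can be illustrated with a minimal sketch. It assumes a simple allelic 2×2 table with a Haldane-Anscombe 0.5 correction and a Woolf confidence interval; the abstract does not state which estimator the authors used. The function names and the carrier counts in the example are hypothetical; only the cohort sizes (1774 cases, 1644 controls) come from the abstract.

```python
import math


def allele_frequency(alt_allele_count, n_individuals):
    """Alternate-allele frequency at a diploid autosomal site."""
    return alt_allele_count / (2 * n_individuals)


def allelic_odds_ratio(case_alt, n_cases, ctrl_alt, n_controls):
    """Allelic odds ratio with a Woolf 95% confidence interval.

    Assumption: 0.5 is added to every cell (Haldane-Anscombe correction)
    so that a rare variant with a zero count still gives a finite estimate.
    """
    a = case_alt + 0.5                    # alternate alleles in cases
    b = 2 * n_cases - case_alt + 0.5      # reference alleles in cases
    c = ctrl_alt + 0.5                    # alternate alleles in controls
    d = 2 * n_controls - ctrl_alt + 0.5   # reference alleles in controls
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, (lower, upper)


# Hypothetical allele counts (20 and 8); cohort sizes are from the abstract.
or_estimate, ci = allelic_odds_ratio(20, 1774, 8, 1644)
print(f"AF in cases: {allele_frequency(20, 1774):.4f}")
print(f"OR = {or_estimate:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

With counts this small the interval is wide, which is one reason rare-variant effects are hard to compare across ancestry groups.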
Genome-wide association studies of complex traits frequently find that SNP-based estimates of heritability are considerably smaller than estimates from classic family-based studies. This ‘missing’ heritability may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. To circumvent these challenges, we propose a new method to detect genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Our approach, Latent Interaction Testing (LIT), uses the observation that correlated traits with shared latent genetic interactions have trait variance and covariance patterns that differ by genotype. LIT examines the relationship between trait variance/covariance patterns and genotype using a flexible kernel-based framework that is computationally scalable for biobank-sized datasets with a large number of traits. We first use simulated data to demonstrate that LIT substantially increases power to detect latent genetic interactions compared to a trait-by-trait univariate method. We then apply LIT to four obesity-related traits in the UK Biobank and detect genetic variants with interactive effects near known obesity-related genes. Overall, we show that LIT, implemented in the R package lit, uses shared information across traits to improve detection of latent genetic interactions compared to standard approaches.
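The core observation, that a shared latent interaction makes trait variances and covariances differ by genotype, can be sketched on simulated data. This is a simplified stand-in under stated assumptions: it is not the kernel-based LIT statistic and does not use the API of the lit R package. It removes the additive genotype effect from two traits, then tests whether the product of their residuals is correlated with genotype. All names, effect sizes, and the allele frequency are hypothetical.

```python
import math
import random


def residualize(y, g):
    """Remove the linear (additive) effect of genotype g from trait y,
    then scale the residuals to unit variance."""
    n = len(y)
    mean_g, mean_y = sum(g) / n, sum(y) / n
    sgg = sum((gi - mean_g) ** 2 for gi in g)
    sgy = sum((gi - mean_g) * (yi - mean_y) for gi, yi in zip(g, y))
    slope = sgy / sgg
    resid = [yi - mean_y - slope * (gi - mean_g) for gi, yi in zip(g, y)]
    sd = math.sqrt(sum(r * r for r in resid) / n)
    return [r / sd for r in resid]


def correlation_test(x, g):
    """Pearson correlation of x with g and a two-sided p-value from a
    normal approximation (reasonable for large n)."""
    n = len(x)
    mean_x, mean_g = sum(x) / n, sum(g) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sgg = sum((gi - mean_g) ** 2 for gi in g)
    sxg = sum((xi - mean_x) * (gi - mean_g) for xi, gi in zip(x, g))
    r = sxg / math.sqrt(sxx * sgg)
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, math.erfc(abs(t) / math.sqrt(2))


# Simulate two traits that share an unobserved genotype-by-environment effect.
rng = random.Random(1)
n = 5000
g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]  # 0/1/2 copies
e = [rng.gauss(0, 1) for _ in range(n)]  # latent variable, never given to the test
y1 = [0.2 * gi + 0.5 * gi * ei + rng.gauss(0, 1) for gi, ei in zip(g, e)]
y2 = [0.1 * gi + 0.5 * gi * ei + rng.gauss(0, 1) for gi, ei in zip(g, e)]

z1, z2 = residualize(y1, g), residualize(y2, g)
cross_product = [a * b for a, b in zip(z1, z2)]  # per-individual covariance signal
r, p = correlation_test(cross_product, g)
print(f"cross-product vs genotype: r = {r:.3f}, p = {p:.2e}")
```

The same test applied to squared residuals (`z1[i] ** 2`) probes genotype-dependent variance of a single trait; combining variance and covariance terms across many traits is what the multi-trait framework is designed to do.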
Most complex human traits differ by sex, but we have limited insight into the underlying mechanisms. Here, we investigated the influence of biological sex on protein expression and its genetic regulation in 1,277 human brain proteomes. We found that 13.2% (1,354) of brain proteins had sex-differentiated abundance and 1.5% (150) of proteins had sex-biased protein quantitative trait loci (sb-pQTLs). Among genes with sex-biased expression, we found 67% concordance between sex-differentiated protein and transcript levels; however, sex effects on the genetic regulation of expression were more evident at the protein level. Considering 24 psychiatric, neurologic and brain morphologic traits, we found that an average of 25% of their putatively causal genes had sex-differentiated protein abundance and 12 putatively causal proteins had sb-pQTLs. Furthermore, integrating sex-specific pQTLs with sex-stratified genome-wide association studies of six psychiatric and neurologic conditions, we uncovered another 23 proteins contributing to these traits in one sex but not the other. Together, these findings begin to provide insights into mechanisms underlying sex differences in brain protein expression and disease.
In this, the first of an anticipated four-paper series, fundamental results of quantitative genetics are presented from a first-principles approach. While none of these results are in any sense new, they are presented in extended detail to precisely distinguish between definition and assumption, with a further emphasis on distinguishing quantities from their usual approximations. Terminology frequently encountered in the field of human genetic disease studies will be defined in terms of its quantitative genetics form. Methods for estimation of both quantitative genetics and the related human genetics quantities will be demonstrated. While practitioners in the field of human quantitative disease studies may find this work pedantic in detail, the principal target audience for this work is trainees reasonably familiar with population genetics theory, but with less experience in its application to human disease studies. We introduce much of this formalism because in later papers in this series, we demonstrate that common areas of confusion in human disease studies can be resolved by appealing directly to these formal definitions. The second paper in this series will discuss polygenic risk scores. The third paper will concern the question of “missing” heritability and the role interactions may play. The fourth paper will discuss sexually dimorphic disease and the potential role of the X chromosome.
by
Kelsey Robinson;
Trenell J. Mosley;
Kenneth S. Rivera-González;
Christopher R. Jabbarpour;
Sarah Curtis;
Wasiu Lanre Adeyemo;
Terri H. Beaty;
Azeez Butali;
Carmen J. Buxó;
David J Cutler;
Michael Epstein;
Lord J.J. Gowans;
Jacqueline T. Hecht;
Jeffrey C. Murray;
Gary M. Shaw;
Lina Moreno Uribe;
Seth M. Weinberg;
Harrison Brand;
Mary L. Marazita;
Robert J. Lipinski;
Elizabeth Leslie
Cleft palate (CP) is one of the most common craniofacial birth defects; however, there are relatively few established genetic risk factors associated with its occurrence despite high heritability. Historically, CP has been studied as a single phenotype, although it manifests across a spectrum of defects involving the hard and/or soft palate. We performed a genome-wide association study using transmission disequilibrium tests of 435 case-parent trios to evaluate broad risks for any cleft palate (ACP) (n = 435), and subtype-specific risks for any cleft soft palate (CSP) (n = 259) and any cleft hard palate (CHP) (n = 125). We identified a single genome-wide significant locus at 9q33.3 (lead SNP rs7035976, p = 4.24 × 10⁻⁸) associated with CHP. One gene at this locus, angiopoietin-like 2 (ANGPTL2), plays a role in osteoblast differentiation. It is expressed both in craniofacial tissue of human embryos and in developing mouse palatal shelves. We found 19 additional loci reaching suggestive significance (p < 5 × 10⁻⁶), of which only one overlapped between groups (chromosome 17q24.2, ACP and CSP). Odds ratios for the 20 loci were most similar across all 3 groups for SNPs associated with the ACP group, but more distinct when comparing SNPs associated with either subtype. We also found nominal evidence of replication (p < 0.05) for 22 SNPs previously associated with orofacial clefts. Ours is the first study to evaluate CP risks in the context of its subtypes, and we provide newly reported associations affecting the broad risk for CP as well as evidence of subtype-specific risks.
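For readers unfamiliar with the transmission disequilibrium test (TDT) on case-parent trios, a minimal sketch of the classic single-SNP form follows. It assumes the standard McNemar-type statistic on transmissions from heterozygous parents; the study's actual software, quality filters, and any covariate handling are not described in the abstract. The function name and the counts in the example are hypothetical.

```python
import math


def tdt(transmitted, untransmitted):
    """Transmission disequilibrium test for one biallelic SNP.

    transmitted:   heterozygous parents who passed the tested allele
                   to the affected child
    untransmitted: heterozygous parents who passed the other allele

    Returns (chi-square statistic, p-value, odds ratio estimate).
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi2 = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    odds_ratio = b / c if c > 0 else math.inf
    return chi2, p_value, odds_ratio


# Hypothetical counts for illustration only.
chi2, p_value, odds_ratio = tdt(60, 30)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3g}, OR = {odds_ratio:.2f}")
```

Because only heterozygous parents are informative, the test is robust to population stratification, which is a common reason to prefer trio designs in multi-ancestry cohorts.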
by
Kelsey Robinson;
Trenell J Mosley;
Kenneth S Rivera-Gonzalez;
Christopher R Jabbarpour;
Sarah W Curtis;
Wasiu Lanre Adeyemo;
Terri H Beaty;
Azeez Butali;
Carmen J Buxó;
David Cutler;
Michael Epstein;
Lord JJ Gowans;
Jacqueline T Hecht;
Jeffrey C Murray;
Gary M Shaw;
Lina Moreno Uribe;
Seth M Weinberg;
Harrison Brand;
Mary L Marazita;
Robert J Lipinski;
Elizabeth Leslie
Orofacial clefts (OFCs) are the most common craniofacial birth defects and are often categorized into two etiologically distinct groups: cleft lip with or without cleft palate (CL/P) and isolated cleft palate (CP). CP is highly heritable, but there are still relatively few established genetic risk factors associated with its occurrence compared to CL/P. Historically, CP has been studied as a single phenotype despite manifesting across a spectrum of defects involving the hard and/or soft palate. We performed a GWAS using transmission disequilibrium tests of 435 case-parent trios to evaluate broad risks for any cleft palate (ACP, n = 435), as well as subtype-specific risks for any cleft soft palate (CSP, n = 259) and any cleft hard palate (CHP, n = 125). We identified a single genome-wide significant locus at 9q33.3 (lead SNP rs7035976, p = 4.24 × 10⁻⁸) associated with CHP. One gene at this locus, angiopoietin-like 2 (ANGPTL2), plays a role in osteoblast differentiation. It is expressed in craniofacial tissue of human embryos, as well as in the developing mouse palatal shelves. We found 19 additional loci reaching suggestive significance (p < 5 × 10⁻⁶), of which only one overlapped between groups (chromosome 17q24.2, ACP and CSP). Odds ratios (ORs) for each of the 20 loci were most similar across all three groups for SNPs associated with the ACP group, but more distinct when comparing SNPs associated with either the CSP or CHP groups. We also found nominal evidence of replication (p < 0.05) for 22 SNPs previously associated with cleft palate (including CL/P). Interestingly, most of the replicated SNPs previously associated with CL/P showed the opposite direction of effect in our CP-only dataset. Ours is the first study to evaluate CP risks in the context of its subtypes, and we provide newly reported associations affecting the broad risk for CP as well as evidence of subtype-specific risks.
by
Steven R. Brant;
David T. Okou;
Claire L. Simpson;
David J Cutler;
Talin Haritunians;
Jonathan P. Bradfield;
Pankaj Chopra;
Jarod Prince;
Ferdouse Begum;
Archana Kumar;
Chengrui Huang;
Suresh Venkateswaran;
Lisa W. Datta;
Zhi Wei;
Kelly Thomas;
Lisa J. Herrinton;
Jan-Michael A. Klapproth;
Antonio J. Quiros;
Jenifer Seminerio;
Zhenqiu Liu;
Jonathan S. Alexander;
Robert N. Baldassano;
Sharon Dudley-Brown;
Raymond K. Cross;
Themistocles Dassopoulos;
Lee A. Denson;
Tanvi Dhere;
Gerald W. Dryden;
John S. Hanson;
Michael Zwick;
Subra Kugathasan
Background & Aims
The inflammatory bowel diseases (IBD) ulcerative colitis (UC) and Crohn's disease (CD) cause significant morbidity and are increasing in prevalence among all populations, including African Americans. More than 200 susceptibility loci have been identified in populations of predominantly European ancestry, but few loci have been associated with IBD in other ethnicities.
Methods
We performed 2 high-density, genome-wide scans comprising 2345 African Americans with IBD (1646 with CD, 583 with UC, and 116 with IBD unclassified) and 5002 individuals without IBD (controls, identified from the Health Retirement Study and Kaiser Permanente database). Single-nucleotide polymorphisms (SNPs) associated at P < 5.0 × 10⁻⁸ in meta-analysis, with nominal evidence (P < .05) in each scan, were considered to have genome-wide significance.
Results
We detected SNPs at HLA-DRB1, and African-specific SNPs at ZNF649 and LSAMP, with associations of genome-wide significance for UC. We detected SNPs at USP25 with associations of genome-wide significance for IBD. No associations of genome-wide significance were detected for CD. In addition, 9 genes previously associated with IBD contained SNPs with significant evidence for replication (P < 1.6 × 10⁻⁶): ADCY3, CXCR6, HLA-DRB1 to HLA-DQA1 (genome-wide significance on conditioning), IL12B, PTGER4, and TNC for IBD; IL23R, PTGER4, and SNX20 (in strong linkage disequilibrium with NOD2) for CD; and KCNQ2 (near TNFRSF6B) for UC. Several of these genes, such as TNC (near TNFSF15), CXCR6, and genes associated with IBD at the HLA locus, contained SNPs with unique association patterns with African-specific alleles.
Conclusions
We performed a genome-wide association study of African Americans with IBD and identified loci associated with UC in only this population; we also replicated IBD, CD, and UC loci identified in European populations. The detection of variants associated with IBD risk in only people of African descent demonstrates the importance of studying the genetics of IBD and other complex diseases in populations beyond those of European ancestry.
by
Kelly A. Shaw;
David Cutler;
David Okou;
Anne Dodd;
Bruce J. Aronow;
Yael Haberman;
Christine Stevens;
Thomas D. Walters;
Anne Griffiths;
Robert N. Baldassano;
Joshua D. Noe;
Jeffrey S. Hyams;
Wallace V. Crandall;
Barbara S. Kirschner;
Melvin B. Heyman;
Scott Snapper;
Stephen Guthery;
Marla C. Dubinsky;
Jason M. Shapiro;
Anthony R. Otley;
Mark Daly;
Lee A. Denson;
Subramaniam Kugathasan;
Michael Zwick
In the United States, approximately 5% of individuals with inflammatory bowel disease (IBD) are younger than 20 years old. Studies of pediatric cohorts can provide unique insights into genetic architecture of IBD, which includes Crohn’s disease (CD) and ulcerative colitis (UC). Large genome-wide association studies have found more than 200 IBD-associated loci but explain a minority of disease variance for CD and UC. We sought to characterize the contribution of rare variants to disease development, comparing exome sequencing of 368 pediatric IBD patients to publicly available exome sequencing (dbGaP) and aggregate frequency data (ExAC). Using dbGaP data, we performed logistic regression for common variants and optimal unified association tests (SKAT-O) for rare, likely-deleterious variants. We further compared rare variants to ExAC counts with Fisher’s exact tests. We did pathway enrichment analysis on the most significant genes from each comparison. Many variants overlapped with known IBD-associated genes (e.g. NOD2). Rare variants were enriched in CD-associated loci (p = 0.009) and showed suggestive enrichment in neutrophil function genes (p = 0.05). Pathway enrichment implicated immune-related pathways, especially cell killing and apoptosis. Variants in extracellular matrix genes also emerged as an important theme in our analysis.
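The comparison of rare-variant carrier counts against aggregate reference counts with Fisher's exact test can be sketched as follows. This is a minimal one-sided (enrichment) version written with the standard library only; whether the authors used a one- or two-sided test, and how carriers were defined, is not stated in the abstract. The function name and all counts other than the 368 patients are hypothetical.

```python
from math import comb


def fisher_exact_greater(case_carriers, case_noncarriers,
                         ref_carriers, ref_noncarriers):
    """One-sided Fisher's exact test for enrichment of carriers in cases.

    Returns the hypergeometric probability of seeing at least
    `case_carriers` carriers among the cases, with all margins fixed.
    """
    a, b, c, d = case_carriers, case_noncarriers, ref_carriers, ref_noncarriers
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denominator = comb(n_total, n_cases)
    # Sum over tables at least as extreme as the observed one.
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, min(n_cases, n_carriers) + 1)
    )
    return tail / denominator


# 368 patients is from the abstract; the carrier counts and the reference
# sample size below are hypothetical.
p_value = fisher_exact_greater(6, 368 - 6, 150, 60000 - 150)
print(f"one-sided p = {p_value:.3g}")
```

Exact integer arithmetic keeps the result accurate even when the reference panel is much larger than the case series, at the cost of some speed.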
by
Adrian Gherman;
Peter E. Chen;
Tanya M. Teslovich;
Pawel Stankiewicz;
Marjorie Withers;
Carl S. Kashuk;
Aravinda Chakravarti;
James R. Lupski;
David Cutler;
Nicholas Katsanis
The modern synthetic view of human evolution proposes that the fixation of novel mutations is driven by the balance among selective advantage, selective disadvantage, and genetic drift. When considering the global architecture of the human genome, the same model can be applied to understanding the rapid acquisition and proliferation of exogenous DNA. To explore the evolutionary forces that might have morphed human genome architecture, we investigated the origin, composition, and functional potential of numts (nuclear mitochondrial pseudogenes), partial copies of the mitochondrial genome found abundantly in chromosomal DNA. Our data indicate that these elements are unlikely to be advantageous, since they possess no gross positional, transcriptional, or translational features that might indicate beneficial functionality subsequent to integration. Using sequence analysis and fossil dating, we also show a probable burst of integration of numts in the primate lineage that centers on the prosimian-anthropoid split, mimics closely the temporal distribution of Alu and processed pseudogene acquisition, and coincides with the major climatic change at the Paleocene-Eocene boundary. We therefore propose a model according to which the gross architecture and repeat distribution of the human genome can be largely accounted for by a population bottleneck early in the anthropoid lineage and subsequent effectively neutral fixation of repetitive DNA, rather than positive selection or unusual insertion pressures.
by
Wenyi Wang;
Peidong Shen;
Sreedevi Thiyagarajan;
Shengrong Lin;
Curtis Palm;
Rita Horvath;
Thomas Klopstock;
David Cutler;
Lynn Pique;
Iris Schrijver;
Ronald W. Davis;
Michael Mindrinos;
Terence P. Speed;
Curt Scharfe
A common goal in the discovery of rare functional DNA variants via medical resequencing is to incur a relatively lower proportion of false positive base-calls. We developed a novel statistical method for resequencing arrays (SRMA, sequence robust multi-array analysis) to increase the accuracy of detecting rare variants and reduce the costs of the subsequent sequence verifications required in medical applications. SRMA includes single and multi-array analysis and accounts for technical variables as well as the possibility of both low- and high-frequency genomic variation. The confidence of each base-call was ranked using two quality measures. In comparison to Sanger capillary sequencing, we achieved a false discovery rate of 2% (false positive rate 1.2 × 10⁻⁵, false negative rate 5%), which is similar to automated second-generation sequencing technologies. Applied to the analysis of 39 nuclear candidate genes in disorders of mitochondrial DNA (mtDNA) maintenance, we confirmed mutations in the DNA polymerase gamma POLG in positive control cases, and identified novel rare variants in previously undiagnosed cases in the mitochondrial topoisomerase TOP1MT, the mismatch repair enzyme MUTYH, and the apurinic-apyrimidinic endonuclease APEX2. Some patients carried rare heterozygous variants in several functionally interacting genes, which could indicate synergistic genetic effects in these clinically similar disorders.