Genome-wide association studies of complex traits frequently find that SNP-based estimates of heritability are considerably smaller than estimates from classic family-based studies. This ‘missing’ heritability may be partly explained by genetic variants interacting with other genes or environments that are difficult to specify, observe, and detect. To circumvent these challenges, we propose a new method to detect genetic interactions that leverages pleiotropy from multiple related traits without requiring the interacting variable to be specified or observed. Our approach, Latent Interaction Testing (LIT), uses the observation that correlated traits with shared latent genetic interactions have trait variance and covariance patterns that differ by genotype. LIT examines the relationship between trait variance/covariance patterns and genotype using a flexible kernel-based framework that is computationally scalable for biobank-sized datasets with a large number of traits. We first use simulated data to demonstrate that LIT substantially increases power to detect latent genetic interactions compared to a trait-by-trait univariate method. We then apply LIT to four obesity-related traits in the UK Biobank and detect genetic variants with interactive effects near known obesity-related genes. Overall, we show that LIT, implemented in the R package lit, uses shared information across traits to improve detection of latent genetic interactions compared to standard approaches.
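The central observation, that a shared latent interaction makes trait variances and covariances differ by genotype, can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the implementation in the lit package: the function names, effect sizes, and the use of a plain correlation in place of the kernel-based test are all illustrative choices.

```python
import numpy as np

def simulate_traits(n=5000, interaction=0.3, seed=0):
    """Simulate a SNP and two traits sharing a latent interaction (hypothetical effect sizes)."""
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, 0.3, size=n).astype(float)  # genotype coded 0/1/2
    w = rng.normal(size=n)                          # unobserved interacting variable
    shared = w + interaction * g * w                # latent main effect plus G x W term
    y1 = 0.1 * g + shared + rng.normal(size=n)
    y2 = 0.1 * g + shared + rng.normal(size=n)
    return g, np.column_stack([y1, y2])

def variance_covariance_terms(g, traits):
    """Remove additive genotype effects, then form squared residuals and cross products."""
    design = np.column_stack([np.ones_like(g), g])
    beta, *_ = np.linalg.lstsq(design, traits, rcond=None)
    resid = traits - design @ beta
    resid = resid / resid.std(axis=0)
    k = resid.shape[1]
    return np.column_stack(
        [resid[:, i] * resid[:, j] for i in range(k) for j in range(i, k)]
    )

def genotype_correlations(g, traits):
    """Correlate each variance/covariance term with genotype."""
    terms = variance_covariance_terms(g, traits)
    return np.array([np.corrcoef(g, terms[:, c])[0, 1] for c in range(terms.shape[1])])
```

With the interaction present, all three terms (two squared residuals and one cross product) tend to correlate positively with genotype; setting `interaction=0.0` removes the pattern. Neither trait analysis requires the interacting variable `w` to be observed.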
Alzheimer’s disease (AD) pathology develops many years before the onset of cognitive symptoms. Two pathological processes, aggregation of the amyloid-β (Aβ) peptide into plaques and of the microtubule protein tau into neurofibrillary tangles (NFTs), are hallmarks of the disease. However, other pathological brain processes are thought to be key disease mediators of Aβ plaque and NFT pathology. How these additional pathologies evolve over the course of the disease is currently unknown. Here we show that proteomic measurements in autosomal dominant AD cerebrospinal fluid (CSF) linked to brain protein coexpression can be used to characterize the evolution of AD pathology over a timescale spanning six decades. SMOC1 and SPON1 proteins associated with Aβ plaques were elevated in AD CSF nearly 30 years before the onset of symptoms, followed by changes in synaptic proteins, metabolic proteins, axonal proteins, inflammatory proteins and finally decreases in neurosecretory proteins. The proteome discriminated mutation carriers from noncarriers before symptom onset as well as or better than Aβ and tau measures. Our results highlight the multifaceted landscape of AD pathophysiology and its temporal evolution. Such knowledge will be critical for developing precision therapeutic interventions and biomarkers for AD beyond those associated with Aβ and tau.
by
Randy L. Parrish;
Aron S. Buchman;
Shinya Tasaki;
Yanling Wang;
Denis Avey;
Jishu Xu;
Philip L. De Jager;
David A. Bennett;
Michael Epstein;
Jingjing Yang
Multiple reference panels of a given tissue or multiple tissues often exist, and multiple regression methods could be used for training gene expression imputation models for TWAS. To leverage expression imputation models (i.e., base models) trained with multiple reference panels, regression methods, and tissues, we develop a Stacked Regression based TWAS (SR-TWAS) tool which can obtain optimal linear combinations of base models for a given validation transcriptomic dataset. Both simulation and real studies showed that SR-TWAS improved power due to increased effective training sample sizes and borrowed strength across multiple regression methods and tissues. Leveraging base models across multiple reference panels, tissues, and regression methods, our studies of Alzheimer's disease (AD) dementia and Parkinson's disease (PD) identified 11 independent significant risk genes for AD (supplementary motor area tissue) and 12 independent significant risk genes for PD (substantia nigra tissue), including 6 novel genes for AD and 6 novel genes for PD.
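The idea of an optimal linear combination of base models can be sketched as a constrained least-squares problem. The code below is a minimal illustration, not the SR-TWAS tool itself: it assumes the weights are non-negative and sum to one, and it fits them by projected gradient descent on validation data. The function names and the choice of optimizer are hypothetical.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    ranks = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / ranks > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def stack_base_models(base_predictions, observed, n_iter=2000):
    """Fit stacking weights by projected gradient descent (assumed constraint set)."""
    P = np.asarray(base_predictions, dtype=float)  # shape (n_samples, n_base_models)
    y = np.asarray(observed, dtype=float)          # validation expression values
    n, k = P.shape
    w = np.full(k, 1.0 / k)                        # start from equal weights
    step = n / np.linalg.norm(P, 2) ** 2           # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = P.T @ (P @ w - y) / n               # gradient of the mean squared error / 2
        w = project_to_simplex(w - step * grad)
    return w
```

Each column of `base_predictions` holds one base model's predicted expression for the validation samples; the stacked prediction is then `base_predictions @ w`. A base model that adds nothing on the validation data ends up with a weight at or near zero.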
by
Kelsey Robinson;
Trenell J Mosley;
Kenneth S Rivera-Gonzalez;
Christopher R Jabbarpour;
Sarah W Curtis;
Wasiu Lanre Adeyemo;
Terri H Beaty;
Azeez Butali;
Carmen J Buxó;
David Cutler;
Michael Epstein;
Lord JJ Gowans;
Jacqueline T Hecht;
Jeffrey C Murray;
Gary M Shaw;
Lina Moreno Uribe;
Seth M Weinberg;
Harrison Brand;
Mary L Marazita;
Robert J Lipinski;
Elizabeth Leslie
Orofacial clefts (OFCs) are the most common craniofacial birth defects and are often categorized into two etiologically distinct groups: cleft lip with or without cleft palate (CL/P) and isolated cleft palate (CP). CP is highly heritable, but there are still relatively few established genetic risk factors associated with its occurrence compared to CL/P. Historically, CP has been studied as a single phenotype despite manifesting across a spectrum of defects involving the hard and/or soft palate. We performed GWAS with transmission disequilibrium tests in 435 case-parent trios to evaluate broad risks for any cleft palate (ACP, n=435), as well as subtype-specific risks for any cleft soft palate (CSP, n=259) and any cleft hard palate (CHP, n=125). We identified a single genome-wide significant locus at 9q33.3 (lead SNP rs7035976, p=4.24×10⁻⁸) associated with CHP. One gene at this locus, angiopoietin-like 2 (ANGPTL2), plays a role in osteoblast differentiation. It is expressed in craniofacial tissue of human embryos, as well as in the developing mouse palatal shelves. We found 19 additional loci reaching suggestive significance (p<5×10⁻⁶), of which only one overlapped between groups (chromosome 17q24.2, ACP and CSP). Odds ratios (ORs) for each of the 20 loci were most similar across all three groups for SNPs associated with the ACP group, but more distinct when comparing SNPs associated with either the CSP or CHP groups. We also found nominal evidence of replication (p<0.05) for 22 SNPs previously associated with cleft palate (including CL/P). Interestingly, most of the CL/P-associated SNPs that replicated in our CP-only dataset showed the opposite direction of effect. Ours is the first study to evaluate CP risks in the context of its subtypes, and we provide newly reported associations affecting the broad risk for CP as well as evidence of subtype-specific risks.
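For readers unfamiliar with the transmission disequilibrium test used on case-parent trios, the classic single-SNP statistic can be sketched as follows. This is the textbook form of the test, not the study's analysis pipeline; the function name and the example counts are hypothetical.

```python
import math

def tdt(transmitted, untransmitted):
    """Classic TDT for one biallelic SNP.

    `transmitted` and `untransmitted` count how often heterozygous parents
    did or did not pass the test allele to the affected child.
    """
    b, c = transmitted, untransmitted
    if b + c == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (b - c) ** 2 / (b + c)                  # McNemar-type statistic
    p_value = math.erfc(math.sqrt(chi2 / 2.0))     # upper tail of chi-square with 1 df
    odds_ratio = b / c if c > 0 else math.inf      # transmitted : untransmitted ratio
    return chi2, p_value, odds_ratio

# Hypothetical counts: 60 transmissions versus 40 non-transmissions.
chi2, p_value, odds_ratio = tdt(60, 40)
```

Because only transmissions from heterozygous parents contribute, the test conditions on parental genotypes, which is what protects trio designs from confounding by population stratification.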
Genetic studies of psychiatric disorders often deal with phenotypes that are not directly measurable. Instead, researchers rely on multivariate symptom data from questionnaires and surveys like the PTSD Symptom Scale (PSS) and Beck Depression Inventory (BDI) to indirectly assess a latent phenotype of interest. Researchers subsequently collapse such multivariate questionnaire data into a univariate outcome to represent a surrogate for the latent phenotype. However, when a causal variant is only associated with a subset of collapsed symptoms, the effect will be challenging to detect using the univariate outcome. We describe a more powerful strategy for genetic association testing in this situation that jointly analyzes the original multivariate symptom data using a statistical framework that compares similarity in multivariate symptom-scale data from questionnaires to similarity in common genetic variants across a gene. We use simulated data to demonstrate that this strategy provides substantially increased power over standard approaches that collapse questionnaire data into a single surrogate outcome. We also illustrate our approach using GWAS data from the Grady Trauma Project and identify genes associated with BDI not identified using standard univariate techniques. The approach is computationally efficient, scales to genome-wide studies, and is applicable to correlated symptom data of arbitrary dimension.
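The comparison of symptom similarity with genetic similarity can be sketched with two centered similarity matrices and a cross-product statistic. This is an illustrative sketch only: the linear kernels and the permutation p-value are assumptions chosen for simplicity and stand in for whatever similarity measures and reference distribution the published method uses. All function names are hypothetical.

```python
import numpy as np

def centered_linear_kernel(X):
    """Pairwise similarity between subjects from column-centered data."""
    Xc = X - X.mean(axis=0)
    return Xc @ Xc.T

def similarity_test(symptoms, genotypes, n_perm=499, seed=0):
    """Test whether subjects with similar symptoms also have similar genotypes."""
    rng = np.random.default_rng(seed)
    K_pheno = centered_linear_kernel(np.asarray(symptoms, dtype=float))
    K_geno = centered_linear_kernel(np.asarray(genotypes, dtype=float))
    n = K_pheno.shape[0]
    observed = np.sum(K_pheno * K_geno) / n        # equals trace(K_pheno @ K_geno) / n
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(n)                   # shuffle subjects' symptom profiles
        permuted = np.sum(K_pheno[np.ix_(idx, idx)] * K_geno) / n
        exceed += int(permuted >= observed)
    return observed, (exceed + 1) / (n_perm + 1)
```

Here `symptoms` is a subjects-by-items matrix of questionnaire responses and `genotypes` is a subjects-by-variants matrix for one gene. Because the statistic uses every symptom item rather than a collapsed score, a variant that affects only a subset of items still contributes to the signal.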
Aims: The study of rare variants, which can potentially explain a great proportion of heritability, has emerged as an important topic in human gene mapping of complex diseases. Although several statistical methods have been developed to increase the power to detect disease-related rare variants, none of these methods address an important issue that often arises in genetic studies: false positives due to population stratification. Using simulations, we investigated the impact of population stratification on false-positive rates of rare-variant association tests.
Methods: We simulated a series of case-control studies assuming various sample sizes and levels of population structure. Using such data, we examined the impact of population stratification on rare-variant collapsing and burden tests of rare variation. We further evaluated the ability of 2 existing methods (principal component analysis and genomic control) to correct for stratification in such rare-variant studies.
Results: We found that population stratification can have a significant influence on studies of rare variants, especially when the sample size is large and the population is severely stratified. Our results showed that principal component analysis performed quite well in most situations, while genomic control often yielded conservative results.
Conclusions: Our results imply that researchers need to carefully match cases and controls on ancestry in order to avoid false positives caused by population structure in studies of rare variants, particularly if genome-wide data are not available.
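As a reminder of how genomic control works, the standard correction rescales 1-df chi-square statistics by an inflation factor estimated from their median. The sketch below shows that textbook procedure, not the simulation code from the study; the function name is hypothetical.

```python
import statistics

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-square distribution with 1 df

def genomic_control(chi2_stats):
    """Return the inflation factor lambda and the rescaled statistics."""
    lam = statistics.median(chi2_stats) / CHI2_1DF_MEDIAN
    lam = max(1.0, lam)                      # conventionally, never inflate statistics
    adjusted = [x / lam for x in chi2_stats]
    return lam, adjusted
```

Since one factor is applied to every test, the correction can be too strong for tests that are only mildly affected by stratification, which is one way a genomic-control analysis can turn out conservative.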
To help uncover the genetic determinants of complex disease, a scientist often designs an association study using either unrelated subjects or family members within pedigrees. But which of these two subject recruitment paradigms is preferable? This editorial addresses the debate over the relative merits of family- and population-based genetic association studies. We begin by briefly recounting the evolution of genetic epidemiology and the rich crossroads of statistics and genetics. We then detail the arguments for the two aforementioned paradigms in recent and current applications. Finally, we speculate on how the debate may progress with the emergence of next-generation sequencing technologies.
Next-generation sequencing technology has propelled the development of statistical methods to identify rare polygenetic variation associated with complex traits. The majority of these statistical methods are designed for case–control or population-based studies, with few methods that are applicable to family-based studies. Moreover, existing methods for family-based studies mainly focus on trios or nuclear families; there are far fewer existing methods available for analyzing larger pedigrees of arbitrary size and structure. To fill this gap, we propose a method for rare-variant analysis in large pedigree studies that can utilize information from all available relatives. Our approach is based on a kernel machine regression (KMR) framework, which has the advantages of high power, as well as fast and easy calculation of p-values using the asymptotic distribution. Our method is also robust to population stratification due to integration of a QTDT framework (Abecasis et al., Eur J Hum Genet 8(7):545–551, 2000b) with the KMR framework. In our method, we first calculate the expected genotype (between-family component) of a non-founder using all founders’ information and then calculate the deviates (within-family component) of observed genotype from the expectation, where the deviates are robust to population stratification by design. The test statistic, which is constructed using within-family component, is thus robust to population stratification. We illustrate and evaluate our method using simulated data and sequence data from Genetic Analysis Workshop 18.
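The between- and within-family decomposition is easiest to see in the simplest pedigree. The sketch below is restricted to parent-offspring trios, where the expected offspring genotype is the average of the two parental genotypes. It does not handle the arbitrary pedigrees the method is designed for, and the function name is hypothetical.

```python
def between_within_components(child, father, mother):
    """Split offspring genotypes (coded 0/1/2 per variant) into two components.

    between: expected genotype given the parents (founders)
    within:  deviation of the observed genotype from that expectation
    """
    between = [(f + m) / 2.0 for f, m in zip(father, mother)]
    within = [c - b for c, b in zip(child, between)]
    return between, within

# Hypothetical genotypes at two variants.
between, within = between_within_components(child=[2, 1], father=[1, 1], mother=[1, 0])
```

The within-family deviates depend only on which parental alleles happened to be transmitted, so a statistic built from them is not affected by allele-frequency differences between families.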
The 3q29 deletion confers increased risk for neuropsychiatric phenotypes including intellectual disability, autism spectrum disorder, generalized anxiety disorder, and a >40-fold increased risk for schizophrenia. To investigate consequences of the 3q29 deletion in an experimental system, we used CRISPR/Cas9 technology to introduce a heterozygous deletion into the syntenic interval on C57BL/6 mouse chromosome 16. mRNA abundance for 20 of the 21 genes in the interval was reduced by ~50%, while protein levels were reduced for only a subset of these, suggesting a compensatory mechanism. Mice harboring the deletion manifested behavioral impairments in multiple domains including social interaction, cognitive function, acoustic startle, and amphetamine sensitivity, with some sex-dependent manifestations. In addition, 3q29 deletion mice showed reduced body weight throughout development consistent with the phenotype of 3q29 deletion syndrome patients. Of the genes within the interval, DLG1 has been hypothesized as a contributor to the neuropsychiatric phenotypes. However, we show that Dlg1 +/- mice did not exhibit the behavioral deficits seen in mice harboring the full 3q29 deletion. These data demonstrate the following: the 3q29 deletion mice are a valuable experimental system that can be used to interrogate the biology of 3q29 deletion syndrome; behavioral manifestations of the 3q29 deletion may have sex-dependent effects; and mouse-specific behavior phenotypes associated with the 3q29 deletion are not solely due to haploinsufficiency of Dlg1.
Objective
Major depressive disorder (MDD) arises from a combination of genetic and environmental risk factors, and DNA methylation is one of the molecular mechanisms through which these factors can manifest. However, little is known about the epigenetic signature of MDD in brain tissue. This study aimed to investigate associations between brain tissue-based DNA methylation and late-life MDD.
Methods
We performed a brain epigenome-wide association study (EWAS) of late-life MDD in 608 participants from the Religious Orders Study and the Rush Memory and Aging Project (ROS/MAP) using DNA methylation profiles of the dorsolateral prefrontal cortex generated with the Illumina HumanMethylation450 BeadChip array. We also conducted an EWAS of MDD in each sex separately.
Results
We found epigenome-wide significant associations between brain tissue-based DNA methylation and late-life MDD. The most significant and robust association was found with altered methylation levels in the YOD1 locus (cg25594636, p value = 2.55 × 10⁻¹¹; cg03899372, p value = 3.12 × 10⁻⁹; cg12796440, p value = 1.51 × 10⁻⁸; cg23982678, p value = 7.94 × 10⁻⁸). Analysis of differentially methylated regions (p value = 5.06 × 10⁻¹⁰) further confirmed this locus. Other significant loci include UGT8 (cg18921206, p value = 1.75 × 10⁻⁸), FNDC3B (cg20367479, p value = 4.97 × 10⁻⁸) and SLIT2 (cg10946669, p value = 8.01 × 10⁻⁸). Notably, brain tissue-based methylation levels were more strongly associated with late-life MDD in men than in women.
Conclusions
We identified altered methylation in the YOD1, UGT8, FNDC3B, and SLIT2 loci as new epigenetic factors associated with late-life MDD. Furthermore, our study highlights the sex-specific molecular heterogeneity of MDD.