Copy number variants (CNVs) have a major role in the etiology of autism spectrum disorders (ASD), and several of these have reached statistical significance in case-control analyses. Nevertheless, current ASD cohorts are not large enough to detect very rare CNVs that may be causative or contributory (that is, risk alleles). Here, we use a tiered approach in which clinically significant CNVs are first identified in large clinical cohorts of neurodevelopmental disorders (including but not specific to ASD) and are then systematically identified within well-characterized ASD cohorts. We focused our initial analysis on 48 recurrent CNVs (segmental duplication-mediated 'hotspots') from 24 loci in 31 516 published clinical cases with neurodevelopmental disorders and 13 696 published controls, which yielded a total of 19 deletion CNVs and 11 duplication CNVs that reached statistical significance. We then investigated the overlap of these 30 CNVs in a combined sample of 3955 well-characterized ASD cases from three published studies. We identified 73 deleterious recurrent CNVs, including 36 deletions from 11 loci and 37 duplications from seven loci, for a frequency of 1 in 54; had we considered the ASD cohorts alone, only 58 CNVs from eight loci (24 deletions from three loci and 34 duplications from five loci) would have reached statistical significance. In conclusion, until ASD research cohorts are large enough to detect very rare causative or contributory CNVs, data from larger clinical cohorts can be used to infer the likely clinical significance of CNVs in ASD.
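The "1 in 54" figure is plain arithmetic (3955 / 73 ≈ 54.2). Case-control significance for a CNV is often assessed with Fisher's exact test. The sketch below is a minimal illustration that assumes a one-sided Fisher's exact test. The abstract does not say which test the authors used, the function names are hypothetical, and the counts in the second call are invented for illustration, not study data.

```python
from math import comb


def one_in_n(carriers: int, cohort_size: int) -> float:
    """Return N in the phrase 'a frequency of 1 in N'."""
    return cohort_size / carriers


def fisher_one_sided(case_carriers: int, case_total: int,
                     ctrl_carriers: int, ctrl_total: int) -> float:
    """One-sided Fisher's exact test for an excess of carriers among cases.

    With the table margins fixed, the number of case carriers follows a
    hypergeometric distribution; the p-value sums the probabilities of
    every table with at least as many case carriers as observed.
    """
    carriers = case_carriers + ctrl_carriers
    denom = comb(case_total + ctrl_total, carriers)
    p = 0.0
    for k in range(case_carriers, min(carriers, case_total) + 1):
        # math.comb returns 0 when k exceeds n, so impossible tables add nothing
        p += comb(case_total, k) * comb(ctrl_total, carriers - k) / denom
    return p


# 73 deleterious CNVs among 3955 ASD cases (figures from the abstract)
print(round(one_in_n(73, 3955)))  # 54

# Hypothetical counts for illustration only; not data from the study
print(fisher_one_sided(case_carriers=12, case_total=5000,
                       ctrl_carriers=1, ctrl_total=5000))
```

A one-sided test is used here because the hypothesis is an excess of carriers in cases. A two-sided test, or a correction across the 48 CNVs tested, would change the p-values.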
by
Lavinia Gordon;
Jihoon E. Joo;
Joseph E. Powell;
Miina Ollikainen;
Boris Novakovic;
Xin Li;
Roberta Andronikos;
Mark N. Cruickshank;
Karen Conneely;
Alicia Smith;
Reid S. Alisch;
Ruth Morley;
Peter M. Visscher;
Jeffrey M. Craig;
Richard Saffery
Comparison between groups of monozygotic (MZ) and dizygotic (DZ) twins enables an estimation of the relative contribution of genetic, shared environmental, and nonshared environmental factors to phenotypic variability. Using DNA methylation profiling of ∼20,000 CpG sites as a phenotype, we examined discordance levels in three neonatal tissues from 22 MZ and 12 DZ twin pairs. MZ twins exhibit a wide range of within-pair differences at birth but show discordance levels generally lower than those of DZ pairs. Within-pair methylation discordance was lowest in CpG islands in all twins and increased as a function of distance from islands. Variance component decomposition analysis of DNA methylation in MZ and DZ pairs revealed a low mean heritability across all tissues, although a wide range of heritabilities was detected for specific genomic CpG sites. The largest component of variation was attributed to the combined effects of nonshared intrauterine environment and stochastic factors. Regression analysis of methylation on birth weight revealed a general association between birth weight and the methylation of genes involved in metabolism and biosynthesis. This provides further support for a role of epigenetic change in the previously described link between low birth weight and increased risk for cardiovascular, metabolic, and other complex diseases. Finally, comparison of our data with data from several older twins revealed little evidence for genome-wide epigenetic drift with increasing age. This is the first study to analyze DNA methylation on a genome scale in twins at birth, further highlighting the importance of the intrauterine environment in shaping the neonatal epigenome.
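The abstract does not spell out the variance component model. As a rough illustration of how MZ/DZ contrasts partition variance at a single CpG site, the sketch below applies Falconer's formulas to within-pair correlations. This is a swapped-in textbook approximation, not the authors' decomposition. The function name and the input correlations are hypothetical.

```python
def falconer_ace(r_mz: float, r_dz: float) -> dict[str, float]:
    """Falconer-style ACE decomposition from twin-pair correlations.

    Classical twin assumptions: MZ pairs share all additive genetic effects,
    DZ pairs share half on average, and both share the common environment.
    """
    a2 = 2.0 * (r_mz - r_dz)  # A: additive genetic variance (heritability)
    c2 = 2.0 * r_dz - r_mz    # C: shared environment
    e2 = 1.0 - r_mz           # E: nonshared environment, stochastic effects, noise
    return {"A": a2, "C": c2, "E": e2}


# Hypothetical within-pair correlations for one CpG site
print(falconer_ace(r_mz=0.30, r_dz=0.20))
```

The three components always sum to 1. With noisy correlations, individual estimates can fall outside [0, 1]. Model-based approaches, such as maximum-likelihood ACE fits, constrain them and are the usual choice in practice.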
Chromosome 22q13.3 deletion (Phelan-McDermid) syndrome (PMS) is a rare genetic neurodevelopmental disorder resulting from deletions or other genetic variants on distal 22q. Pathogenic variants of the SHANK3 gene have been identified, but terminal chromosomal deletions that include SHANK3 are the most common cause. Terminal deletions disrupt up to 108 protein-coding genes. The impact of these losses is highly variable and includes significantly impairing neurodevelopmental and somatic manifestations. The current review combines two metrics, the prevalence of gene loss and the predicted pathogenicity of that loss, to identify likely contributors to phenotypic expression. These genes are grouped by function as follows: molecular signaling at glutamate synapses, phenotypes involving neuropsychiatric disorders, involvement in multicellular organization, cerebellar development and functioning, and mitochondrial function. The genes likely to have the greatest impact are reviewed to provide information for future clinical and translational investigations.
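The abstract does not say how the two metrics are combined. The minimal sketch below assumes a simple product of the two scores. It estimates loss prevalence from the breakpoints of a set of terminal deletions, each of which removes everything distal to its breakpoint. Gene names, positions, scores, breakpoints, and the product rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str             # hypothetical placeholder label
    start: int            # hypothetical position on distal 22q (bp)
    pathogenicity: float  # hypothetical predicted loss-pathogenicity score, 0..1


def loss_prevalence(gene: Gene, breakpoints: list[int]) -> float:
    """Fraction of terminal deletions that remove the gene.

    A terminal deletion with proximal breakpoint b removes everything
    distal to b. For simplicity, a gene counts as lost only when it starts
    at or beyond b; a gene straddling the breakpoint is treated as retained.
    """
    lost = sum(1 for b in breakpoints if gene.start >= b)
    return lost / len(breakpoints)


def rank_genes(genes: list[Gene], breakpoints: list[int]) -> list[tuple[str, float]]:
    """Rank genes by prevalence x pathogenicity (an assumed combination rule)."""
    scored = [(g.name, loss_prevalence(g, breakpoints) * g.pathogenicity)
              for g in genes]
    return sorted(scored, key=lambda item: item[1], reverse=True)


genes = [Gene("GENE_A", 100, 0.90), Gene("GENE_B", 50, 0.95)]
breakpoints = [40, 60, 80, 90]  # hypothetical cohort of four terminal deletions
print(rank_genes(genes, breakpoints))
```

In this toy example, the more distal gene outranks a slightly more pathogenic proximal gene because it is lost in every deletion. This reflects the idea that distal 22q genes are deleted most often.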
An emerging paradigm shift in disease diagnosis is to rely on molecular characterization beyond traditional clinical and symptom-based examinations. Although genetic alterations and transcriptional signatures were first introduced as potential biomarkers, their clinical implementation has been limited by low reproducibility and accuracy. Epigenetic changes have therefore been considered as an alternative approach to disease diagnosis. Complex epigenetic regulation is required for normal biological function, and it has been shown that distinctive epigenetic disruptions can contribute to disease pathogenesis. Disease-specific epigenetic changes, especially in DNA methylation, have been observed, suggesting their potential as diagnostic biomarkers. Beyond this specificity, disease-associated methylation marks can be detected in noninvasively collected specimens, such as blood samples. This feasibility has driven clinical studies to validate disease-specific DNA methylation changes as diagnostic biomarkers. Here, we highlight the advantages of DNA methylation signatures for diagnosis in different diseases and discuss the statistical and technical challenges that must be overcome before clinical implementation.
Kernel machine learning methods, such as the SNP-set kernel association test (SKAT), have been widely used to test associations between traits and genetic polymorphisms. In contrast to traditional single-SNP analysis methods, these methods are designed to examine the joint effect of a set of related SNPs (such as the SNPs within a gene or a pathway) and can identify SNP sets that are associated with the trait of interest. However, as with many multi-SNP testing approaches, kernel machine testing can draw conclusions only at the SNP-set level and does not directly indicate which SNP(s) within the identified set actually drive the association. A recently proposed procedure, KerNel Iterative Feature Extraction (KNIFE), provides a general framework for incorporating variable selection into kernel machine methods. In this article, we focus on quantitative traits and relatively common SNPs. We adapt the KNIFE procedure to genetic association studies and propose an approach to identify driver SNPs after applying SKAT in gene-set analysis. Our approach accommodates several kernels that are widely used in SNP analysis, such as the linear kernel and the Identity by State (IBS) kernel. The proposed approach provides practical tools for prioritizing SNPs and fills the gap between SNP-set analysis and biological functional studies. Both simulation studies and a real data application are used to demonstrate the proposed approach.
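The two kernels named above, and the SKAT-style score statistic they feed into, can be sketched in a few lines. This is a simplified illustration, not the KNIFE procedure itself. It assumes genotypes coded 0/1/2 and, for brevity, an intercept-only null model with no covariates. The function names and data are hypothetical.

```python
def linear_kernel(G: list[list[int]]) -> list[list[float]]:
    """Linear kernel K[i][j] = sum_k G[i][k] * G[j][k]; genotypes coded 0/1/2."""
    n = len(G)
    return [[float(sum(a * b for a, b in zip(G[i], G[j]))) for j in range(n)]
            for i in range(n)]


def ibs_kernel(G: list[list[int]]) -> list[list[float]]:
    """IBS kernel: alleles shared identical by state, averaged over SNPs.

    Two individuals with genotypes a and b at a SNP share 2 - |a - b| alleles,
    so each entry lies in [0, 1] and the diagonal is 1.
    """
    n, p = len(G), len(G[0])
    return [[sum(2 - abs(a - b) for a, b in zip(G[i], G[j])) / (2 * p)
             for j in range(n)] for i in range(n)]


def skat_q(y: list[float], K: list[list[float]]) -> float:
    """SKAT-style score statistic Q = r' K r.

    Simplification: the null model is intercept-only, so r = y - mean(y);
    SKAT proper fits covariates under the null before forming residuals.
    """
    mean_y = sum(y) / len(y)
    r = [v - mean_y for v in y]
    n = len(y)
    return sum(r[i] * K[i][j] * r[j] for i in range(n) for j in range(n))


G = [[0, 1], [2, 1], [1, 0]]  # 3 individuals x 2 SNPs (hypothetical)
y = [1.2, 2.9, 2.0]           # hypothetical quantitative trait
print(skat_q(y, linear_kernel(G)), skat_q(y, ibs_kernel(G)))
```

Q grows when individuals with similar genotypes (large K[i][j]) also have residuals of the same sign. In SKAT, its p-value comes from a mixture of chi-square distributions, which is not shown here.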
Tuberous sclerosis complex (TSC) is an autosomal dominant tumor predisposition disorder characterized by significant neurodevelopmental brain lesions, such as tubers and subependymal nodules. The neuropathology of TSC is often associated with seizures and intellectual disability. To learn about the developmental perturbations that lead to these brain lesions, we created a mouse model that selectively deletes the Tsc2 gene from radial glial progenitor cells in the developing cerebral cortex and hippocampus. These Tsc2 mutant mice were severely runted, developed postnatal megalencephaly and died between 3 and 4 weeks of age. Analysis of brain pathology demonstrated cortical and hippocampal lamination defects, hippocampal heterotopias, enlarged dysplastic neurons and glia, abnormal myelination and astrocytosis. These histologic abnormalities were accompanied by activation of the mTORC1 pathway, as assessed by increased phosphorylated S6 in brain lysates and tissue sections. Developmental analysis demonstrated that loss of Tsc2 increased the subventricular Tbr2-positive basal cell progenitor pool at the expense of early-born Tbr1-positive post-mitotic neurons. These results establish the novel concept that loss of function of Tsc2 in radial glial progenitors is one initiating event in the development of TSC brain lesions, and they underscore the importance of Tsc2 in the regulation of neural progenitor pools. Given the similarities between the mouse and human TSC lesions, this model will be useful for further understanding TSC brain pathophysiology, testing potential therapies and identifying other genetic pathways that are altered in TSC.
by
Prataydipta Rudra;
K. Alaine Broadaway;
Erin B. Ware;
Min A. Jhun;
Lawrence F. Bielak;
Wei Zhao;
Jennifer A. Smith;
Patricia A. Peyser;
Sharon L. R. Kardia;
Michael Epstein;
Debashis Ghosh
Many gene mapping studies of complex traits have identified genes or variants that influence multiple phenotypes. With the advent of next-generation sequencing technology, there has been substantial interest in identifying rare variants in genes that have cross-phenotype effects. In the presence of such effects, modeling the phenotypes and rare variants jointly with multivariate models can achieve higher statistical power than univariate methods that either model each phenotype separately or perform separate tests for each variant. Many studies collect phenotypic data over time, and using such longitudinal data can further increase the power to detect genetic associations. Although rare-variant approaches exist for testing cross-phenotype effects at a single time point, there is no analogous method for performing such analyses with longitudinal outcomes. To fill this important gap, we propose an extension of the Gene Association with Multiple Traits (GAMuT) test, a method for cross-phenotype analysis of rare variants using a framework based on the distance covariance. The approach allows for both binary and continuous phenotypes and can also adjust for covariates. Our simple adjustment to the GAMuT test allows it to handle longitudinal data and to gain power by exploiting temporal correlation. The approach is computationally efficient and applicable at a genome-wide scale because it uses a closed-form test whose significance can be evaluated analytically. We use simulated data to demonstrate that our method has favorable power compared with competing approaches, and we also apply our approach to exome chip data from the Genetic Epidemiology Network of Arteriopathy.
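The core of a distance-covariance test like GAMuT is a trace statistic that compares a centred phenotype-similarity matrix with a centred genotype-similarity matrix. The sketch below computes T = (1/n) trace(Pc Gc) with linear similarities. It does not implement the authors' longitudinal adjustment or the analytic p-value. Stacking each subject's repeated measures into one vector is a naive assumption made here for illustration, and all names and data are hypothetical.

```python
def center(K: list[list[float]]) -> list[list[float]]:
    """Double-centre a similarity matrix: H K H with H = I - (1/n) 1 1'."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand for j in range(n)]
            for i in range(n)]


def gamut_statistic(P: list[list[float]], G: list[list[float]]) -> float:
    """T = (1/n) trace(Pc Gc) for centred phenotype (P) and genotype (G) similarities."""
    n = len(P)
    Pc, Gc = center(P), center(G)
    # trace(A B) = sum over i, j of A[i][j] * B[j][i]
    return sum(Pc[i][j] * Gc[j][i] for i in range(n) for j in range(n)) / n


def linear_similarity(rows: list[list[float]]) -> list[list[float]]:
    """Dot-product similarity between subjects' feature vectors."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


# Hypothetical data: each phenotype row stacks one subject's measures of two
# traits at two visits (an assumed, naive encoding of longitudinal data).
pheno = [[0.1, 0.3, 1.0, 1.2], [0.9, 1.1, -0.2, 0.0], [-1.0, -1.4, -0.8, -1.2]]
geno = [[0, 1, 0], [1, 0, 0], [0, 0, 1]]  # hypothetical rare-variant counts
print(gamut_statistic(linear_similarity(pheno), linear_similarity(geno)))
```

Centring removes the overall mean similarity, so T reflects only whether subjects who are genetically alike are also phenotypically alike. In GAMuT, the closed-form null distribution is derived from the eigenvalues of the two centred matrices.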
There has been increasing interest in identifying genes within the human genome that influence multiple diverse phenotypes. In the presence of pleiotropy, joint testing of these phenotypes is not only biologically meaningful but also statistically more powerful than univariate analysis of each separate phenotype accounting for multiple testing. Although many cross-phenotype association tests exist, most assume samples composed of unrelated subjects and are therefore not applicable to family-based designs, including the valuable case-parent trio design. In this paper, we describe a robust gene-based association test of multiple phenotypes collected in a case-parent trio study. Our method is based on the kernel distance covariance (KDC) method: we first construct a similarity matrix for multiple phenotypes and a similarity matrix for the genetic variants in a gene, and we then test the dependency between the two similarity matrices. The method is applicable to either common or rare variants in a gene, and the resulting tests are, by design, robust to confounding due to population stratification. We evaluated our method through simulation studies and observed that it is substantially more powerful than standard univariate testing of each separate phenotype. We also applied our method to phenotypic and genotypic data collected in case-parent trios as part of the Genetics of Kidneys in Diabetes (GoKinD) study and identified a genome-wide significant gene demonstrating cross-phenotype effects that was not identified using standard univariate approaches.
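The two steps described above are building similarity matrices and then testing their dependency. The sketch below illustrates them with a generic permutation test, assuming independent subjects. It does not reproduce the trio-based construction, and that construction is what gives the authors' test its robustness to population stratification. Standardized linear phenotype similarity, permutation of subject labels, and all names and data are assumptions made for illustration.

```python
import random


def standardize(values: list[float]) -> list[float]:
    """Z-score one phenotype across subjects (assumes it is not constant)."""
    m = sum(values) / len(values)
    sd = (sum((v - m) ** 2 for v in values) / (len(values) - 1)) ** 0.5
    return [(v - m) / sd for v in values]


def phenotype_similarity(pheno: list[list[float]]) -> list[list[float]]:
    """Linear similarity of subjects after standardising each phenotype column."""
    cols = [standardize(list(c)) for c in zip(*pheno)]
    rows = list(zip(*cols))
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def dependence(P: list[list[float]], G: list[list[float]]) -> float:
    """trace(HPH HGH), computed as sum_ij (HPH)[i][j] * G[j][i] since H is idempotent."""
    n = len(P)
    r = [sum(row) / n for row in P]
    c = [sum(P[i][j] for i in range(n)) / n for j in range(n)]
    g = sum(r) / n
    return sum((P[i][j] - r[i] - c[j] + g) * G[j][i]
               for i in range(n) for j in range(n))


def permutation_pvalue(P, G, n_perm: int = 999, seed: int = 0) -> float:
    """Permute subject labels of G (rows and columns together) and compare."""
    rng = random.Random(seed)
    n = len(P)
    observed = dependence(P, G)
    hits = 0
    for _ in range(n_perm):
        idx = list(range(n))
        rng.shuffle(idx)
        Gp = [[G[idx[i]][idx[j]] for j in range(n)] for i in range(n)]
        if dependence(P, Gp) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical data: 6 subjects x 2 phenotypes, 6 subjects x 3 variants
pheno = [[1.0, 2.1], [0.4, 1.5], [2.2, 3.0], [0.1, 0.9], [1.7, 2.4], [0.6, 1.1]]
geno = [[1, 0, 0], [0, 0, 1], [2, 1, 0], [0, 0, 0], [1, 1, 0], [0, 1, 0]]
P = phenotype_similarity(pheno)
G = [[float(sum(a * b for a, b in zip(u, v))) for v in geno] for u in geno]
print(permutation_pvalue(P, G, n_perm=199))
```

Adding 1 to both the hit count and the number of permutations keeps the p-value strictly positive. This is the usual convention for Monte Carlo permutation p-values.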
Genome-wide association studies (GWAS) are a popular approach for identifying common genetic variants and epistatic effects associated with a disease phenotype. The traditional statistical analysis of such GWAS attempts to assess the association between each individual single-nucleotide polymorphism (SNP) and the observed phenotype. Recently, kernel machine-based tests for association between a SNP set (e.g., SNPs in a gene) and the disease phenotype have been proposed as a useful alternative to the traditional individual-SNP approach, and allow for flexible modeling of the potentially complicated joint SNP effects in a SNP set while adjusting for covariates. We extend the kernel machine framework to accommodate related subjects from multiple independent families, and provide a score-based variance component test for assessing the association of a given SNP set with a continuous phenotype, while adjusting for additional covariates and accounting for within-family correlation. We illustrate the proposed method using simulation studies and an application to genetic data from the Genetic Epidemiology Network of Arteriopathy (GENOA) study.
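Accounting for within-family correlation across multiple independent families usually means building a block-diagonal relatedness matrix, with one block per family and zeros between families. The sketch below builds expected relatedness for nuclear families and forms a covariance of the form V = σ²_g R + σ²_e I. That covariance model, the nuclear-family layout, the function names, and the variance values are assumptions for illustration. The abstract does not specify the authors' exact model.

```python
def nuclear_family_relatedness(n_children: int) -> list[list[float]]:
    """Expected relatedness (twice the kinship coefficient) within one family.

    Order: parent 1, parent 2, then full-sibling children. Parents are assumed
    unrelated; parent-offspring and full-sibling pairs have expected relatedness 0.5.
    """
    size = 2 + n_children
    R = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i == j:
                R[i][j] = 1.0
            elif i >= 2 or j >= 2:  # at least one child in the pair
                R[i][j] = 0.5
    return R


def block_diag(blocks: list[list[list[float]]]) -> list[list[float]]:
    """Stack per-family matrices; subjects in different families get 0."""
    size = sum(len(b) for b in blocks)
    M = [[0.0] * size for _ in range(size)]
    offset = 0
    for b in blocks:
        for i, row in enumerate(b):
            for j, v in enumerate(row):
                M[offset + i][offset + j] = v
        offset += len(b)
    return M


def phenotype_covariance(R: list[list[float]], sigma_g2: float,
                         sigma_e2: float) -> list[list[float]]:
    """V = sigma_g2 * R + sigma_e2 * I (an assumed polygenic + residual model)."""
    n = len(R)
    return [[sigma_g2 * R[i][j] + (sigma_e2 if i == j else 0.0) for j in range(n)]
            for i in range(n)]


R = block_diag([nuclear_family_relatedness(2), nuclear_family_relatedness(1)])
V = phenotype_covariance(R, sigma_g2=0.4, sigma_e2=0.6)  # hypothetical variances
```

Because families are independent, V is block-diagonal. Score tests can therefore be computed family by family, which keeps the method practical for many small families.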