About this item:

Author Notes:

Correspondence: Camille M. Williams, williams.m.camille@gmail.com; Danielle M. Dick, danielle.m.dick@rutgers.edu; Richard Karlsson Linnér, r.karlsson.linner@law.leidenuniv.nl

Acknowledgements: The Externalizing Consortium would like to thank the following groups for making the research possible: 23andMe, Add Health, Vanderbilt University Medical Center’s BioVU, Collaborative Study on the Genetics of Alcoholism (COGA), the Psychiatric Genomics Consortium’s Substance Use Disorders working group, UK10K Consortium, UK Biobank, and Philadelphia Neurodevelopmental Cohort.

Author contributions: CMW: Conceptualization, Data curation, Formal analysis, Investigation, Visualization, Methodology, Writing – original draft, Writing – review and editing. HP: Conceptualization, Data curation, Formal analysis, Methodology, Writing – original draft, Writing – review and editing. PTT: Conceptualization, Writing – original draft, Writing – review and editing, Visualization. HK: Data curation, Software, Writing – original draft. NSC-K: Formal analysis, Writing – original draft, Writing – review and editing. DL-C: Validation, Writing – original draft. TTM: Conceptualization, Data curation, Methodology, Supervision. PB: Formal analysis, Writing – review and editing. PDK: Conceptualization, Writing – review and editing. IDW: Conceptualization, Writing – review and editing. SS-R: Conceptualization, Writing – review and editing. KPH: Conceptualization, Writing – review and editing. AAP: Conceptualization, Writing – review and editing. DMD: Conceptualization, Writing – review and editing. RKL: Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Supervision, Writing – review and editing.

Competing interests: Camille M. Williams, Holly Poore, Peter T. Tanksley, Hyeokmoon Kweon, Natasia S. Courchesne-Krak, Diego Londono-Correa, Travis T. Mallard, Peter Barr, Philipp D. Koellinger, Irwin D. Waldman, Sandra Sanchez-Roige, K. Paige Harden, Abraham A. Palmer, Danielle M. Dick, and Richard Karlsson Linnér declare that they have no conflict of interest.

Research Funding:

This research was conducted by the Externalizing Consortium. The Externalizing Consortium has been supported by the National Institute on Alcohol Abuse and Alcoholism (R01AA015416 – administrative supplement to DMD), and the National Institute on Drug Abuse (R01DA050721 to DMD). Additional funding for investigator effort has been provided by K02AA018755, U10AA008401, P50AA022537 to DMD, R01AA029688, and 28IR-0070 to AAP and T29KT0526 and T32IR5226 to NSC-K and SS-R from the Tobacco-Related Disease Research Program (TRDRP), NIDA DP1DA054394 to SS-R, R25MH081482-16 to NSC-K, R01HD092548 to KPH, as well as a European Research Council Consolidator Grant (647648 EdGe) to PDK.

Tobacco-Related Disease Research Program, T29KT0526, R01AA029688, K02AA018755, National Institute on Drug Abuse, R25MH081482-16, DP1DA054394, R01HD092548, R01DA050721, European Research Council Consolidator Grant, 647648 EdGe, National Institute on Alcohol Abuse and Alcoholism, R01AA015416

Keywords:

  • Genomic SEM
  • Summary statistics
  • Data removal
  • Down-sample
  • Leave-one-out
  • Meta-analysis
  • Genomics
  • Genome-wide association study

Guidelines for Evaluating the Comparability of Down-Sampled GWAS Summary Statistics

Journal Title:

Behavior Genetics

Volume:

Volume 53, Number 5-6, Pages 404-415

Type of Work:

Article | Final Publisher PDF

Abstract:

Proprietary genetic datasets are valuable for boosting the statistical power of genome-wide association studies (GWASs), but their use can restrict investigators from publicly sharing the resulting summary statistics. Although researchers can resort to sharing down-sampled versions that exclude restricted data, down-sampling reduces power and might change the genetic etiology of the phenotype being studied. These problems are further complicated when using multivariate GWAS methods, such as genomic structural equation modeling (Genomic SEM), that model genetic correlations across multiple traits. Here, we propose a systematic approach to assess the comparability of GWAS summary statistics that include versus exclude restricted data. Illustrating this approach with a multivariate GWAS of an externalizing factor, we assessed the impact of down-sampling on (1) the strength of the genetic signal in univariate GWASs, (2) the factor loadings and model fit in multivariate Genomic SEM, (3) the strength of the genetic signal at the factor level, (4) insights from gene-property analyses, (5) the pattern of genetic correlations with other traits, and (6) polygenic score analyses in independent samples. For the externalizing GWAS, although down-sampling resulted in a loss of genetic signal and fewer genome-wide significant loci, the factor loadings and model fit, gene-property analyses, genetic correlations, and polygenic score analyses were found to be robust. Given the importance of data sharing for the advancement of open science, we recommend that investigators who generate and share down-sampled summary statistics report these analyses as accompanying documentation to support other researchers’ use of the summary statistics.

Copyright information:

© The Author(s) 2023

This is an Open Access work distributed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).