Publication

Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets

Persistent URL
Last modified
  • 05/21/2025
Type of Material
Authors
  • Xin Bing, Cornell University
  • Tyler Lovelace, University of Pittsburgh
  • Florentina Bunea, Cornell University
  • Marten Wegkamp, Cornell University
  • Sudhir Kasturi, Emory University
  • Harinder Singh, University of Pittsburgh
  • Panayiotis V Benos, University of Pittsburgh
  • Jishnu Das, University of Pittsburgh
Language
  • English
Date
  • 2022-05-13
Publisher
  • RELX
Publication Version
Copyright Statement
  • © 2022 The Author(s)
License
Final Published Version (URL)
Title of Journal or Parent Work
Volume
  • 3
Issue
  • 5
Start Page
  • 100473
End Page
  • 100473
Grant/Funding Information
  • This study was partially supported by NIH grants DP2AI164325 to J.D., U01HL137159, R01HL140963, R01HL159805, and R01HL157879 to P.V.B., U01AI141990 to H.S., and F31LM013966 to T.L.; NSF grants DMS-1712709 and DMS-2015195 to F.B. and M.W.; and DoD grant W81XWH2110864 to J.D. H.S. also acknowledges support from the UPMC ITTC fund. S.P.K. acknowledges support from the Yerkes Pilot Research Pilot Program (part of the Yerkes NPRC Base Grant, P51-OD011132).
Supplemental Material (URL)
Abstract
  • High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by high dimensionality of the combined datasets, the differences in their data distributions, and their integration to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes/properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data. It outperforms a range of state-of-the-art methods in terms of prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic system immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts including immunosenescence and immune dysregulation.
Author Notes
Keywords
Research Categories
  • Health Sciences, Immunology
  • Biology, Biostatistics
