Publication

MMiDaS-AE

Persistent URL
Last modified
  • 05/22/2025
Type of Material
Authors
  • Eric W Lee, Emory University
  • Byron C Wallace, Northeastern University
  • Karla Galaviz Arredondo, Emory University
  • Karla I Galaviz, Emory University
  • Joyce Ho, Emory University
Language
  • English
Date
  • 2020-02-04
Publisher
  • Association for Computing Machinery
Publication Version
Copyright Statement
  • © 2020 Copyright held by the owner/author(s)
Final Published Version (URL)
Title of Journal or Parent Work
Volume
  • 2020
Start Page
  • 139
End Page
  • 150
Grant/Funding Information
  • Byron C. Wallace was supported by the National Library of Medicine of the National Institutes of Health under award number 2R01LM012086. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
  • Eric W. Lee and Joyce C. Ho were supported by the National Science Foundation award IIS-#1838200 and the National Institutes of Health award 1K01LM012924–01. Karla I. Galaviz was supported by the National Institute of Diabetes and Digestive and Kidney Diseases (P30DK111024).
Abstract
  • Systematic review (SR) is an essential process for identifying, evaluating, and summarizing the findings of all relevant individual studies concerning health-related questions. However, conducting an SR is labor-intensive, as identifying relevant studies is a daunting process that entails multiple researchers screening thousands of articles for relevance. In this paper, we propose MMiDaS-AE, a Multi-modal Missing Data aware Stacked Autoencoder, for semi-automating screening for SRs. We use a multi-modal view that exploits three representations: 1) documents, 2) topics, and 3) citation networks. Documents that contain similar words will be nearby in the document embedding space. Models can also exploit the relationship between documents and the associated SR MeSH terms to capture article relevancy. Finally, related works will likely share the same citations, and thus closely related articles would, intuitively, be trained to be close to each other in the embedding space. However, using all three learned representations directly as features results in an unwieldy number of parameters. Thus, motivated by recent work on multi-modal autoencoders, we adopt a multi-modal stacked autoencoder that can learn a shared representation encoding all three representations in a compressed space. However, in practice one or more of these modalities may be missing for an article (e.g., if we cannot recover citation information). Therefore, we propose to learn to impute the shared representation even when specific inputs are missing. We find that this new model significantly improves performance on a dataset consisting of 15 SRs compared to existing approaches.
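  • Illustrative sketch (not part of the published abstract): the idea of mapping three modality vectors into one shared representation while tolerating a missing modality can be pictured roughly as below. This is a minimal sketch under stated assumptions, not the authors' MMiDaS-AE architecture. The names `DIMS`, `build_input`, and `encode`, the vector sizes, and the random untrained weights are all hypothetical. Zero-filling with a presence mask is a simpler stand-in used here for illustration; the abstract describes learning to impute the shared representation instead.

```python
import numpy as np

# Hypothetical embedding sizes for the three modalities named in the abstract.
DIMS = {"document": 4, "topic": 3, "citation": 2}


def build_input(modalities):
    """Concatenate modality vectors, zero-filling any that are missing.

    A presence mask (1.0 = present, 0.0 = missing) is appended so that a
    downstream encoder can tell a true zero vector from a missing one.
    """
    parts, mask = [], []
    for name, dim in DIMS.items():
        vec = modalities.get(name)
        if vec is None:
            parts.append(np.zeros(dim))
            mask.append(0.0)
        else:
            vec = np.asarray(vec, dtype=float)
            if vec.shape != (dim,):
                raise ValueError(f"{name} must have shape ({dim},)")
            parts.append(vec)
            mask.append(1.0)
    return np.concatenate(parts + [np.array(mask)])


def encode(x, weights):
    """Single dense layer with tanh, standing in for a stacked encoder."""
    return np.tanh(weights @ x)


# Untrained random weights, for shape illustration only.
rng = np.random.default_rng(0)
input_dim = sum(DIMS.values()) + len(DIMS)
weights = rng.normal(scale=0.1, size=(5, input_dim))

# An article whose citation information could not be recovered.
article = {"document": [0.2, 0.1, 0.0, 0.5], "topic": [1.0, 0.0, 0.0]}
x = build_input(article)
shared = encode(x, weights)
```

    In this sketch `shared` is a 5-dimensional vector regardless of which modalities are present, which is the property a shared compressed representation is meant to provide. A trained model would learn the weights by reconstruction rather than drawing them at random.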
Author Notes
Keywords
Research Categories
  • Health Sciences, Medicine and Surgery
