Publication

Phenotype instance verification and evaluation tool (PIVET): A scaled phenotype evidence generation framework using web-based medical literature

Persistent URL
Last modified
  • 05/15/2025
Type of Material
Authors
  • Jette Henderson, University of Texas at Austin
  • Junyuan Ke, Emory University
  • Joyce C. Ho, Emory University
  • Joydeep Ghosh, University of Texas at Austin
  • Byron C Wallace, Northeastern University
Language
  • English
Date
  • 2018-05-01
Publisher
  • JMIR Publications
Publication Version
Copyright Statement
  • ©Jette Henderson, Junyuan Ke, Joyce C Ho, Joydeep Ghosh, Byron C Wallace. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 04.05.2018.
License
Final Published Version (URL)
Title of Journal or Parent Work
  • Journal of Medical Internet Research
ISSN
  • 1438-8871
Volume
  • 20
Issue
  • 5
Start Page
  • e164
End Page
  • e164
Grant/Funding Information
  • JH is supported by NSF grant 1418504.
Abstract
  • Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Being generated by automated or semiautomated, data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making.
  • Objective: The objective of this study was to present Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud that was developed for the phenotype validation task. PIVET completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of the insights PheKnow-Cloud achieved.
  • Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. In PIVET's framework, we also introduce a statistical model trained on domain expert-verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner.
  • Results: PIVET maintains the discriminative power of PheKnow-Cloud in terms of identifying clinically relevant phenotypes for the same corpus with which PheKnow-Cloud was originally developed, but PIVET's analysis is an order of magnitude faster than that of PheKnow-Cloud. Not only is PIVET much faster, it can be scaled to a larger corpus and still retain speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes.
  • Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
Author Notes
  • Jette Henderson, BA, MS, The University of Texas at Austin, 2501 Speedway, C0806 EER, 6.804, Austin, TX 78712, United States. Phone: 1 5125855478. Email: jette [at] ices.utexas.edu
Keywords
Research Categories
  • Engineering, Mining
  • Computer Science
  • Information Science
