About this item:

Author Notes:

Abeed Sarker, PhD, 101 Woodruff Circle, Suite 4101, Atlanta, GA 30322, United States. Email: abeed@dbmi.emory.edu

All authors made substantial contributions to manuscript revisions and approved the final version. A.S. and A.G. contributed to the design of the study. S.L. conducted data analysis under the mentorship of A.S. A.S. and A.G. supervised the conception, design, and revision of the manuscript. All authors also agree to be accountable for the accuracy and integrity of the work presented here.

Disclosure: None declared.

Research Funding:

A.G.'s efforts were funded by the National Institute of Mental Health through the “My Data Choices, evaluation of effective consent strategies for patients with behavioral health conditions” (R01 MH108992) grant. A.S.'s efforts were funded by the National Institute on Drug Abuse through the “Mining Social Media Big Data for Toxicovigilance: Automating the Monitoring of Prescription Medication Abuse via Natural Language Processing and Machine Learning Methods” (R01 DA046619) grant.

Keywords:

  • Science & Technology
  • Technology
  • Life Sciences & Biomedicine
  • Computer Science, Information Systems
  • Health Care Sciences & Services
  • Medical Informatics
  • Computer Science
  • prescription drugs
  • natural language processing
  • opioids
  • medication value sets

A Data-Driven Iterative Approach for Semi-automatically Assessing the Correctness of Medication Value Sets: A Proof of Concept Based on Opioids

Journal Title:

METHODS OF INFORMATION IN MEDICINE

Volume:

Volume 60, Number S 02, Pages E111-E119

Type of Work:

Article | Final Publisher PDF

Abstract:

Background: Value sets are lists of terms (e.g., opioid medication names) and their corresponding codes from standard clinical vocabularies (e.g., RxNorm), created to support health information exchange and research. Value sets are manually created and often contain errors.

Objectives: The aim of the study is to develop a semi-automatic, data-centric natural language processing (NLP) method to assess the correctness of medication-related value sets, and to evaluate it on a set of opioid medication value sets.

Methods: We developed an NLP algorithm that uses value sets containing mostly true positives and true negatives to learn lexical patterns associated with the true positives, and then employs these patterns to identify potential errors in unseen value sets. We evaluated the algorithm on a set of opioid medication value sets using the recall, precision, and F1-score metrics. We applied the trained model to assess the correctness of unseen opioid value sets based on recall. To replicate the application of the algorithm in real-world settings, a domain expert manually conducted error analysis to identify potential system and value set errors.

Results: Thirty-eight value sets were retrieved from the Value Set Authority Center, and six (two opioid, four non-opioid) were used to develop and evaluate the system. Average precision, recall, and F1-score were 0.932, 0.904, and 0.909, respectively, on uncorrected value sets, and 0.958, 0.953, and 0.953, respectively, after manual correction of the same value sets. On 20 unseen opioid value sets, the algorithm obtained an average recall of 0.89. Error analyses revealed that the main source of system misclassifications was differences in how opioids were coded in the value sets: while the training value sets contained mostly generic names, some of the unseen value sets contained new trade names and ingredients.

Conclusion: The proposed approach is data-centric, reusable, customizable, and not resource intensive. It may help domain experts to easily validate value sets.
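As an illustration of the evaluation described in the abstract, the following minimal Python sketch learns a set of tokens from labeled terms, applies it to unseen terms, and computes precision, recall, and F1-score. It is not the authors' algorithm: the token heuristic is a deliberately simple stand-in, the function names (`tokenize`, `learn_patterns`, `classify`, `precision_recall_f1`) are hypothetical, and the medication terms are made-up toy examples rather than data from the study.

```python
def tokenize(term):
    """Lowercase a term and keep only its alphabetic word tokens."""
    return [tok for tok in term.lower().replace("/", " ").split() if tok.isalpha()]


def learn_patterns(positive_terms, negative_terms):
    """Return tokens seen in positive terms but never in negative terms.

    Assumption: a crude stand-in for 'lexical patterns', for illustration only.
    """
    positive_tokens = {tok for term in positive_terms for tok in tokenize(term)}
    negative_tokens = {tok for term in negative_terms for tok in tokenize(term)}
    return positive_tokens - negative_tokens


def classify(term, patterns):
    """Flag a term as a likely positive if it contains any learned token."""
    return any(tok in patterns for tok in tokenize(term))


def precision_recall_f1(predicted, actual):
    """Compute precision, recall, and F1-score from parallel boolean lists."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1


# Toy training terms (hypothetical, not taken from the study's value sets).
opioid_terms = [
    "morphine sulfate 10 MG oral tablet",
    "oxycodone hydrochloride 5 MG oral capsule",
]
non_opioid_terms = [
    "ibuprofen 200 MG oral tablet",
    "sertraline hydrochloride 50 MG oral tablet",
]
patterns = learn_patterns(opioid_terms, non_opioid_terms)

# Toy unseen terms paired with their true labels (True = opioid).
unseen = [
    ("oxycodone 10 MG oral tablet", True),
    ("Percocet 5 MG oral tablet", True),   # trade name absent from training
    ("acetaminophen 500 MG oral tablet", False),
]
predicted = [classify(term, patterns) for term, _ in unseen]
actual = [label for _, label in unseen]
precision, recall, f1 = precision_recall_f1(predicted, actual)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In this toy run the trade name is missed because only generic names appear in the training terms, which mirrors the kind of misclassification the error analysis reports. The heuristic also learns incidental tokens such as dose forms, which a real system would need to handle.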

Copyright information:

The Author(s).

This is an Open Access work distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/rdf).