Publication

Overview of the 8th Social Media Mining for Health Applications (#SMM4H) Shared Tasks at the AMIA 2023 Annual Symposium

Last modified
  • 06/25/2025
Authors
  • Ari Z. Klein, University of Pennsylvania
  • Juan M. Banda, Georgia State University
  • Yuting Guo, Emory University
  • Ana Lucia Schmidt, Roche Innovation Center
  • Dongfang Xu, Cedars-Sinai Medical Center
  • Jesus Ivan Flores Amaro, Cedars-Sinai Medical Center
  • Raul Rodriguez-Esteban, Roche Innovation Center
  • Abeed Sarker, Emory University
  • Graciela Gonzalez-Hernandez, Cedars-Sinai Medical Center
Language
  • English
Date
  • 2023-11-08
Publisher
  • NIH
Copyright Statement
  • The copyright holder for this preprint is the author/funder, who has granted medRxiv a license to display the preprint in perpetuity.
Volume
  • 2023
Grant/Funding Information
  • AZK, JIFA, DX, and GGH were supported in part by the National Library of Medicine (R01LM011176). YG and AS were supported in part by the National Institute on Drug Abuse (R01DA057599). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. JMB was supported in part by a Google Award for Inclusion Research (AIR).
Abstract
  • The aim of the Social Media Mining for Health Applications (#SMM4H) shared tasks is to take a community-driven approach to address the natural language processing and machine learning challenges inherent to utilizing social media data for health informatics. The eighth iteration of the #SMM4H shared tasks was hosted at the AMIA 2023 Annual Symposium and consisted of five tasks that represented various social media platforms (Twitter and Reddit), languages (English and Spanish), methods (binary classification, multi-class classification, extraction, and normalization), and topics (COVID-19, therapies, social anxiety disorder, and adverse drug events). In total, 29 teams registered, representing 18 countries. In this paper, we present the annotated corpora, a technical summary of the systems, and the performance results. In general, the top-performing systems used deep neural network architectures based on pre-trained transformer models. In particular, the top-performing systems for the classification tasks were based on single models that were pre-trained on social media corpora. To facilitate future work, the datasets—a total of 61,353 posts—will remain available by request, and the CodaLab sites will remain active for a post-evaluation phase.
Author Notes
  • Ari Z. Klein, PhD, Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Blockley Hall, 4th Fl., 423 Guardian Dr., Philadelphia, PA 19104, USA; ariklein@pennmedicine.upenn.edu
  • Graciela Gonzalez-Hernandez, PhD, Department of Computational Biomedicine, Cedars-Sinai Medical Center, Pacific Design Center, Ste. G549F, 700 N. San Vicente Blvd., West Hollywood, CA 90069, USA; graciela.gonzalezhernandez@cshs.org
Research Categories
  • Health Sciences, Health Care Management
  • Health Sciences, Public Health
