Background: Chest radiographs (CXR) are frequently used as a screening tool for patients with suspected COVID-19 infection pending reverse transcriptase polymerase chain reaction (RT-PCR) results, despite recommendations against this practice. We evaluated radiologist performance for COVID-19 diagnosis on CXR at the time of patient presentation in the Emergency Department (ED). Materials and methods: We extracted RT-PCR results, clinical history, and CXRs of all patients from a single institution between March and June 2020. A total of 984 RT-PCR positive and 1043 RT-PCR negative radiographs were reviewed by 10 emergency radiologists from 4 academic centers; 100 cases were read by all radiologists and 1927 cases by 2 radiologists. Each radiologist chose the single best label per case: Normal, COVID-19, Other – Infectious, Other – Noninfectious, Non-diagnostic, or Endotracheal Tube. Cases labeled Endotracheal Tube (n = 246) or Non-diagnostic (n = 54) were excluded. The remaining cases were analyzed for label distribution, clinical history, and inter-reader agreement. Results: 1727 radiographs (732 RT-PCR positive, 995 RT-PCR negative) from 1594 patients (51.2% male, 48.8% female; age 59 ± 19 years) were included. For the 89 cases read by all readers, agreement was poor for both RT-PCR positive (Fleiss kappa 0.36) and RT-PCR negative (Fleiss kappa 0.46) exams. Agreement between two readers on 1638 cases was 54.2% (373/688) for RT-PCR positive cases and 71.4% (679/950) for RT-PCR negative cases. Agreement was highest for RT-PCR negative cases labeled Normal (50.4%, n = 479). Reader performance did not improve with clinical history or with the time between CXR and RT-PCR result. Conclusion: At the time of presentation to the emergency department, emergency radiologist performance on CXR is non-specific for diagnosing COVID-19.
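The two agreement statistics reported here (Fleiss' kappa for the fully crossed reads and percent agreement for the dual reads) are straightforward to reproduce. A minimal sketch follows, using simulated labels rather than the study data; the label set and counts are taken from the abstract, everything else is illustrative:

```python
# Sketch of the reported agreement statistics: Fleiss' kappa for the cases
# read by all 10 readers, percent agreement for the two-reader cases.
# Ratings are simulated here, not the study data.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

LABELS = ["Normal", "COVID-19", "Other - Infectious",
          "Other - Noninfectious"]  # after excluding ETT / non-diagnostic

rng = np.random.default_rng(0)
ratings = rng.integers(0, len(LABELS), size=(89, 10))  # 89 cases x 10 readers

table, _ = aggregate_raters(ratings)          # cases x categories count table
kappa = fleiss_kappa(table, method="fleiss")  # chance-corrected agreement
print(f"Fleiss' kappa: {kappa:.2f}")

# Two-reader percent agreement, as used for the 1638 dual-read cases.
reader_a = rng.integers(0, len(LABELS), size=1638)
reader_b = rng.integers(0, len(LABELS), size=1638)
print(f"Two-reader agreement: {(reader_a == reader_b).mean():.1%}")
```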
Purpose: The aim of this study was to assess racial/ethnic and socioeconomic disparities in the difference between atherosclerotic vascular disease prevalence measured by a multitask convolutional neural network (CNN) deep learning model using frontal chest radiographs (CXRs) and the prevalence reflected by administrative hierarchical condition category (HCC) codes in two cohorts of patients with coronavirus disease 2019 (COVID-19). Methods: A previously published CNN model was trained to predict atherosclerotic disease from ambulatory frontal CXRs. The model was then validated on two cohorts of patients with COVID-19: 814 ambulatory patients from a suburban location (presenting from March 14, 2020, to October 24, 2020; the internal ambulatory cohort) and 485 hospitalized patients from an inner-city location (hospitalized from March 14, 2020, to August 12, 2020; the external hospitalized cohort). The CNN model predictions were validated against electronic health record administrative codes in both cohorts and assessed using the area under the receiver operating characteristic curve (AUC). The CXRs from the ambulatory cohort were also reviewed by two board-certified radiologists and compared with the CNN-predicted values to produce a receiver operating characteristic curve and the AUC. The atherosclerosis diagnosis discrepancy, Δvasc, defined as the difference between the predicted value and the presence or absence of the vascular disease HCC categorical code, was calculated. Linear regression was performed to determine the association of Δvasc with the covariates of age, sex, race/ethnicity, language preference, and social deprivation index. Logistic regression was used to test for an association between the presence of any HCC codes and Δvasc and the other covariates. Results: The CNN prediction for vascular disease from frontal CXRs had an AUC of 0.85 (95% confidence interval, 0.82-0.89) in the ambulatory cohort and an AUC of 0.69 (95% confidence interval, 0.64-0.75) in the hospitalized cohort against the electronic health record data. In the ambulatory cohort, the consensus radiologists' reading had an AUC of 0.89 (95% confidence interval, 0.86-0.92) relative to the CNN. Multivariate linear regression of Δvasc in the ambulatory cohort demonstrated significant negative associations with non-English-language preference (β = −0.083, P < .05) and Black or Hispanic race/ethnicity (β = −0.048, P < .05) and positive associations with age (β = 0.005, P < .001) and sex (β = 0.044, P < .05). For the hospitalized cohort, age was also significant (β = 0.003, P < .01), as was social deprivation index (β = 0.002, P < .05). The Δvasc variable (odds ratio [OR], 0.34), Black or Hispanic race/ethnicity (OR, 1.58), non-English-language preference (OR, 1.74), and site (OR, 0.22) were independent predictors of having one or more HCC codes (P < .01 for all) in the combined patient cohort. Conclusions: A CNN model was predictive of aortic atherosclerosis in two cohorts (one ambulatory and one hospitalized) of patients with COVID-19. The discrepancy between the CNN model and the administrative code, Δvasc, was associated with language preference in the ambulatory cohort and with social deprivation index in the hospitalized cohort. The absence of administrative codes was associated with Δvasc in the combined cohorts, suggesting that Δvasc is an independent predictor of health disparities.
This may suggest that biomarkers extracted from routine imaging studies and compared with electronic health record data could play a role in enhancing value-based health care for traditionally underserved or disadvantaged patients for whom barriers to care exist.
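As a rough illustration of the Δvasc analysis this abstract describes, the sketch below computes Δvasc as the CNN probability minus the binary HCC code and fits the two regressions with statsmodels. The DataFrame, column names, and simulated values are assumptions for illustration, not the study's variables:

```python
# Hedged sketch: Δvasc = CNN-predicted probability of vascular disease
# minus the binary vascular-disease HCC code (present = 1, absent = 0),
# then linear regression on covariates and logistic regression on the
# presence of any HCC code. All data here are simulated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 814  # size of the ambulatory cohort
df = pd.DataFrame({
    "cnn_pred": rng.uniform(0, 1, n),        # CNN probability from the CXR
    "hcc_vasc": rng.integers(0, 2, n),       # vascular-disease HCC code
    "age": rng.normal(60, 15, n),
    "male": rng.integers(0, 2, n),
    "black_or_hispanic": rng.integers(0, 2, n),
    "non_english": rng.integers(0, 2, n),
    "sdi": rng.uniform(0, 100, n),           # social deprivation index
    "any_hcc": rng.integers(0, 2, n),        # one or more HCC codes
})
df["delta_vasc"] = df["cnn_pred"] - df["hcc_vasc"]

# Multivariate linear regression of Δvasc on the covariates.
ols = smf.ols("delta_vasc ~ age + male + black_or_hispanic + non_english + sdi",
              data=df).fit()
print(ols.summary())

# Logistic regression: does Δvasc predict having any HCC code?
logit = smf.logit("any_hcc ~ delta_vasc + black_or_hispanic + non_english",
                  data=df).fit()
print(np.exp(logit.params))  # coefficients as odds ratios
```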
Objective: As patient complexity increases and patient data are stored in fragmented health information systems, automated and time-efficient ways of gathering important information from the medical history are needed for effective clinical decision making. Using COVID-19 as a case study, we developed a query-bot information retrieval system with user feedback that allows clinicians to ask natural questions to retrieve data from patient notes. Materials and methods: We applied clinicalBERT, a pre-trained contextual language model, to our dataset of patient notes to obtain sentence embeddings, using K-Means clustering to reduce computation time for real-time interaction. The Rocchio algorithm was then employed to incorporate user feedback and improve retrieval performance. Results: In an iterative feedback loop experiment, the mean average precision (MAP) for the final iteration was 0.93/0.94, compared with an initial MAP of 0.66/0.52, for generic queries, and 1.0/1.0, compared with 0.79/0.83, for COVID-19-specific queries, confirming that the contextual model handles ambiguity in natural language queries and that feedback improves retrieval performance. The user-in-the-loop experiment also outperformed the automated pseudo-relevance feedback method. Moreover, the null hypothesis of identical precision between initial retrieval and relevance feedback was rejected with high statistical significance (p ≪ 0.05). Compared with Word2Vec, TF-IDF, and bioBERT models, clinicalBERT offered the best balance between response precision and user feedback. Discussion: Our model works well for generic as well as COVID-19-specific queries. However, some generic queries are not answered as well as others, because clustering reduces query performance and vague relations between queries and sentences are treated as non-relevant. We also tested our model on queries with the same meaning but different phrasings and demonstrated that these query variations yielded similar performance after incorporation of user feedback. Conclusion: We developed an NLP-based query-bot that handles synonyms and natural language ambiguity in order to retrieve relevant information from the patient chart. User feedback is critical to improving model performance.
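A minimal sketch of the pipeline this abstract describes, i.e. clinicalBERT sentence embeddings, K-Means clustering to prune the search space, and a Rocchio update that folds user feedback into the query vector. The model checkpoint, example sentences, and Rocchio weights are illustrative assumptions, not the authors' configuration:

```python
# Sketch: embed note sentences with a clinical BERT model, cluster them so a
# query only scans its nearest cluster, and refine the query with Rocchio.
import numpy as np
import torch
from sklearn.cluster import KMeans
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

def embed(sentences):
    """Mean-pooled contextual embeddings for a list of sentences."""
    batch = tok(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state      # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Index the note sentences once; cluster to reduce per-query computation.
sentences = ["Patient reports fever and dry cough.",
             "No known drug allergies.",
             "RT-PCR for SARS-CoV-2 returned positive."]
emb = embed(sentences)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(emb)

def retrieve(query_vec, top_k=2):
    """Rank sentences in the query's nearest cluster by cosine similarity."""
    cid = km.predict(query_vec[None])[0]
    idx = np.where(km.labels_ == cid)[0]
    sims = emb[idx] @ query_vec / (
        np.linalg.norm(emb[idx], axis=1) * np.linalg.norm(query_vec))
    return idx[np.argsort(-sims)][:top_k]

def rocchio(query_vec, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Shift the query toward sentences the user marked relevant."""
    q = alpha * query_vec
    if len(relevant):
        q += beta * emb[relevant].mean(0)
    if len(nonrelevant):
        q -= gamma * emb[nonrelevant].mean(0)
    return q

q = embed(["does the patient have covid symptoms"])[0]
hits = retrieve(q)
q = rocchio(q, relevant=hits[:1], nonrelevant=hits[1:])  # simulated feedback
```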
Most publicly available human activity recognition datasets contain data captured using either smartphones or smartwatches, typically placed on the waist or the wrist, respectively. These devices obtain one set of acceleration and angular velocity values along the x-, y-, and z-axes from the built-in accelerometer and gyroscope. The PLHI-MC10 dataset contains data obtained using 3 BioStamp nPoint® sensors from 7 physically healthy adult test subjects performing different exercise activities. These are state-of-the-art biomedical sensors manufactured by MC10. The three sensors were attached externally to three muscles on each subject, the extensor digitorum (posterior forearm), gastrocnemius (calf), and pectoralis (chest), giving three sets of 3-axis acceleration, two sets of 3-axis angular velocity, and one set of cardiac voltage values. Using three sensors instead of a single sensor improves precision and helps distinguish between human activities, as movement and contraction of muscles in separate parts of the body are captured simultaneously. Each test subject performed five activities (stairs, jogging, skipping, kettlebell lifts, basketball throws) in a supervised environment. The data are cleaned, filtered, and synchronized.
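Working with multi-sensor recordings like these typically requires resampling each channel onto a common timebase before merging, since the sensors stream at independent rates. A sketch of that synchronization step follows; the file names, column layout, and 50 Hz target rate are hypothetical, not the dataset's actual schema:

```python
# Illustrative sync step for three independently sampled sensor streams.
import pandas as pd

def load_channel(path, prefix):
    """Read one sensor CSV with a 'timestamp' column in milliseconds."""
    df = pd.read_csv(path)
    df["timestamp"] = pd.to_datetime(df["timestamp"], unit="ms")
    df = df.set_index("timestamp").add_prefix(prefix)
    # Resample to a common 20 ms (50 Hz) grid, interpolating small gaps.
    return df.resample("20ms").mean().interpolate(limit=5)

forearm = load_channel("forearm_accel_gyro.csv", "forearm_")
calf = load_channel("calf_accel_gyro.csv", "calf_")
chest = load_channel("chest_accel_ecg.csv", "chest_")

# Inner join keeps only timestamps where all three sensors have samples.
synced = forearm.join([calf, chest], how="inner").dropna()
```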
Radiology reports are a rich resource for advancing deep learning applications for medical images, facilitating the generation of large-scale annotated image databases. However, the ambiguity and subtlety of natural language pose a significant challenge to information extraction from radiology reports. The Thyroid Imaging Reporting and Data System (TI-RADS) has been proposed to standardize ultrasound imaging reports for thyroid cancer screening and diagnosis through structured templates and a standardized thyroid nodule malignancy risk scoring system; however, significant variation remains in radiologist practice for diagnostic thyroid ultrasound interpretation and reporting. In this work, we propose a computerized approach using a contextual embedding and fusion strategy for large-scale inference of TI-RADS final assessment categories from narrative ultrasound (US) reports. The proposed model achieved high accuracy on an internal dataset and high performance on an external validation dataset.
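One plausible shape for such a model is to embed each report sentence with a contextual encoder, fuse the sentence embeddings, and classify into the five TI-RADS categories (TR1-TR5). The sketch below uses mean pooling as the fusion step and a generic BERT checkpoint; both are illustrative assumptions, not the paper's architecture:

```python
# Sketch: contextual sentence embeddings fused by mean pooling, then a
# linear head over the five TI-RADS final assessment categories.
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class TiradsClassifier(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", n_classes=5):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(encoder_name)
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.head = nn.Linear(self.encoder.config.hidden_size, n_classes)

    def forward(self, report_sentences):
        batch = self.tok(report_sentences, padding=True, truncation=True,
                         return_tensors="pt")
        hidden = self.encoder(**batch).last_hidden_state[:, 0]  # [CLS] per sentence
        fused = hidden.mean(dim=0)      # fuse sentence embeddings into one vector
        return self.head(fused)         # logits over TR1..TR5

model = TiradsClassifier()
logits = model(["Solid hypoechoic nodule in the right lobe.",
                "Punctate echogenic foci are present."])
print("Predicted TI-RADS category: TR%d" % (logits.argmax().item() + 1))
```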
BACKGROUND: Despite wide utilisation of severity scoring systems for case-mix determination and benchmarking in the intensive care unit, the possibility of scoring bias across ethnicities has not been examined. Recent guidelines on the use of illness severity scores to inform triage decisions for allocation of scarce resources such as mechanical ventilation during the current COVID-19 pandemic warrant examination for possible bias in these models. We investigated the performance of three severity scoring systems (APACHE IVa, OASIS, SOFA) across ethnic groups in two large ICU databases in order to identify possible ethnicity-based bias. METHOD: Data from the eICU Collaborative Research Database and the Medical Information Mart for Intensive Care were analysed for score performance in Asians, African Americans, Hispanics and Whites after appropriate exclusions. Discrimination and calibration were determined for all three scoring systems in all four groups. FINDINGS: While measurements of discrimination (area under the receiver operating characteristic curve, AUROC) were significantly different among the groups, they did not display any discernible systematic patterns of bias. In contrast, measurements of calibration (standardised mortality ratio, SMR) indicated persistent, and in some cases significant, patterns of difference between Hispanics and African Americans versus Asians and Whites. The differences between African Americans and Whites were consistently statistically significant. While calibrations were imperfect for all groups, the scores consistently demonstrated a pattern of over-predicting mortality for African Americans and Hispanics. INTERPRETATION: The systematic differences in calibration across ethnic groups suggest that illness severity scores reflect bias in their predictions of mortality. FUNDING: LAC is funded by the National Institute of Health through NIBIB R01 EB017205. There was no specific funding for this study.
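The two per-group evaluations used here are easy to state concretely: discrimination as the AUROC of the score's predicted mortality, and calibration as the standardised mortality ratio, SMR = observed deaths / sum of predicted death probabilities (SMR < 1 indicates over-prediction). A sketch with simulated data, not the eICU/MIMIC extracts:

```python
# Per-group AUROC (discrimination) and SMR (calibration) on simulated data.
import numpy as np
import pandas as pd
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "group": rng.choice(["Asian", "Black", "Hispanic", "White"], 5000),
    "pred_mortality": rng.uniform(0, 1, 5000),  # e.g. APACHE IVa probability
    "died": rng.integers(0, 2, 5000),           # observed hospital mortality
})

for group, g in df.groupby("group"):
    auroc = roc_auc_score(g["died"], g["pred_mortality"])
    smr = g["died"].sum() / g["pred_mortality"].sum()  # <1 means over-prediction
    print(f"{group:9s} AUROC={auroc:.3f} SMR={smr:.2f}")
```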
Language modality within the vision-language pretraining framework is innately discretized, endowing each word in the language vocabulary with a semantic meaning. In contrast, visual modality is inherently continuous and high-dimensional, which potentially prohibits the alignment as well as fusion between the vision and language modalities. We therefore propose to "discretize" the visual representation by jointly learning a codebook that imbues each visual token with a semantic meaning. We then utilize these discretized visual semantics as self-supervised ground truths for building our Masked Image Modeling objective, a counterpart of Masked Language Modeling, which has proved successful for language models. To optimize the codebook, we extend the formulation of VQ-VAE, which gives a theoretical guarantee. Experiments validate the effectiveness of our approach across common vision-language benchmarks.
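A conceptual sketch of the two mechanisms this abstract combines: VQ-VAE-style nearest-neighbour quantisation of patch features against a learned codebook (with a straight-through gradient), and a Masked Image Modeling loss that predicts the code index of masked patches, mirroring Masked Language Modeling over a visual vocabulary. Shapes, codebook size, and mask ratio are illustrative, not the paper's settings:

```python
# Sketch: vector quantisation of visual tokens + a masked-prediction loss.
import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, n_codes=8192, dim=768, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(n_codes, dim)
        self.beta = beta

    def forward(self, z):                          # z: (batch, patches, dim)
        flat = z.reshape(-1, z.size(-1))
        d = torch.cdist(flat, self.codebook.weight)  # distance to every code
        idx = d.argmin(-1).view(z.shape[:-1])        # discrete visual token ids
        zq = self.codebook(idx)
        # VQ-VAE losses: pull codes toward encodings and encodings toward codes.
        loss = F.mse_loss(zq, z.detach()) + self.beta * F.mse_loss(z, zq.detach())
        zq = z + (zq - z).detach()                   # straight-through gradient
        return zq, idx, loss

# Masked Image Modeling: predict the code index of masked patches.
vq = VectorQuantizer()
z = torch.randn(2, 196, 768)                 # encoder patch features (14x14 grid)
zq, idx, vq_loss = vq(z)
mask = torch.rand(2, 196) < 0.4              # mask 40% of patches
logits = nn.Linear(768, 8192)(zq)            # MIM prediction head
mim_loss = F.cross_entropy(logits[mask], idx[mask])
```

The straight-through line is what lets gradients flow through the non-differentiable argmin, and the cross-entropy over code indices is what makes the objective a direct visual counterpart of Masked Language Modeling.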