The bioCADDIE dataset retrieval challenge brought together different approaches to retrieving biomedical datasets relevant to a user’s query, expressed as a text description of a needed dataset. We describe a series of experiments, carried out as part of this challenge, that apply a data-driven, machine learning-based approach to biomedical dataset retrieval and evaluate the performance of both probabilistic and machine learning-driven techniques from information retrieval. Our experiments with probabilistic information retrieval methods, such as query term weight optimization, automatic query expansion and simulated user relevance feedback, demonstrate that automatically boosting the weights of important keywords in a verbose query is more effective than the other methods. We also show that, although there is a rich space of potential representations and features available in this domain, machine learning-based re-ranking models do not improve on probabilistic information retrieval techniques with the currently available training data. The models and algorithms presented in this paper can serve as a viable implementation of a search engine that provides access to biomedical datasets. Retrieval performance is expected to improve further with additional training data, whether created by expert annotation or gathered from usage logs, clicks and other signals during natural operation of the system.
Database URL: https://github.com/emory-irlab/biocaddie.
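As an illustration of the keyword-boosting idea mentioned in the abstract above, the following minimal sketch weights each term of a verbose query by a smoothed inverse document frequency, so that rare, content-bearing terms count more than common ones. This is an assumed stand-in and not the weighting scheme used in the paper: the function names `idf_weights` and `score`, the smoothing formula, and the toy collection are all hypothetical.

```python
import math
from collections import Counter


def idf_weights(query_terms, docs):
    """Assign each distinct query term a smoothed IDF weight (assumed scheme)."""
    n = len(docs)
    weights = {}
    for term in set(query_terms):
        df = sum(1 for doc in docs if term in doc)  # document frequency
        weights[term] = math.log((n + 1) / (df + 1)) + 1.0
    return weights


def score(doc, weights):
    """Weighted term-frequency score of one tokenized document."""
    tf = Counter(doc)
    return sum(w * tf[term] for term, w in weights.items())


# Hypothetical toy collection of tokenized dataset descriptions.
docs = [
    ["gene", "expression", "data", "for", "breast", "cancer"],
    ["data", "for", "protein", "structure"],
    ["search", "for", "data", "on", "mouse", "models"],
]
# A verbose query: only some of its terms carry the information need.
query = ["search", "for", "data", "on", "breast", "cancer", "gene", "expression"]

weights = idf_weights(query, docs)
ranking = sorted(range(len(docs)), key=lambda i: score(docs[i], weights), reverse=True)
```

In this toy setting, terms such as "cancer" receive a higher weight than "data" or "for", which occur in every document, so the first description is ranked ahead of the others.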
The Visual Paired Comparison (VPC) task is a recognition memory test that has shown promise for the detection of memory impairments associated with mild cognitive impairment (MCI). Because patients with MCI often progress to Alzheimer's Disease (AD), the VPC task may be useful in predicting the onset of AD. The VPC task uses noninvasive eye tracking to identify how subjects view novel and repeated visual stimuli. Healthy control subjects demonstrate memory for the repeated stimuli by spending more time looking at the novel images, an effect known as novelty preference. Here, we report an application of machine learning methods from computer science to improve the accuracy of detecting MCI by modeling eye movement characteristics such as fixations, saccades, and re-fixations during the VPC task. These characteristics are represented as features provided to automatic classification algorithms such as Support Vector Machines (SVMs). Using the SVM classification algorithm, in tandem with modeling the patterns of fixations, saccade orientation, and regression patterns, our algorithm was able to automatically distinguish age-matched normal control subjects from MCI subjects with 87% accuracy, 97% sensitivity and 77% specificity, compared to the best available classification performance of 67% accuracy, 60% sensitivity, and 73% specificity when using only the novelty preference information. These results demonstrate the effectiveness of applying machine learning techniques to the detection of MCI, and suggest a promising approach for the detection of cognitive impairments associated with other disorders.
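For readers less familiar with the three measures reported above, the following minimal sketch shows how accuracy, sensitivity and specificity are derived from true and predicted labels in a two-class setting. The labels are hypothetical toy values (1 = MCI, 0 = control), not data from the study, and the helper name `binary_metrics` is an assumption made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, sensitivity and specificity for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": tp / (tp + fn),  # fraction of positive cases detected
        "specificity": tn / (tn + fp),  # fraction of negative cases recognized
    }


# Hypothetical toy labels: 1 = MCI, 0 = control (not study data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]

metrics = binary_metrics(y_true, y_pred)
```

High sensitivity paired with lower specificity, as in the reported results, means that few MCI subjects are missed while some control subjects are flagged as MCI.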