Objectives: With the goal of bringing clinical decision support systems to reality, this article reviews histopathological whole-slide imaging informatics methods, associated challenges, and future research opportunities.
Target audience: This review targets pathologists and informaticians who have a limited understanding of the key aspects of whole-slide image (WSI) analysis and/or limited knowledge of state-of-the-art technologies and analysis methods.
Scope: First, we discuss the importance of imaging informatics in pathology and highlight the challenges posed by histopathological WSI. Next, we provide a thorough review of current methods for: quality control of histopathological images; feature extraction that captures image properties at the pixel, object, and semantic levels; predictive modeling that uses image features for diagnostic or prognostic applications; and data and information visualization that explores WSI for de novo discovery. In addition, we highlight future research directions and discuss the impact of large public repositories of histopathological data, such as The Cancer Genome Atlas, on the field of pathology informatics. Following the review, we present a case study to illustrate a clinical decision support system that begins with quality control and ends with predictive modeling for several cancer endpoints. Currently, state-of-the-art software tools provide only limited image processing capabilities rather than the complete data analysis needed for clinical decision making. We aim to inspire researchers to conduct more research in pathology imaging informatics so that clinical decision support can become a reality.
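As a concrete illustration of the feature-extraction stage mentioned above, the minimal sketch below computes pixel-level (intensity statistics) and object-level (nuclei-proxy counts) features from a single tile. This is our sketch, not the review's method: the function name and the Otsu-threshold nuclei proxy are assumptions, and a real pipeline would first read tiles from a pyramidal slide format (e.g., with OpenSlide) and apply quality control.

```python
# Hedged sketch: pixel- and object-level features from one WSI tile.
# Assumes the tile is already an RGB numpy array; real pipelines read
# tiles from pyramidal slide formats and run quality control first.
import numpy as np
from skimage.color import rgb2gray
from skimage.filters import threshold_otsu
from skimage.measure import label, regionprops

def tile_features(tile_rgb: np.ndarray) -> dict:
    gray = rgb2gray(tile_rgb)              # pixel level: intensity statistics
    feats = {"mean": gray.mean(), "std": gray.std()}
    mask = gray < threshold_otsu(gray)     # dark objects as a rough nuclei proxy
    regions = regionprops(label(mask))
    areas = [r.area for r in regions]
    feats["n_objects"] = len(regions)      # object level: counts and sizes
    feats["mean_area"] = float(np.mean(areas)) if areas else 0.0
    return feats
```

Semantic-level features would then be built on top of such measurements, and the resulting feature vectors fed to the predictive-modeling stage.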
An Open Access Publishing Conference was convened in Atlanta, Georgia, on January 7, 2004, by the libraries of the Centers for Disease Control and Prevention (CDC) and Emory University. Open Access is an emerging publishing model for peer-reviewed scientific research in which authors and their publishers grant free access to their work as long as the authors are acknowledged and the publisher ensures that the work is made freely available in a digital archive (1). The conference brought together key stakeholders, including scientists, researchers, publishers, and librarians, and drew approximately 240 participants, with 80 offsite registrants connecting through a simultaneous webcast.
by
Vince Calhoun;
Yuhui Du;
Zening Fu;
Jing Sui;
Shuang Gao;
Ying Xing;
Dongdong Lin;
Mustafa Salman;
Anees Abrol;
Md Abdur Rahaman;
Jiayu Chen;
Elliot L. Hong;
Peter Kochunov;
Elizabeth A. Osuch
Many mental illnesses share overlapping or similar clinical symptoms, which confounds diagnosis. It is therefore important to characterize systematically the degree to which unique and shared patterns of change reflect different brain disorders. Growing data-sharing initiatives in neuroimaging have provided unprecedented opportunities to study brain disorders, but replicating and translating findings across studies remains an open question, and standardized approaches for capturing reproducible and comparable imaging markers are greatly needed. Here, we propose NeuroMark, a pipeline based on a priori-driven independent component analysis that estimates brain functional network measures from functional magnetic resonance imaging (fMRI) data and can link brain network abnormalities across different datasets, studies, and disorders. NeuroMark automatically estimates features that adapt to each individual subject yet remain comparable across datasets, studies, and disorders by using reliable brain network templates extracted from 1828 healthy controls as guidance. Four studies comprising 2442 subjects and spanning six brain disorders (schizophrenia, autism spectrum disorder, mild cognitive impairment, Alzheimer's disease, bipolar disorder, and major depressive disorder) were conducted to evaluate the validity of the proposed pipeline from different perspectives: replication of brain abnormalities, cross-study comparison, identification of subtle brain changes, and multi-disorder classification using the identified biomarkers. Our results highlight that NeuroMark effectively identified replicable brain network abnormalities of schizophrenia across different datasets; revealed interesting neural clues about the overlap and specificity between autism and schizophrenia; demonstrated brain functional impairments present to varying degrees in mild cognitive impairment and Alzheimer's disease; and captured biomarkers that achieved good performance in classifying bipolar disorder versus major depressive disorder.
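To make the template-guided idea concrete, here is a minimal sketch in the spirit of NeuroMark using a dual-regression-style approximation; the pipeline itself uses spatially constrained ICA, and the function name and array shapes below are our assumptions.

```python
# Hedged sketch of template-guided feature estimation (a dual-regression
# approximation; NeuroMark proper uses spatially constrained ICA).
# X: one subject's fMRI data (time x voxels); T: group network templates
# (components x voxels), e.g., derived from healthy controls.
import numpy as np

def template_guided_features(X: np.ndarray, T: np.ndarray):
    # Stage 1: subject-specific time courses from the spatial templates.
    tcs, *_ = np.linalg.lstsq(T.T, X.T, rcond=None)   # (components x time)
    # Stage 2: subject-specific spatial maps from those time courses.
    maps, *_ = np.linalg.lstsq(tcs.T, X, rcond=None)  # (components x voxels)
    # Functional network connectivity: correlations among time courses.
    fnc = np.corrcoef(tcs)
    return maps, fnc
```

Because every subject's maps and connectivity matrix are estimated against the same templates, the resulting features line up across subjects, datasets, and disorders, which is what enables the cross-study comparisons described above.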
by
Elizabeth Corwin;
Shirley M. Moore;
Andrea Plotsky;
Margaret M. Heitkemper;
Susan G. Dorsey;
Drenna Waldrop-Valverde;
Donald E. Bailey, Jr.;
Sharron L. Docherty;
Joanne D. Whitney;
Carol M. Musil;
Cynthia M. Dougherty;
Donna J. McCloskey;
Joan K. Austin;
Patricia A. Grady
Purpose: The purpose of this article is to describe the outcomes of a collaborative initiative to share data across five schools of nursing in order to evaluate the feasibility of collecting common data elements (CDEs) and developing a common data repository to test hypotheses of interest to nursing scientists. This initiative extended work already completed by the National Institute of Nursing Research CDE Working Group, which successfully identified CDEs related to symptoms and self-management, with the goal of supporting more complex, reproducible, and patient-focused research. Design: Two exemplars describing the group's efforts are presented. The first highlights a pilot study wherein data sets from various studies by the represented schools were collected retrospectively, and merging of the CDEs was attempted. The second exemplar describes the methods and results of an initiative at one school that utilized a prospective design for the collection and merging of CDEs. Methods: Methods for identifying a common symptom to be studied across schools and for collecting the data dictionaries for the related data elements are presented for the first exemplar. The processes for defining and comparing the concepts and acceptable values, and for evaluating the potential to combine and compare the data elements, are also described. Presented next are the steps undertaken in the second exemplar to prospectively identify CDEs and establish the data dictionaries. Methods for common measurement and analysis strategies are included. Findings: Findings from the first exemplar indicated that without plans in place a priori to ensure the ability to combine and compare data from disparate sources, doing so retrospectively may not be possible, and as a result hypothesis testing across studies may be precluded. Findings from the second exemplar, however, indicated that a plan developed prospectively to combine and compare data sets is feasible and conducive to merged hypothesis testing. Conclusions: Although challenges exist in combining CDEs across studies into a common data repository, a prospective, well-designed protocol for identifying, coding, and comparing CDEs is feasible and supports the development of a common data repository and the testing of important hypotheses to advance nursing science. Clinical Relevance: Incorporating CDEs across studies will increase sample size and improve data validity, reliability, transparency, and reproducibility, all of which will increase the scientific rigor of studies and the likelihood of impacting clinical practice and patient care.
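A minimal sketch of the prospective approach the second exemplar argues for: agree on a shared data dictionary up front and map each site's variables onto the common element before pooling. All variable names, codings, and frames below are hypothetical.

```python
# Hedged sketch of CDE harmonization across sites via a shared dictionary.
# Column names and recodings are invented for illustration.
import pandas as pd

# Shared dictionary: CDE name -> (site-specific column, recode map or None)
DICT_A = {"fatigue_0_10": ("fatigue_score", None)}
DICT_B = {"fatigue_0_10": ("tiredness_5pt", {1: 0, 2: 2.5, 3: 5, 4: 7.5, 5: 10})}

def to_common(df: pd.DataFrame, mapping: dict, site: str) -> pd.DataFrame:
    out = pd.DataFrame({"site": site}, index=df.index)
    for cde, (col, recode) in mapping.items():
        out[cde] = df[col].map(recode) if recode else df[col]
    return out

df_a = pd.DataFrame({"fatigue_score": [3, 7]})   # site A, 0-10 scale
df_b = pd.DataFrame({"tiredness_5pt": [2, 5]})   # site B, 1-5 scale
pooled = pd.concat([to_common(df_a, DICT_A, "A"),
                    to_common(df_b, DICT_B, "B")], ignore_index=True)
```

As the first exemplar found, this mapping is only reliable when the dictionary and acceptable values are fixed before collection; retrofitting it onto disparate completed studies may not be possible.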
Background: Researchers are developing methods to automatically extract clinically relevant and useful patient characteristics from raw healthcare datasets. These characteristics, often capturing essential properties of patients with common medical conditions, are called computational phenotypes. Because they are generated by automated or semiautomated data-driven methods, such potential phenotypes need to be validated as clinically meaningful (or not) before they are acceptable for use in decision making. Objective: The objective of this study was to present the Phenotype Instance Verification and Evaluation Tool (PIVET), a framework that uses co-occurrence analysis on an online corpus of publicly available medical journal articles to build clinical relevance evidence sets for user-supplied phenotypes. PIVET adopts a conceptual framework similar to the pioneering prototype tool PheKnow-Cloud, which was developed for the phenotype validation task, but completely refactors each part of the PheKnow-Cloud pipeline to deliver vast improvements in speed without sacrificing the quality of its insights. Methods: PIVET leverages indexing in NoSQL databases to efficiently generate evidence sets. Specifically, PIVET uses a succinct representation of the phenotypes that corresponds to the index on the corpus database and an optimized co-occurrence algorithm inspired by the Aho-Corasick algorithm. We compare PIVET's phenotype representation with PheKnow-Cloud's by using PheKnow-Cloud's experimental setup. In PIVET's framework, we also introduce a statistical model trained on domain-expert-verified phenotypes to automatically classify phenotypes as clinically relevant or not. Additionally, we show how the classification model can be used to examine user-supplied phenotypes in an online, rather than batch, manner. Results: PIVET maintains the discriminative power of PheKnow-Cloud in identifying clinically relevant phenotypes for the corpus with which PheKnow-Cloud was originally developed, but its analysis is an order of magnitude faster, and it scales to a larger corpus while retaining its speed. We evaluated multiple classification models on top of the PIVET framework and found ridge regression to perform best, realizing an average F1 score of 0.91 when predicting clinically relevant phenotypes. Conclusions: Our study shows that PIVET improves on the most notable existing computational tool for phenotype validation in terms of speed and automation and is comparable in terms of accuracy.
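A brute-force sketch of the co-occurrence analysis at PIVET's core: count how often a phenotype's items appear together across articles. PIVET itself gets its speed from NoSQL indexing and an Aho-Corasick-inspired matcher; the corpus snippets and item list below are invented for illustration.

```python
# Hedged sketch of phenotype-item co-occurrence counting over a corpus.
# PIVET uses indexed lookups and an Aho-Corasick-style matcher instead
# of this naive substring scan.
from collections import Counter
from itertools import combinations

def cooccurrence(corpus: list[str], phenotype_items: list[str]) -> Counter:
    pair_counts = Counter()
    for doc in corpus:
        text = doc.lower()
        present = [t for t in phenotype_items if t in text]
        for pair in combinations(sorted(present), 2):
            pair_counts[pair] += 1
    return pair_counts

corpus = ["Metformin is first-line therapy for type 2 diabetes ...",
          "Neuropathy and type 2 diabetes frequently co-occur ..."]
print(cooccurrence(corpus, ["metformin", "type 2 diabetes", "neuropathy"]))
```

Pairs that co-occur far more often than chance form the evidence set suggesting the phenotype's items hang together clinically; those counts become features for the relevance classifier.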
Our objective was to design and implement a clinical history database capable of linking to our database of quantitative results from 99mTc-mercaptoacetyltriglycine (MAG3) renal scans and of exporting a data summary for physicians or our software decision support system.
Methods
For database development, we used a commercial program. Additional software was developed in Interactive Data Language. MAG3 studies were processed using an in-house enhancement of a commercial program. The relational database has 3 parts: a list of all renal scans (the RENAL database), a set of patients with quantitative processing results (the Q2 database), and a subset of patients from Q2 containing clinical data manually transcribed from the hospital information system (the CLINICAL database). To test interobserver variability, a second physician transcriber reviewed 50 randomly selected patients in the hospital information system and tabulated 2 clinical data items: hydronephrosis and presence of a current stent. The CLINICAL database was developed in stages and contains 342 fields comprising demographic information, clinical history, and findings from up to 11 radiologic procedures. A scripted algorithm is used to reliably match records present in both Q2 and CLINICAL. An Interactive Data Language program then combines data from the 2 databases into an XML (extensible markup language) file for use by the decision support system. A text file is constructed and saved for review by physicians.
Results
RENAL contains 2,222 records, Q2 contains 456 records, and CLINICAL contains 152 records. The interobserver variability testing found a 95% match between the 2 observers for presence or absence of ureteral stent (κ = 0.52), a 75% match for hydronephrosis based on narrative summaries of hospitalizations and clinical visits (κ = 0.41), and a 92% match for hydronephrosis based on the imaging report (κ = 0.84).
Conclusion
We have developed a relational database system to integrate the quantitative results of MAG3 image processing with clinical records obtained from the hospital information system. We also have developed a methodology for formatting clinical history for review by physicians and export to a decision support system. We identified several pitfalls, including the fact that important textual information extracted from the hospital information system by knowledgeable transcribers can show substantial interobserver variation, particularly when record retrieval is based on the narrative clinical records.
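A minimal sketch of the export step described above: join quantitative and clinical records on a shared patient key and emit an XML summary for the decision support system. The original system used a commercial database and Interactive Data Language; the SQLite schema and field names below are hypothetical.

```python
# Hedged sketch of joining Q2 and CLINICAL records and exporting XML.
# Assumes renal.db already holds hypothetical q2 and clinical tables.
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect("renal.db")
rows = conn.execute(
    """SELECT q.patient_id, q.relative_uptake_left, c.hydronephrosis
       FROM q2 AS q JOIN clinical AS c ON q.patient_id = c.patient_id"""
)
root = ET.Element("mag3_export")
for pid, uptake, hydro in rows:
    pt = ET.SubElement(root, "patient", id=str(pid))
    ET.SubElement(pt, "relative_uptake_left").text = str(uptake)
    ET.SubElement(pt, "hydronephrosis").text = str(hydro)
ET.ElementTree(root).write("mag3_summary.xml", encoding="utf-8")
```

A parallel pass over the same joined rows could write the plain-text summary kept for physician review.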
Proprietary approaches to representing annotations and image markup are serious barriers to sharing image data and knowledge among researchers. The Annotation and Image Markup (AIM) project is developing a standards-based information model for image annotation and markup in healthcare and clinical trial environments. The complex hierarchical structures of the AIM data model pose new challenges for managing such data in terms of performance and support for complex queries. In this paper, we present our work on managing AIM data through a native XML approach and on supporting complex image and annotation queries through a native extension of the XQuery language. Through integration with xService, AIM databases can now be conveniently shared through caGrid.
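As a rough illustration of the kind of structured query a standards-based XML annotation model enables, the sketch below uses ElementTree's limited XPath support rather than the native XQuery extension the paper describes; the element names are illustrative, not the actual AIM schema.

```python
# Hedged sketch: querying a hierarchical, AIM-like XML annotation.
# Element and attribute names are invented, not the real AIM model.
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<ImageAnnotation>
  <ImagingObservation name="mass"/>
  <GeometricShape type="Circle"><radius>7.5</radius></GeometricShape>
</ImageAnnotation>""")

# Find annotations containing a circular markup and read its radius.
for shape in doc.findall(".//GeometricShape[@type='Circle']"):
    print("radius:", shape.findtext("radius"))
```

A native XML database evaluates such path expressions against indexes over the stored documents, which is what makes complex annotation queries tractable at scale.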
Research data warehouses integrate research and patient data from one or more sources into a single data model designed for research. Typically, institutions update their warehouse by fully reloading it periodically. The alternative is to update the warehouse incrementally with new, changed, and/or deleted data. Full reloads avoid having to correct and add to a live system, but they can render the data outdated for clinical trial accrual; they also place a substantial burden on source systems, involve intermittent work that is challenging to resource, and may require tight coordination across IT and informatics units. We have implemented daily incremental updating for our i2b2 data warehouse. Incremental updating requires substantial up-front development and can expose provisional data to investigators. However, it may support more use cases, may fit academic healthcare IT organizational structures better, and appears to carry similar or lower ongoing support needs.
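A minimal sketch of the incremental pattern: pull only rows changed since the last run and upsert them into the warehouse rather than reloading everything. The table and column names below are hypothetical, not i2b2's actual star schema.

```python
# Hedged sketch of a timestamp-driven incremental warehouse update.
# Requires SQLite >= 3.24 for the ON CONFLICT upsert syntax. Deleted
# source rows would additionally need tombstones or soft deletes.
import sqlite3

def incremental_update(src: sqlite3.Connection, wh: sqlite3.Connection,
                       last_run: str) -> None:
    changed = src.execute(
        "SELECT id, payload, updated_at FROM source_facts WHERE updated_at > ?",
        (last_run,))
    wh.executemany(
        """INSERT INTO warehouse_facts (id, payload, updated_at)
           VALUES (?, ?, ?)
           ON CONFLICT(id) DO UPDATE SET
             payload = excluded.payload, updated_at = excluded.updated_at""",
        changed)
    wh.commit()
```

Run daily, this touches only the changed slice of the source system, which is what keeps accrual queries current without the burden of a full reload.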
Because of the limitations of the Global Positioning System (GPS) in indoor scenarios, various indoor positioning and localization technologies have been proposed and deployed. Wireless radio signals are widely used for both communication and localization because they are readily available in indoor spaces; however, the accuracy of indoor localization based purely on radio signals remains imperfect. Recently, visible light communication (VLC) has made use of electromagnetic radiation from light sources for transmitting data, and its potential for indoor localization has been investigated in recent years. Visible-light-based localization offers low deployment cost, high throughput, and high security. This article reviews the most recent advances in visible-light-based indoor localization systems. We strongly believe that visible-light-based localization will become a low-cost and feasible complementary solution for indoor localization and other smart building applications.
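One common VLC positioning recipe is to estimate ranges to fixed LED beacons (e.g., from received signal strength) and then trilaterate. The sketch below shows the linearized least-squares step with made-up beacon coordinates and ranges.

```python
# Hedged sketch of trilateration from ranges to LED beacons.
# Beacon positions and distance estimates are invented for illustration.
import numpy as np

def trilaterate(beacons: np.ndarray, d: np.ndarray) -> np.ndarray:
    # Subtract the last beacon's equation to linearize ||x - b_i||^2 = d_i^2.
    A = 2 * (beacons[:-1] - beacons[-1])
    b = (d[-1] ** 2 - d[:-1] ** 2
         + np.sum(beacons[:-1] ** 2, axis=1) - np.sum(beacons[-1] ** 2))
    pos, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pos

beacons = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 3.0]])  # LED positions (m)
d = np.array([2.24, 2.24, 2.83])                          # estimated ranges (m)
print(trilaterate(beacons, d))                            # ~ [2.0, 1.0]
```

The hard part in practice is the range estimate itself, which depends on the light-propagation model and receiver; the solver step is the same regardless.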
Background
Finding eligible studies for meta-analysis and systematic reviews relies on keyword-based searching as the gold standard, despite its inefficiency. Searching based on direct citations is not sufficiently comprehensive. We propose a novel strategy that ranks articles on their degree of co-citation with one or more “known” articles before reviewing their eligibility.
Method
In two independent studies, we aimed to reproduce the results of literature searches for sets of published meta-analyses (n = 10 and n = 42). For each meta-analysis, we extracted co-citations for the randomly selected ‘known’ articles from the Web of Science database, counted their frequencies and screened all articles with a score above a selection threshold. In the second study, we extended the method by retrieving direct citations for all selected articles.
Results
In the first study, we retrieved 82% of the studies included in the meta-analyses while screening only 11% as many articles as were screened for the original publications. Articles that we missed were published in non-English languages, published before 1975, published very recently, or available only as conference abstracts. In the second study, we retrieved 79% of included studies while screening half the original number of articles.
Conclusions
Citation searching appears to be an efficient and reasonably accurate method for finding articles similar to one or more articles of interest for meta-analysis and reviews.
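A minimal sketch of the ranking step: score each candidate article by how often it is co-cited with the 'known' articles, then screen those above a threshold. The toy citation data below stand in for records that would come from a database such as Web of Science.

```python
# Hedged sketch of co-citation scoring for study retrieval.
# reference_lists: one set of cited articles per citing paper.
from collections import Counter

def cocitation_scores(reference_lists: list[set[str]],
                      known: set[str]) -> Counter:
    scores = Counter()
    for refs in reference_lists:
        hits = known & refs
        if hits:
            for art in refs - known:   # co-cited with >= 1 known article
                scores[art] += len(hits)
    return scores

reference_lists = [{"A", "B", "X"}, {"A", "Y"}, {"B", "X", "Y"}]
known = {"A", "B"}
print(cocitation_scores(reference_lists, known).most_common())
# [('X', 3), ('Y', 2)] -- screen candidates above the chosen threshold
```

The second study's extension, retrieving direct citations for all selected articles, would simply add each candidate's own citing and cited papers to the screening pool.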