Background: Previous epidemiologic studies suggest associations between preterm birth and ambient air pollution. Objective: We investigated associations between 11 ambient air pollutants, estimated by combining Community Multiscale Air Quality model (CMAQ) simulations with measurements from stationary monitors, and risk of preterm birth (< 37 weeks of gestation) in the U.S. state of Georgia. Methods: Birth records for singleton births ≥ 27 weeks of gestation with complete covariate information and estimated dates of conception between 1 January 2002 and 28 February 2006 were obtained from the Office of Health Indicators for Planning, Georgia Department of Public Health (n = 511,658 births). Daily pollutant concentrations at 12-km resolution were estimated for 11 ambient air pollutants. We used logistic regression with county-level fixed effects to estimate associations between preterm birth and average pollutant concentrations during the first and second trimester. Discrete-time survival models were used to estimate third-trimester and total pregnancy associations. Effect modification was investigated by maternal education, race, census tract poverty level, and county-level urbanicity. Results: Trimester-specific and total pregnancy associations (p < 0.05) were observed for several pollutants. All the traffic-related pollutants (carbon monoxide, nitrogen dioxide, PM2.5 elemental carbon) were associated with preterm birth [e.g., odds ratios for interquartile range increases in carbon monoxide during the first, second, and third trimesters and total pregnancy were 1.005 (95% CI: 1.001, 1.009), 1.007 (95% CI: 1.002, 1.011), 1.010 (95% CI: 1.006, 1.014), and 1.011 (95% CI: 1.006, 1.017)]. Associations tended to be higher for mothers with low educational attainment and African American mothers. Conclusion: Several ambient air pollutants were associated with preterm birth; associations were observed in all exposure windows.
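The odds ratios above are reported per interquartile range (IQR) increase in exposure. As a minimal sketch of the arithmetic behind such estimates, the snippet below recovers the log odds ratio and an approximate standard error from a reported point estimate and 95% CI, then rescales the odds ratio to a different increment. It assumes the interval is a Wald interval, symmetric on the log scale, which the abstract does not state; the function names are hypothetical.

```python
import math

def log_or_and_se(or_point, ci_low, ci_high, z=1.96):
    """Recover the log odds ratio and its approximate standard error.

    Assumes a Wald-type interval: exp(log_or +/- z * se).
    """
    log_or = math.log(or_point)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    return log_or, se

def odds_ratio_for_increment(log_or_per_unit, increment):
    """Odds ratio for an exposure change of `increment` units."""
    return math.exp(log_or_per_unit * increment)

# Total-pregnancy carbon monoxide estimate quoted in the abstract,
# expressed per one IQR increase.
log_or, se = log_or_and_se(1.011, 1.006, 1.017)

# Rescale to a half-IQR and a two-IQR increase (illustrative increments).
or_half_iqr = odds_ratio_for_increment(log_or, 0.5)
or_two_iqr = odds_ratio_for_increment(log_or, 2.0)
print(f"log OR = {log_or:.5f}, approx SE = {se:.5f}")
print(f"OR per 0.5 IQR = {or_half_iqr:.4f}, OR per 2 IQR = {or_two_iqr:.4f}")
```

Because the model is linear on the log-odds scale, rescaling the exposure increment multiplies the log odds ratio rather than the odds ratio itself.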
Frameshift (FS) prediction is important for analysis and biological interpretation of metagenomic sequences. Since the genomic context of a short metagenomic sequence is rarely known, there is not enough data available to estimate parameters of species-specific statistical models of protein-coding and non-coding regions. The challenge of ab initio FS detection is, therefore, twofold: (i) to find a way to infer necessary model parameters and (ii) to identify positions of frameshifts (if any). Here we describe a new tool, MetaGeneTack, which uses a heuristic method to estimate parameters of sequence models used in the FS detection algorithm. It is shown on multiple test sets that the FS detection performance of MetaGeneTack is comparable to or better than that of the earlier developed program FragGeneScan.
Mechano-acoustic signals emanating from the heart and lungs contain valuable information about the cardiopulmonary system. Unobtrusive wearable sensors capable of monitoring these signals longitudinally can detect early pathological signatures and titrate care accordingly. Here, we present a wearable, hermetically-sealed high-precision vibration sensor that combines the characteristics of an accelerometer and a contact microphone to acquire wideband mechano-acoustic physiological signals, and enable simultaneous monitoring of multiple health factors associated with the cardiopulmonary system including heart and respiratory rate, heart sounds, lung sounds, and body motion and position of an individual. The encapsulated accelerometer contact microphone (ACM) utilizes nano-gap transducers to achieve extraordinary sensitivity in a wide bandwidth (DC-12 kHz) with high dynamic range. The sensors were used to obtain health factors of six control subjects with varying body mass index, and their feasibility in detection of weak mechano-acoustic signals such as pathological heart sounds and shallow breathing patterns is evaluated on patients with preexisting conditions.
The methylation of mammalian DNA, primarily at CpG dinucleotides, has long been recognized to play a major role in controlling gene expression, among other functions. Given their importance, it is surprising how many basic questions remain to be answered about the proteins responsible for this methylation and for coordination with the parallel chromatin-marking system that operates at the level of histone modification. This article reviews recent studies on, and discusses the resulting biochemical and structural insights into, the DNA nucleotide methyltransferase (Dnmt) proteins 1, 3a, 3a2, 3b, and 3L.
Motivation: The discovery that copy number variants (CNVs) are widespread in the human genome has motivated development of numerous algorithms that attempt to detect CNVs from intensity data. However, all approaches are plagued by high false discovery rates. Further, because CNVs are characterized by two dimensions (length and intensity) it is unclear how to order called CNVs to prioritize experimental validation. Results: We developed a univariate score that correlates with the likelihood that a CNV is true. This score can be used to order CNV calls in such a way that calls having larger scores are more likely to overlap a true CNV. We developed cnv.beast, a computationally efficient algorithm for calling CNVs that uses robust backward elimination regression to keep CNV calls with scores that exceed a user-defined threshold. Using an independent dataset that was measured using a different platform, we validated our score and showed that our approach performed better than six other currently-available methods.
Background
Cloud computing provides an infrastructure that facilitates large-scale computational analysis in a scalable, democratized fashion. However, in this context it is difficult to ensure sharing of an analysis environment and associated data in a scalable and precisely reproducible way.
Results
CloudMan (usecloudman.org) enables individual researchers to easily deploy, customize, and share their entire cloud analysis environment, including data, tools, and configurations.
Conclusions
With the enabled customization and sharing of instances, CloudMan can be used as a platform for collaboration. The presented solution improves accessibility of cloud resources, tools, and data to the level of an individual researcher and contributes toward reproducibility and transparency of research solutions.
Increased reliance on computational approaches in the life sciences has revealed grave concerns about how accessible and reproducible computation-reliant results truly are. Galaxy (http://usegalaxy.org), an open web-based platform for genomic research, addresses these problems. Galaxy automatically tracks and manages data provenance and provides support for capturing the context and intent of computational methods. Galaxy Pages are interactive, web-based documents that provide users with a medium to communicate a complete computational analysis.
Introduction: The United States federally mandated reporting of venous thromboembolism (VTE), defined by Agency for Healthcare Research & Quality Patient Safety Indicator 12 (AHRQ PSI-12), is based on administrative data, the accuracy of which has not been consistently demonstrated. We used IDEAL-X, a novel information extraction software system, to identify VTE from electronic medical records and evaluated its accuracy.
Methods: Medical records for 13,248 patients admitted to an orthopedic specialty hospital from 2009 to 2014 were reviewed. A patient encounter was defined as a hospital admission during which both surgery (of the spine, hip, or knee) and a radiology diagnostic study that could detect VTE were performed. Radiology reports were both manually reviewed by a physician and analyzed by IDEAL-X.
Results: Among 2083 radiology reports, IDEAL-X correctly identified 176/181 VTE events, achieving a sensitivity of 97.2% [95% confidence interval (CI), 93.7%-99.1%] and specificity of 99.3% (95% CI, 98.9%-99.7%) when compared with manual review. Among 422 surgical encounters with diagnostic radiographic studies for VTE, IDEAL-X correctly identified 41 of 42 VTE events, achieving a sensitivity of 97.6% (95% CI, 87.4%-99.6%) and specificity of 99.8% (95% CI, 98.7%-100.0%). The performance surpassed that of AHRQ PSI-12, which had a sensitivity of 92.9% (95% CI, 80.5%-98.4%) and specificity of 92.9% (95% CI, 89.8%-95.3%), though only the difference in specificity was statistically significant (P<0.01).
Conclusion: IDEAL-X, a novel information extraction software system, identified VTE from radiology reports with high accuracy, with specificity surpassing AHRQ PSI-12. IDEAL-X could potentially improve detection and surveillance of many medical conditions from free text of electronic medical records.
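The sensitivity figures reported above follow directly from the stated counts. The sketch below reproduces the point estimates and attaches a Wilson score interval as one common choice of binomial confidence interval. The abstract does not say which interval method was used, so the bounds computed here may differ slightly from the published ones; the function names are hypothetical.

```python
import math

def sensitivity(true_pos, false_neg):
    """Fraction of true events that the system detected."""
    return true_pos / (true_pos + false_neg)

def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half

# Counts taken from the Results: 176 of 181 VTE events in radiology
# reports, and 41 of 42 VTE events among surgical encounters.
sens_reports = sensitivity(176, 181 - 176)
sens_encounters = sensitivity(41, 42 - 41)

for label, hits, total in [("reports", 176, 181), ("encounters", 41, 42)]:
    low, high = wilson_interval(hits, total)
    print(f"{label}: sensitivity {hits / total:.1%}, "
          f"Wilson 95% CI {low:.1%} to {high:.1%}")
```

Specificity is computed the same way from true negatives and false positives; those raw counts are not given in the abstract, so they are not reconstructed here.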
The dynamics of any infectious disease are heavily dependent on the rate of transmission from infectious to susceptible hosts. In many disease models, this rate is captured in a single compound parameter, the probability of transmission β. However, closer examination reveals how β can be further decomposed into a number of biologically relevant variables, including contact rates among individuals and the probability that contact events actually result in disease transmission. We start by introducing some of the basic concepts underlying the different approaches to modeling disease transmission and by laying out why a more detailed understanding of the variables involved is usually desirable. We then describe how parameter estimates of these variables can be derived from empirical data, drawing primarily from the existing literature on human diseases. Finally, we discuss how these concepts and approaches may be applied to the study of pathogen transmission in wildlife diseases. In particular, we highlight recent technical innovations that could help to overcome some of the logistical challenges commonly associated with empirical disease research in wild populations.
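The decomposition of β described above can be illustrated with a minimal sketch. It treats β as the product of a contact rate and a per-contact transmission probability and feeds it into a simple SIR model with frequency-dependent transmission, integrated by forward Euler steps. The model form, the parameter values, and the function names are all assumptions chosen for illustration and are not taken from the article.

```python
def transmission_rate(contact_rate, p_transmit_per_contact):
    """Compound parameter beta = contacts per day x probability per contact."""
    return contact_rate * p_transmit_per_contact

def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Forward-Euler SIR with frequency-dependent transmission (assumed form).

    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r
    history = [(0.0, s, i, r)]
    for step in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((step * dt, s, i, r))
    return history

# Hypothetical values: 10 contacts per day, 5% transmission per contact,
# and a mean infectious period of 5 days (gamma = 1/5 per day).
beta = transmission_rate(10, 0.05)
history = simulate_sir(beta, gamma=0.2, s0=990, i0=10, r0=0, days=60)
peak_time, _, peak_infected, _ = max(history, key=lambda row: row[2])
print(f"beta = {beta:.2f} per day, peak of {peak_infected:.0f} "
      f"infectious hosts at day {peak_time:.1f}")
```

Separating the two factors makes it possible to ask distinct questions, for example how halving the contact rate compares with halving the per-contact probability, even though both enter the model only through their product.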
The problem of determining cut-points of a continuous scale according to an established categorical scale is often encountered in practice for purposes such as making a diagnosis or treatment recommendation, determining study eligibility, or facilitating interpretations. A general analytic framework was recently proposed for assessing optimal cut-points defined based on some pre-specified criteria. However, the implementation of the existing nonparametric estimators under this framework and the associated inferences can be computationally intensive when more than a few cut-points need to be determined. To address this important issue, a smoothing-based modification of the current method is proposed and is found to substantially improve the computational speed as well as the asymptotic convergence rate. Moreover, a plug-in type variance estimation procedure is developed to further facilitate the computation. Extensive simulation studies confirm the theoretical results and demonstrate the computational benefits of the proposed method. The practical utility of the new approach is illustrated by an application to a mental health study.