False discovery rate (FDR) control is an important tool of statistical inference in feature selection. In mass spectrometry-based metabolomics data, features can be measured at different levels of reliability and false features are often detected in untargeted metabolite profiling as chemical and/or bioinformatics noise. The traditional false discovery rate methods treat all features equally, which can cause substantial loss of statistical power to detect differentially expressed features. We propose a reliability index for mass spectrometry-based metabolomics data with repeated measurements, which is quantified using a composite measure. We then present a new method to estimate the local false discovery rate (lfdr) that incorporates feature reliability. In simulations, our proposed method achieved better balance between sensitivity and controlling false discovery, as compared to traditional lfdr estimation. We applied our method to a real metabolomics dataset and were able to detect more differentially expressed metabolites that were biologically meaningful.
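For reference, the conventional local false discovery rate that the proposed method builds on can be sketched as below. This illustrates only the traditional two-group quantity, lfdr(z) = pi0 * f0(z) / f(z), and not the reliability-based estimator proposed here. The normal null and alternative densities, the known null proportion `pi0`, and all function names are assumptions made for illustration.

```python
import math


def normal_pdf(z, mean=0.0, sd=1.0):
    """Density of a normal distribution evaluated at z."""
    u = (z - mean) / sd
    return math.exp(-0.5 * u * u) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, pi0, alt_mean, alt_sd):
    """Two-group local FDR: pi0 * f0(z) / f(z).

    Assumption: the null density f0 is standard normal and the
    alternative density is normal with known mean and sd. In practice
    these quantities are estimated from the data.
    """
    f0 = normal_pdf(z)
    f1 = normal_pdf(z, alt_mean, alt_sd)
    mixture = pi0 * f0 + (1.0 - pi0) * f1
    return pi0 * f0 / mixture


# Hypothetical test statistics for five features.
z_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
lfdr_values = [local_fdr(z, pi0=0.9, alt_mean=3.0, alt_sd=1.0) for z in z_scores]
```

Features with small lfdr values are the ones declared differentially expressed. Treating all features equally corresponds to using a single `pi0` and a single mixture for every feature.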
Identifying factors associated with increased medical cost is important for many micro- and macro-institutions, including the national economy and public health, insurers and the insured. However, assembling comprehensive national databases that include both the cost and individual-level predictors can prove challenging. Alternatively, one can use data from smaller studies with the understanding that conclusions drawn from such analyses may be limited to the participant population. At the same time, smaller clinical studies have limited follow-up and lifetime medical cost may not be fully observed for all study participants. In this context, we develop new model selection methods and inference procedures for secondary analyses of clinical trial data when lifetime medical cost is subject to induced censoring. Our model selection methods extend the theory of penalized estimating functions to a calibration regression estimator tailored for this data type. Next, we develop a novel inference procedure for the unpenalized regression estimator using perturbation and resampling theory. Then, we extend this resampling plan to accommodate regularized coefficient estimation of censored lifetime medical cost and develop postselection inference procedures for the final model. Our methods are motivated by data from Southwest Oncology Group Protocol 9509, a clinical trial of patients with advanced non-small cell lung cancer, and our models of lifetime medical cost are specific to this population. However, the methods presented in this article are built on rather general techniques and could be applied to larger databases as those data become available.
The case-cohort design facilitates economical investigation of risk factors in a large survival study, with covariate data collected only from the cases and a simple random subset of the full cohort. Methods that accommodate the design have been developed for various semiparametric models, but most inference procedures are based on asymptotic distribution theory. Such inference can be cumbersome to derive and implement, and does not permit confidence band construction. While the bootstrap is an obvious alternative, it is unclear how to resample because of complications from the two-stage sampling design. We establish an equivalent sampling scheme, and propose a novel and versatile nonparametric bootstrap for robust inference with an appealingly simple single-stage resampling. Theoretical justification and numerical assessment are provided for a number of procedures under the proportional hazards model.
Background: Severe secondary hyperparathyroidism, which is associated with life-threatening complications, can develop in dialysis-dependent end-stage renal disease patients. The aim of this study was to compare short- and long-term mortality in dialysis patients who underwent near-total parathyroidectomy (NTPTX) and matched nonoperated controls.
Study Design: We identified 150 dialysis patients who underwent NTPTX (1993-2009) at our institution and compared them with 1,044 nonoperated control patients identified in the US Renal Data System registry, matched for age, sex, race, diabetes as cause of kidney failure, years on dialysis, and dialysis modality. Survival outcomes were estimated using multivariable Cox proportional hazards models with stratification on the matching sets, adjusted for cardiovascular comorbidities, smoking, inability to ambulate/transfer, and payor status.
Results: During a mean follow-up of 3.6 years (range 0.1 month to 16.4 years), NTPTX patients had a significant reduction in the long-term risk of all-cause death (hazard ratio = 0.68; 95% CI, 0.52-0.89; p = 0.006) compared with controls. Thirty-day mortality rates for NTPTX patients and controls were 246 vs 105 per 1,000 person-years (p = 0.21). In adjusted analyses, NTPTX patients had a 37% reduced risk of all-cause death and a 33% reduced risk of cardiovascular death compared with controls. A durable reduction in mean parathyroid hormone was observed after NTPTX, from 1,776 ± 1,416.6 pg/mL to 301 ± 285.7 pg/mL (p < 0.0001).
Conclusions: In our center, NTPTX in dialysis patients was associated with a significant reduction in long-term risk of death compared with matched control patients, without a significantly increased short-term risk.
Importance: National Healthcare Safety Network methods for central line-associated bloodstream infection (CLABSI) surveillance do not account for potential additive risk for CLABSI associated with use of 2 central venous catheters (CVCs) at the same time (concurrent CVCs); facilities that serve patients requiring high acuity care with medically indicated concurrent CVC use likely disproportionally incur Centers for Medicare & Medicaid Services payment penalties for higher CLABSI rates. Objective: To quantify the risk for CLABSI associated with concurrent use of a second CVC. Design, Setting, and Participants: This retrospective cohort study included adult patients with 2 or more days with a CVC at 4 geographically separated general acute care hospitals in the Atlanta, Georgia, area that varied in size from 110 to 580 beds, from January 1, 2012, to December 31, 2017. Variables included clinical conditions, central line-days, and concurrent CVC use. Patients were propensity score-matched for likelihood of concurrence (limited to 2 CVCs), and conditional logistic regression modeling was performed to estimate the risk of CLABSI associated with concurrence. Episodes of CVC were categorized as low or high risk and single vs concurrent use to evaluate time to CLABSI with Cox proportional hazards regression models. Data were analyzed from January to June 2019. Exposures: Two CVCs present at the same time. Main Outcomes and Measures: Hospitalizations in which a patient developed a CLABSI, allowing estimation of patient risk for CLABSI and daily hazard for a CVC episode ending in CLABSI. Results: Among a total of 50 254 patients (median [interquartile range] age, 59 [45-69] years; 26 661 [53.1%] women), 64 575 CVCs were used and 647 CLABSIs were recorded. Concurrent CVC use was recorded in 6877 patients (13.7%); the most frequent indications for concurrent CVC use were nutrition (554 patients [14.1%]) or hemodialysis (1706 patients [43.4%]). 
In the propensity score-matched cohort, 74 of 3932 patients with concurrent CVC use (1.9%) developed CLABSI, compared with 81 of 7864 patients with single CVC use (1.0%). Having 2 CVCs for longer than two-thirds of a patient's CVC use duration was associated with increased likelihood of developing a CLABSI, adjusting for central line-days and comorbidities (adjusted risk ratio, 1.62; 95% CI, 1.10-2.33; P = .001). In survival analysis adjusting for sex, receipt of chemotherapy or total parenteral nutrition, and facility, compared with a single CVC, the daily hazard ratio for 2 low-risk CVCs was 1.78 (95% CI, 1.35-2.34; P < .001), while the daily hazard ratio for 1 low-risk and 1 high-risk CVC was 1.80 (95% CI, 1.42-2.28; P < .001), and the daily hazard ratio for 2 high-risk CVCs was 1.78 (95% CI, 1.14-2.77; P = .01). Conclusions and Relevance: These findings suggest that concurrent CVC use is associated with nearly 2-fold the risk of CLABSI compared with use of a single low-risk CVC. Performance metrics for CLABSI should change to account for variations of this intrinsic patient risk among facilities to reduce biased comparisons and resultant penalties applied to facilities that are caring for more patients with medically indicated concurrent CVC use.
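The crude risks in the matched cohort can be reproduced from the reported counts with the short calculation below. It is a worked arithmetic check only: the resulting crude ratio is unadjusted and is not the adjusted risk ratio or the hazard ratios reported above, which come from regression models. The function and variable names are illustrative.

```python
def risk(events, total):
    """Proportion of patients who developed the outcome."""
    return events / total


# Counts reported for the propensity score-matched cohort.
concurrent_risk = risk(74, 3932)   # patients with concurrent CVC use
single_risk = risk(81, 7864)       # patients with single CVC use

# Crude (unadjusted) risk ratio; this simplifies to 148 / 81.
crude_risk_ratio = concurrent_risk / single_risk

print(f"concurrent: {concurrent_risk:.1%}, single: {single_risk:.1%}")
print(f"crude risk ratio: {crude_risk_ratio:.2f}")
```

The crude ratio of about 1.83 is in line with the "nearly 2-fold" summary in the conclusions.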
Diagnostic tests usually need to operate at a high sensitivity or specificity level in practice. Accordingly, specificity at the controlled sensitivity, or vice versa, is a clinically sensible performance metric for evaluating continuous biomarkers. Meanwhile, the performance of a biomarker may vary across sub-populations as defined by covariates, and covariate-specific evaluation can be informative. In this article, we develop a novel modeling and estimation method for covariate-specific specificity at a controlled sensitivity level. Unlike existing methods which typically adopt elaborate models of covariate effects over the entire biomarker distribution, our approach models covariate effects locally at a specific sensitivity level of interest. We also extend our proposed model to handle the whole continuum of sensitivities via dynamic regression and derive covariate-specific ROC curves. We provide the variance estimation through bootstrapping. The asymptotic properties are established. We conduct extensive simulation studies to evaluate the performance of our proposed methods in comparison with existing methods, and further illustrate the applications in two clinical studies for aggressive prostate cancer.
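To make the performance metric concrete, the following minimal sketch computes an empirical specificity at a controlled sensitivity level without covariates. It is not the covariate-specific model developed in the article. It assumes that larger biomarker values indicate disease, and the function name, the tolerance constant, and the toy data are hypothetical.

```python
import math


def specificity_at_sensitivity(cases, controls, target_sensitivity):
    """Empirical specificity when sensitivity is held at a target level.

    Assumption: a subject tests positive when the biomarker value is at
    or above the threshold. The threshold is the largest case value that
    keeps the empirical sensitivity at or above the target.
    """
    if not 0.0 < target_sensitivity <= 1.0:
        raise ValueError("target_sensitivity must be in (0, 1]")
    ranked = sorted(cases, reverse=True)
    # Number of cases that must test positive; the small tolerance
    # guards against floating-point noise in the product.
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    sensitivity = sum(x >= threshold for x in cases) / len(cases)
    specificity = sum(x < threshold for x in controls) / len(controls)
    return threshold, sensitivity, specificity


# Hypothetical biomarker values for diseased and non-diseased subjects.
cases = [2.1, 3.4, 1.8, 4.0, 2.9, 3.7, 2.5, 3.1, 4.4, 1.2]
controls = [0.5, 1.0, 1.5, 2.0, 1.1, 0.9, 2.6, 1.7, 0.7, 1.3]

threshold, sensitivity, specificity = specificity_at_sensitivity(
    cases, controls, target_sensitivity=0.9
)
```

A covariate-specific version would let the threshold, and hence the specificity, depend on subject characteristics, which is the quantity the article models.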
We aimed to describe the longitudinal risk of advanced heart failure (HF) leading to death, heart transplantation, or ventricular assist device (VAD) placement after congenital heart surgery (CHS) and how it varies across the spectrum of congenital heart disease. We linked the records of patients who underwent first CHS in the Pediatric Cardiac Care Consortium between 1982 and 2003 with the United States National Death Index and Organ Procurement and Transplantation Network databases. Primary outcome was time from CHS discharge to HF-related death, heart transplant, or VAD placement, analyzed with proportional hazards models accounting for competing mortality. In 35,610 patients who survived a first CHS, there were 799 HF deaths, transplants, or VADs over a median of 23 years (interquartile range, 19 to 27). Cumulative incidence at 25 years was 2.3% (95% confidence interval [CI] 2.1% to 2.4%). Compared to mild 2-ventricle defects, the adjusted subhazard ratio for moderate and severe 2-ventricle defects was 3.21 (95% CI 2.28 to 4.52) and 9.46 (95% CI 6.71 to 13.3), respectively, and for single-ventricle defects 31.8 (95% CI 22.2 to 45.6). Systemic right ventricle carried the highest risk 2 years after CHS (subhazard ratio 2.76 [95% CI 2.08 to 3.68]). All groups had higher rates of HF-related death compared with the general population (cause-specific standardized mortality ratio 56.1 [95% CI 51.0 to 61.2]). In conclusion, the risk of advanced HF leading to death, transplantation, or VAD was high across the spectrum of congenital heart disease. While severe defects carry the highest risk, those with mild disease are still at greater risk than the general population.
In a multivariable logistic regression setting where measuring a continuous exposure requires an expensive assay, a design in which the biomarker is measured in pooled samples from multiple subjects can be very cost effective. A logistic regression model for poolwise data is available, but validity requires that the assay yields the precise mean exposure for members of each pool. To account for errors, we assume the assay returns the true mean exposure plus a measurement error (ME) and/or a processing error (PE). We pursue likelihood-based inference for a binary health-related outcome modeled by logistic regression coupled with a normal linear model relating individual-level exposure to covariates and assuming that the ME and PE components are independent and normally distributed regardless of pool size. We compare this approach with a discriminant function-based alternative, and we demonstrate the potential value of incorporating replicates into the study design. Applied to a reproductive health dataset with pools of size 2 along with individual samples and replicates, the model fit with both ME and PE had a lower AIC than a model accounting for ME only. Relative to ignoring errors, this model suggested a somewhat higher (though still nonsignificant) adjusted log-odds ratio associating the cytokine MCP-1 with risk of spontaneous abortion. Simulations modeled after these data confirm validity of the methods, demonstrate how ME and particularly PE can reduce the efficiency advantage of a pooling design, and highlight the value of replicates in improving stability when both errors are present.
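The effect of the two error sources on a pooled assay reading can be illustrated with a small variance calculation. This is a sketch under stated assumptions rather than the likelihood used in the article: individual exposures within a pool are taken as independent given covariates, ME is added to every reading, and PE is assumed here to arise only when samples are physically combined (pool size greater than 1). The function name and the variance values are hypothetical.

```python
def pooled_reading_variance(pool_size, var_exposure, var_me, var_pe):
    """Residual variance of one assay reading on a pool, given covariates.

    Assumptions (illustrative only):
      - individual exposures have residual variance var_exposure and are
        independent within a pool, so the true pool mean has variance
        var_exposure / pool_size;
      - measurement error (ME) affects every reading;
      - processing error (PE) affects only pools of size > 1.
    """
    if pool_size < 1:
        raise ValueError("pool_size must be at least 1")
    var_true_mean = var_exposure / pool_size
    var_processing = var_pe if pool_size > 1 else 0.0
    return var_true_mean + var_processing + var_me


# Hypothetical variance components.
for k in (1, 2, 4):
    total = pooled_reading_variance(k, var_exposure=1.0, var_me=0.1, var_pe=0.2)
    print(f"pool size {k}: reading variance {total:.2f}")
```

Because the error variances do not shrink with pool size, their share of the reading variance grows as pools get larger. This gives one intuition for why the errors can erode the efficiency advantage of pooling.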
Multiple biomarkers are often combined to improve disease diagnosis. The uniformly optimal combination, that is, with respect to all reasonable performance metrics, unfortunately requires excessive distributional modeling, to which the estimation can be sensitive. An alternative strategy is rather to pursue local optimality with respect to a specific performance metric. Nevertheless, existing methods may not target clinical utility of the intended medical test, which usually needs to operate above a certain sensitivity or specificity level, or do not have their statistical properties well studied and understood. In this article, we develop and investigate a linear combination method to maximize the clinical utility empirically for such a constrained classification. The combination coefficient is shown to have cube root asymptotics. The convergence rate and limiting distribution of the predictive performance are subsequently established, exhibiting robustness of the method in comparison with others. An algorithm with sound statistical justification is devised for efficient and high-quality computation. Simulations corroborate the theoretical results, and demonstrate good statistical and computational performance. Illustration with a clinical study on aggressive prostate cancer detection is provided.
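The idea of choosing a linear combination to maximize empirical specificity at a controlled sensitivity can be sketched for two biomarkers with a naive grid search over directions. This brute-force search is a stand-in for illustration and is not the algorithm devised in the article. The function names, the grid size, and the toy data are all assumptions.

```python
import math


def specificity_at_sensitivity(case_scores, control_scores, target_sensitivity):
    """Empirical specificity with the threshold set so that at least the
    target fraction of cases score at or above it."""
    ranked = sorted(case_scores, reverse=True)
    k = max(1, math.ceil(target_sensitivity * len(ranked) - 1e-9))
    threshold = ranked[k - 1]
    return sum(s < threshold for s in control_scores) / len(control_scores)


def best_linear_combination(cases, controls, target_sensitivity, n_angles=360):
    """Grid search over unit-norm weights (cos a, sin a) for two markers."""
    best_angle, best_spec = 0.0, -1.0
    for i in range(n_angles):
        angle = 2.0 * math.pi * i / n_angles
        w1, w2 = math.cos(angle), math.sin(angle)
        case_scores = [w1 * a + w2 * b for a, b in cases]
        control_scores = [w1 * a + w2 * b for a, b in controls]
        spec = specificity_at_sensitivity(
            case_scores, control_scores, target_sensitivity
        )
        if spec > best_spec:  # keep the first angle that attains the maximum
            best_angle, best_spec = angle, spec
    return (math.cos(best_angle), math.sin(best_angle)), best_spec


# Hypothetical data: neither marker separates the groups alone,
# but their sum does.
cases = [(1.0, 3.0), (3.0, 1.0), (2.0, 2.0), (0.5, 3.5), (3.5, 0.5)]
controls = [(1.0, 1.0), (2.0, 0.0), (0.0, 2.0), (1.5, 0.5), (0.5, 1.5)]

marker1_spec = specificity_at_sensitivity(
    [a for a, _ in cases], [a for a, _ in controls], 0.8
)
weights, best_spec = best_linear_combination(cases, controls, 0.8)
```

The empirical objective is a step function of the weights, so it cannot be maximized by gradient-based methods. That non-smoothness is consistent with the nonstandard asymptotics mentioned above.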