About this item:

Author Notes:

Correspondence: Dane R. Van Domelen, Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, 1518 Clifton Road GCR, Room 323, Atlanta, Georgia 30322. dvandom@emory.edu.


Research Funding:

This research was supported by the Intramural Research Program of the Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under grant DGE-0940903.

The views expressed in this article are those of the authors, and no official endorsement by the Department of Health and Human Services, the Agency for Healthcare Research and Quality, or the National Science Foundation, is intended or should be inferred.


  • Science & Technology
  • Life Sciences & Biomedicine
  • Physical Sciences
  • Mathematical & Computational Biology
  • Public, Environmental & Occupational Health
  • Medical Informatics
  • Medicine, Research & Experimental
  • Statistics & Probability
  • Research & Experimental Medicine
  • Mathematics
  • hybrid design
  • maximum likelihood
  • measurement error
  • pooling

Logistic regression with a continuous exposure measured in pools and subject to errors


Journal Title:

Statistics in Medicine


Volume 37, Number 27, Pages 4007-4021

Type of Work:

Article | Post-print: After Peer Review


In a multivariable logistic regression setting where measuring a continuous exposure requires an expensive assay, a design in which the biomarker is measured in pooled samples from multiple subjects can be very cost effective. A logistic regression model for poolwise data is available, but validity requires that the assay yields the precise mean exposure for members of each pool. To account for errors, we assume the assay returns the true mean exposure plus a measurement error (ME) and/or a processing error (PE). We pursue likelihood-based inference for a binary health-related outcome modeled by logistic regression coupled with a normal linear model relating individual-level exposure to covariates and assuming that the ME and PE components are independent and normally distributed regardless of pool size. We compare this approach with a discriminant function-based alternative, and we demonstrate the potential value of incorporating replicates into the study design. Applied to a reproductive health dataset with pools of size 2 along with individual samples and replicates, the model fit with both ME and PE had a lower AIC than a model accounting for ME only. Relative to ignoring errors, this model suggested a somewhat higher (though still nonsignificant) adjusted log-odds ratio associating the cytokine MCP-1 with risk of spontaneous abortion. Simulations modeled after these data confirm validity of the methods, demonstrate how ME and particularly PE can reduce the efficiency advantage of a pooling design, and highlight the value of replicates in improving stability when both errors are present.
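
The error structure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it draws individual exposures from a normal distribution with no covariates, averages them within a pool, and adds independent normal PE and ME terms. The function names, the parameter values, and the choice to apply PE only when two or more specimens are combined are all assumptions made for illustration.

```python
import random
import statistics


def simulate_pool(rng, pool_size, sigma_x=1.0, sigma_pe=0.5, sigma_me=0.3):
    """Return (true_mean, measured) for one simulated pool.

    All parameter values are hypothetical and chosen only for illustration.
    """
    # Individual-level exposures; covariates are omitted to keep the sketch short.
    exposures = [rng.gauss(0.0, sigma_x) for _ in range(pool_size)]
    true_mean = sum(exposures) / pool_size
    # Assumption: processing error arises only when specimens are combined.
    pe = rng.gauss(0.0, sigma_pe) if pool_size > 1 else 0.0
    # Measurement error is added to every assay result.
    me = rng.gauss(0.0, sigma_me)
    return true_mean, true_mean + pe + me


def expected_variance(pool_size, sigma_x=1.0, sigma_pe=0.5, sigma_me=0.3):
    """Variance of the assay result under the assumptions above."""
    pe_var = sigma_pe ** 2 if pool_size > 1 else 0.0
    return sigma_x ** 2 / pool_size + pe_var + sigma_me ** 2


def empirical_variance(pool_size, n_pools=20000, seed=1):
    """Sample variance of simulated assay results across many pools."""
    rng = random.Random(seed)
    measured = [simulate_pool(rng, pool_size)[1] for _ in range(n_pools)]
    return statistics.variance(measured)


if __name__ == "__main__":
    for g in (1, 2):
        print(g, round(expected_variance(g), 3), round(empirical_variance(g), 3))
```

Under these assumed values, pooling two specimens halves the contribution of between-subject variability to the assay result, while the error terms are not reduced by pooling. This is one way to see why ME, and particularly PE, can erode the efficiency advantage of a pooling design.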

Copyright information:

© 2018 John Wiley & Sons, Ltd.
