Publication

Predicting monthly high-resolution PM2.5 concentrations with random forest model in the North China Plain

Persistent URL
Last modified
  • 05/14/2025
Type of Material
Authors
  • Keyong Huang, Emory University
  • Qingyang Xiao, Emory University
  • Xia Meng, Emory University
  • Guannan Geng, Emory University
  • Yujie Wang, NASA Goddard Space Flight Center
  • Alexei Lyapustin, NASA Goddard Space Flight Center
  • Dongfeng Gu, Chinese Academy of Sciences
  • Yang Liu, Emory University
Language
  • English
Date
  • 2018-11-01
Publisher
  • Elsevier
Publication Version
Copyright Statement
  • © 2018 Elsevier Ltd
License
Final Published Version (URL)
Title of Journal or Parent Work
ISSN
  • 0269-7491
Volume
  • 242
Issue
  • Pt A
Start Page
  • 675
End Page
  • 683
Grant/Funding Information
  • The work of Q. Xiao, G. Geng and Y. Liu was partially supported by Assistance Agreement No. 83586901 awarded by the U.S. Environmental Protection Agency to Y. Liu.
  • The work of X. Meng and Y. Liu was partially supported by the National Institutes of Health (Grant # 1R01ES027892).
  • The work of K. Huang was supported by the China Scholarship Council (201706210381).
Supplemental Material (URL)
Abstract
  • Exposure to fine particulate matter (PM2.5) remains a worldwide public health issue. However, epidemiological studies on the chronic health impacts of PM2.5 in developing countries are hindered by the lack of monitoring data. Despite the recent development of methods that use satellite remote sensing to predict ground-level PM2.5 concentrations in China, methods for generating reliable historical PM2.5 exposure estimates, especially prior to the construction of the PM2.5 monitoring network in 2013, remain rare. In this study, a high-performance machine-learning model was developed directly at the monthly level to estimate PM2.5 levels in the North China Plain. We developed a random forest model using the latest Multi-Angle Implementation of Atmospheric Correction (MAIAC) aerosol optical depth (AOD), meteorological parameters, land cover and ground PM2.5 measurements from 2013 to 2015. A multiple imputation method was applied to fill the missing values of AOD. We used 10-fold cross-validation (CV) to evaluate model performance, and used a separate period, January to December 2016, to validate the model's ability to predict historical PM2.5 concentrations. The overall model CV R2 and relative prediction error (RPE) were 0.88 and 18.7%, respectively. Validation results beyond the modeling period (2013–2015) showed that this model can accurately predict historical PM2.5 concentrations at the monthly (R2 = 0.74, RPE = 27.6%), seasonal (R2 = 0.78, RPE = 21.2%) and annual (R2 = 0.76, RPE = 16.9%) levels. The annual mean predicted PM2.5 concentration from 2013 to 2016 in our study domain was 67.7 μg/m3, and Southern Hebei, Western Shandong and Northern Henan were the most polluted areas. Using this computationally efficient, monthly, high-resolution model, we can provide reliable historical PM2.5 concentrations for epidemiological studies on PM2.5 health effects in China.
A random forest model developed at the monthly level using satellite data can be applied to estimate long-term PM2.5 concentrations in the North China Plain.
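The evaluation metrics named in the abstract can be illustrated with a minimal sketch. It assumes that R2 is the coefficient of determination and that RPE is the root-mean-square error divided by the mean observed concentration; the abstract does not give either formula, so both definitions are assumptions. The function names and the sample values are hypothetical and are not taken from the study. The fold generator uses contiguous, unshuffled folds purely for illustration.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition of R2)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_prediction_error(observed, predicted):
    """RMSE divided by the mean observed value (assumed definition of RPE)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return rmse / (sum(observed) / n)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds, without shuffling."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds receive one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


if __name__ == "__main__":
    # Hypothetical monthly PM2.5 values in micrograms per cubic metre.
    observed = [50.0, 60.0, 70.0, 80.0]
    predicted = [52.0, 58.0, 73.0, 79.0]
    print("R2 :", round(r_squared(observed, predicted), 3))
    print("RPE:", round(relative_prediction_error(observed, predicted), 4))
```

In a cross-validation run, each held-out fold would be predicted by a model fitted on the remaining folds, and the two metrics would then be computed over the pooled held-out predictions.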
Author Notes
  • Yang Liu, PhD, Department of Environmental Health, Rollins School of Public Health, Emory University, 1518 Clifton Road NE, Atlanta, GA 30322, USA, yang.liu@emory.edu
Keywords
Research Categories
  • Environmental Sciences
