About this item:


Author Notes:

Yang Liu, yang.liu@emory.edu

D. Zhang and L. Du contributed equally to this work. This research was conceptualized by Yang Liu. D. Zhang and L. Du analyzed the data and drafted the manuscript. W. Wang, Q. Zhu, and J. Bi provided technical support for data processing. All authors participated in manuscript preparation.

The authors thank the Department of Environment, Forestry and Fisheries, the South African Weather Service, and the air quality network owners of the Cities of Johannesburg, Tshwane, Ekurhuleni, and Sasol for the air quality data.

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.


Research Funding:

This work was partially supported by the MAIA science team at the JPL, California Institute of Technology, led by D. Diner (Subcontract #1588347).

NS was supported by the NIEHS-funded HERCULES Center (P30ES019776).

RMG and MN were supported by a CSIR Parliamentary Grant. We thank the PI of the Pretoria_CSIR-DPSS site from AERONET for establishing and maintaining the site.


  • Science & Technology
  • Life Sciences & Biomedicine
  • Technology
  • Environmental Sciences
  • Remote Sensing
  • Imaging Science & Photographic Technology
  • Environmental Sciences & Ecology
  • PM2.5
  • Random forest
  • Air quality standard
  • South Africa

A machine learning model to estimate ambient PM2.5 concentrations in industrialized highveld region of South Africa

Journal Title:



Volume 266


Type of Work:

Article | Post-print: After Peer Review


Exposure to fine particulate matter (PM2.5) has been linked to a substantial disease burden globally, yet little has been done to estimate the population health risks of PM2.5 in South Africa due to the lack of high-resolution PM2.5 exposure estimates. We developed a random forest model to estimate daily PM2.5 concentrations at 1 km² resolution in and around industrialized Gauteng Province, South Africa, by combining satellite aerosol optical depth (AOD), meteorology, land use, and socioeconomic data. We then compared PM2.5 concentrations in the study domain before and after the implementation of the new national air quality standards. We aimed to test whether machine learning models are suitable for regions with sparse ground observations such as South Africa and which predictors played important roles in PM2.5 modeling. The cross-validation R² and Root Mean Square Error of our model were 0.80 and 9.40 μg/m³, respectively. Satellite AOD, seasonal indicator, total precipitation, and population were among the most important predictors. Model-estimated PM2.5 levels successfully captured the temporal pattern recorded by ground observations. Spatially, the highest annual PM2.5 concentration appeared in central and northern Gauteng, including northern Johannesburg and the city of Tshwane. Since the 2016 changes in national PM2.5 standards, PM2.5 concentrations have decreased in most of our study region, although levels in Johannesburg and its surrounding areas have remained relatively constant. This is an advanced PM2.5 model for South Africa with high prediction accuracy at the daily level and at a relatively high spatial resolution. Our study provided a reference for predictor selection, and our results can be used for a variety of purposes, including epidemiological research, burden of disease assessments, and policy evaluation.
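
For readers unfamiliar with the two accuracy metrics quoted in the abstract, the following is a minimal sketch of how R² and Root Mean Square Error might be computed from held-out predictions. It is not the authors' code: the function names, the sample values, and the choice of the 1 - SS_res/SS_tot definition of R² are all assumptions made for illustration, and the abstract does not state which cross-validation scheme or R² definition was used.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between paired observations and predictions."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(observed, predicted):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out daily PM2.5 values in μg/m³ (not data from the study).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

cv_rmse = rmse(observed, predicted)
cv_r2 = r_squared(observed, predicted)
print(f"RMSE = {cv_rmse:.2f} μg/m³, R² = {cv_r2:.3f}")
```

In a cross-validation setting, the predictions passed to these functions would come from models that never saw the corresponding observations during training, so both metrics describe out-of-sample accuracy.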

Copyright information:

This is an Open Access work distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/).