A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring

Low-cost sensing strategies hold the promise of denser air quality monitoring networks, which could significantly improve our understanding of personal air pollution exposure. Additionally, low-cost air quality sensors could be deployed to areas where limited monitoring exists. However, low-cost...


Bibliographic Details
Main Authors:	N. Zimmerman, A. A. Presto, S. P. N. Kumar, J. Gu, A. Hauryliuk, E. S. Robinson, A. L. Robinson, R. Subramanian
Format:	Article
Language:	English
Published:	Copernicus Publications 2018-01-01
Series:	Atmospheric Measurement Techniques
Online Access:	https://www.atmos-meas-tech.net/11/291/2018/amt-11-291-2018.pdf

id	doaj-bd2d8e3a308b4e21bbac0265cb49aba6
record_format	Article
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	N. Zimmerman, A. A. Presto, S. P. N. Kumar, J. Gu, A. Hauryliuk, E. S. Robinson, A. L. Robinson, R. Subramanian
author_facet	N. Zimmerman, A. A. Presto, S. P. N. Kumar, J. Gu, A. Hauryliuk, E. S. Robinson, A. L. Robinson, R. Subramanian
author_sort	N. Zimmerman
title	A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring
title_sort	machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring
publisher	Copernicus Publications
series	Atmospheric Measurement Techniques
issn	1867-1381 1867-8548
publishDate	2018-01-01
description	Low-cost sensing strategies hold the promise of denser air quality monitoring networks, which could significantly improve our understanding of personal air pollution exposure. Additionally, low-cost air quality sensors could be deployed to areas where limited monitoring exists. However, low-cost sensors are frequently sensitive to environmental conditions and pollutant cross-sensitivities, which have historically been poorly addressed by laboratory calibrations, limiting their utility for monitoring. In this study, we investigated different calibration models for the Real-time Affordable Multi-Pollutant (RAMP) sensor package, which measures CO, NO₂, O₃, and CO₂. We explored three methods: (1) laboratory univariate linear regression, (2) empirical multiple linear regression, and (3) machine-learning-based calibration models using random forests (RF). Calibration models were developed for 16–19 RAMP monitors (varied by pollutant) using training and testing windows spanning August 2016 through February 2017 in Pittsburgh, PA, US. The random forest models matched (CO) or significantly outperformed (NO₂, CO₂, O₃) the other calibration models, and their accuracy and precision were robust over time for testing windows of up to 16 weeks. Following calibration, average mean absolute error on the testing data set from the random forest models was 38 ppb for CO (14 % relative error), 10 ppm for CO₂ (2 % relative error), 3.5 ppb for NO₂ (29 % relative error), and 3.4 ppb for O₃ (15 % relative error), and Pearson r versus the reference monitors exceeded 0.8 for most units. Model performance is explored in detail, including a quantification of model variable importance, accuracy across different concentration ranges, and performance in a range of monitoring contexts including the National Ambient Air Quality Standards (NAAQS) and the US EPA Air Sensors Guidebook recommendations of minimum data quality for personal exposure measurement. A key strength of the RF approach is that it accounts for pollutant cross-sensitivities. This highlights the importance of developing multipollutant sensor packages (as opposed to single-pollutant monitors); we determined this is especially critical for NO₂ and CO₂. The evaluation reveals that only the RF-calibrated sensors meet the US EPA Air Sensors Guidebook recommendations of minimum data quality for personal exposure measurement. We also demonstrate that the RF-model-calibrated sensors could detect differences in NO₂ concentrations between a near-road site and a suburban site less than 1.5 km away. From this study, we conclude that combining RF models with carefully controlled state-of-the-art multipollutant sensor packages as in the RAMP monitors appears to be a very promising approach to address the poor performance that has plagued low-cost air quality sensors.
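The description reports calibration quality as mean absolute error, relative error, and Pearson r against reference monitors. The sketch below shows one way such metrics could be computed in plain Python. It is an illustration only: the function names are hypothetical, the sample values are made up rather than taken from the study, and relative error is assumed here to mean the mean absolute error divided by the mean reference concentration, which the record itself does not define.

```python
from math import sqrt


def mean_absolute_error(reference, calibrated):
    """Average absolute difference between calibrated and reference values."""
    return sum(abs(c - r) for r, c in zip(reference, calibrated)) / len(reference)


def relative_error_percent(reference, calibrated):
    """Assumed definition: MAE as a percentage of the mean reference value."""
    mean_reference = sum(reference) / len(reference)
    return 100.0 * mean_absolute_error(reference, calibrated) / mean_reference


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Made-up illustrative concentrations in ppb, not data from the study.
reference_ppb = [10.0, 12.0, 15.0, 20.0, 18.0]
calibrated_ppb = [11.0, 11.0, 17.0, 19.0, 20.0]

mae = mean_absolute_error(reference_ppb, calibrated_ppb)
rel = relative_error_percent(reference_ppb, calibrated_ppb)
r = pearson_r(reference_ppb, calibrated_ppb)
print(f"MAE = {mae:.2f} ppb, relative error = {rel:.1f} %, Pearson r = {r:.2f}")
```

In practice the same three numbers would be computed per pollutant and per sensor unit on a held-out testing window, so that calibration models can be compared on equal terms.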
url	https://www.atmos-meas-tech.net/11/291/2018/amt-11-291-2018.pdf
_version_	1725680607498862592
spelling	doaj-bd2d8e3a308b4e21bbac0265cb49aba6; 2020-11-24T22:47:53Z; eng; Copernicus Publications; Atmospheric Measurement Techniques; ISSN 1867-1381, 1867-8548; 2018-01-01; vol. 11, no. 1, pp. 291-313; doi:10.5194/amt-11-291-2018
Authors and affiliations: N. Zimmerman, A. A. Presto, S. P. N. Kumar, A. Hauryliuk, E. S. Robinson, A. L. Robinson, R. Subramanian (Center for Atmospheric Particle Studies, Carnegie Mellon University, Pittsburgh, PA 15213, USA); J. Gu (Sensevere LLC, Pittsburgh, PA 15222, USA)
