A machine learning calibration model using random forests to improve sensor performance for lower-cost air quality monitoring
Main Authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Published: | Copernicus Publications, 2018-01-01 |
Series: | Atmospheric Measurement Techniques |
Online Access: | https://www.atmos-meas-tech.net/11/291/2018/amt-11-291-2018.pdf |
Summary: | Low-cost sensing strategies hold the promise of denser air quality monitoring
networks, which could significantly improve our understanding of personal air
pollution exposure. Additionally, low-cost air quality sensors could be
deployed to areas where limited monitoring exists. However, low-cost sensors
are frequently sensitive to environmental conditions and pollutant
cross-sensitivities, which have historically been poorly addressed by
laboratory calibrations, limiting their utility for monitoring. In this
study, we investigated different calibration models for the Real-time
Affordable Multi-Pollutant (RAMP) sensor package, which measures CO,
NO<sub>2</sub>, O<sub>3</sub>, and CO<sub>2</sub>. We explored three methods: (1) laboratory
univariate linear regression, (2) empirical multiple linear regression, and
(3) machine-learning-based calibration models using random forests (RF).
Calibration models were developed for 16–19 RAMP monitors (varied by
pollutant) using training and testing windows spanning August 2016 through
February 2017 in Pittsburgh, PA, US. The random forest models matched (CO) or
significantly outperformed (NO<sub>2</sub>, CO<sub>2</sub>, O<sub>3</sub>) the other
calibration models, and their accuracy and precision were robust over time
for testing windows of up to 16 weeks. Following calibration, average mean
absolute error on the testing data set from the random forest models was
38 ppb for CO (14 % relative error), 10 ppm for CO<sub>2</sub> (2 %
relative error), 3.5 ppb for NO<sub>2</sub> (29 % relative error), and
3.4 ppb for O<sub>3</sub> (15 % relative error), and Pearson <i>r</i> versus the
reference monitors exceeded 0.8 for most units. Model performance is explored
in detail, including a quantification of model variable importance, accuracy
across different concentration ranges, and performance in a range of
monitoring contexts including the National Ambient Air Quality Standards
(NAAQS) and the US EPA Air Sensors Guidebook recommendations of minimum data
quality for personal exposure measurement. A key strength of the RF approach
is that it accounts for pollutant cross-sensitivities. This highlights the
importance of developing multipollutant sensor packages (as opposed to
single-pollutant monitors); we determined this is especially critical for
NO<sub>2</sub> and CO<sub>2</sub>. The evaluation reveals that only the RF-calibrated
sensors meet the US EPA Air Sensors Guidebook recommendations of minimum data
quality for personal exposure measurement. We also demonstrate that the
RF-model-calibrated sensors could detect differences in NO<sub>2</sub>
concentrations between a near-road site and a suburban site less than 1.5 km
away. From this study, we conclude that combining RF models with carefully
controlled state-of-the-art multipollutant sensor packages as in the RAMP
monitors appears to be a very promising approach to address the poor
performance that has plagued low-cost air quality sensors. |
ISSN: | 1867-1381 1867-8548 |
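
The random forest calibration and the evaluation metrics named in the Summary (mean absolute error, relative error, Pearson r) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, the cross-sensitivity coefficients are invented, and the helper names (`grow_tree`, `fit_forest`, `predict`, `evaluate`, `synthetic_record`) and hyperparameters are hypothetical. Relative error is assumed here to mean MAE divided by the mean reference concentration, which the Summary does not define. The forest is written out by hand only to keep the example dependency-free; in practice a library implementation such as scikit-learn's `RandomForestRegressor` would normally be used instead.

```python
import random
from statistics import mean


def grow_tree(X, y, rng, depth=0, max_depth=8, min_leaf=3, n_try=2):
    """Grow one regression tree. A leaf is a float; a split is (feature, threshold, left, right)."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return mean(y)
    best = None  # (score, feature index, threshold)
    for f in rng.sample(range(len(X[0])), n_try):  # random feature subset at every split
        pairs = sorted(zip((row[f] for row in X), y))
        total, left_sum, n = sum(y), 0.0, len(y)
        for k in range(1, n):
            left_sum += pairs[k - 1][1]
            if k < min_leaf or n - k < min_leaf or pairs[k - 1][0] == pairs[k][0]:
                continue
            # Maximising this score is equivalent to minimising the children's squared error.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, f, pairs[k - 1][0])
    if best is None:
        return mean(y)
    _, f, thr = best
    left = [i for i, row in enumerate(X) if row[f] <= thr]
    right = [i for i, row in enumerate(X) if row[f] > thr]
    kids = [grow_tree([X[i] for i in idx], [y[i] for i in idx], rng,
                      depth + 1, max_depth, min_leaf, n_try) for idx in (left, right)]
    return (f, thr, kids[0], kids[1])


def fit_forest(X, y, n_trees=30, seed=0, **tree_kwargs):
    """Train each tree on a bootstrap resample of the training window."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(grow_tree([X[i] for i in idx], [y[i] for i in idx], rng, **tree_kwargs))
    return forest


def predict(forest, row):
    """Average the leaf values that each tree assigns to one feature row."""
    def walk(node):
        while isinstance(node, tuple):
            f, thr, left, right = node
            node = left if row[f] <= thr else right
        return node
    return mean([walk(tree) for tree in forest])


def evaluate(pred, ref):
    """MAE, relative error (assumed: MAE / mean reference) and Pearson r."""
    mae = sum(abs(p - t) for p, t in zip(pred, ref)) / len(ref)
    mp, mt = mean(pred), mean(ref)
    cov = sum((p - mp) * (t - mt) for p, t in zip(pred, ref))
    spread = (sum((p - mp) ** 2 for p in pred) * sum((t - mt) ** 2 for t in ref)) ** 0.5
    return {"mae": mae, "rel_error": mae / mt, "pearson_r": cov / spread}


def synthetic_record(rng):
    """One hypothetical time step: (sensor features, reference NO2 in ppb). Coefficients are invented."""
    no2, o3 = rng.uniform(2, 40), rng.uniform(5, 60)
    temp, rh = rng.uniform(0, 35), rng.uniform(20, 90)
    raw_no2 = no2 - 0.3 * o3 + 0.2 * (temp - 20) + rng.gauss(0, 1)  # O3 cross-sensitivity, T drift
    raw_o3 = o3 + rng.gauss(0, 1)
    return [raw_no2, raw_o3, temp, rh], no2


rng = random.Random(42)
records = [synthetic_record(rng) for _ in range(600)]
# The first 400 records stand in for a training window, the rest for a testing window.
train, test = records[:400], records[400:]
forest = fit_forest([x for x, _ in train], [t for _, t in train])
pred = [predict(forest, x) for x, _ in test]
ref = [t for _, t in test]
metrics = evaluate(pred, ref)
print(metrics)
```

Because the raw O3 signal, temperature, and relative humidity are all offered to the model alongside the raw NO2 signal, the trees can split on whichever combination explains the reference concentration. This mirrors the Summary's point that a multipollutant package lets the calibration account for cross-sensitivities.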