Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability

Low-cost air pollution sensors often fail to attain sufficient performance compared with state-of-the-art measurement stations, and they typically require expensive laboratory-based calibration procedures. A repeatedly proposed strategy to overcome these limitations is calibration through c...


Bibliographic Details
Main Authors:	P. Nowack, L. Konstantinovskiy, H. Gardiner, J. Cant
Format:	Article
Language:	English
Published:	Copernicus Publications 2021-08-01
Series:	Atmospheric Measurement Techniques
Online Access:	https://amt.copernicus.org/articles/14/5637/2021/amt-14-5637-2021.pdf

id	doaj-6fbd647cf5114557a082ddc7ac2aca27
record_format	Article
spelling	doaj-6fbd647cf5114557a082ddc7ac2aca27; 2021-08-18T13:03:20Z; eng; Copernicus Publications; Atmospheric Measurement Techniques; ISSN 1867-1381, 1867-8548; 2021-08-01; vol. 14, pp. 5637-5655; doi:10.5194/amt-14-5637-2021; Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability; P. Nowack (0, 1, 2, 3), L. Konstantinovskiy (4), H. Gardiner (5), J. Cant (6); (0) Grantham Institute – Climate Change and the Environment, Imperial College London, London SW7 2AZ, UK; (1) Department of Physics, Imperial College London, London SW7 2AZ, UK; (2) Data Science Institute, Imperial College London, London SW7 2AZ, UK; (3) Climatic Research Unit, School of Environmental Sciences, University of East Anglia, Norwich NR4 7TJ, UK; (4) AirPublic Ltd, London, UK; (5) AirPublic Ltd, London, UK; (6) AirPublic Ltd, London, UK; https://amt.copernicus.org/articles/14/5637/2021/amt-14-5637-2021.pdf
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	P. Nowack, L. Konstantinovskiy, H. Gardiner, J. Cant
author_sort	P. Nowack
title	Machine learning calibration of low-cost NO2 and PM10 sensors: non-linear algorithms and their impact on site transferability
publisher	Copernicus Publications
series	Atmospheric Measurement Techniques
issn	1867-1381 1867-8548
publishDate	2021-08-01
description	Low-cost air pollution sensors often fail to attain sufficient performance compared with state-of-the-art measurement stations, and they typically require expensive laboratory-based calibration procedures. A repeatedly proposed strategy to overcome these limitations is calibration through co-location with public measurement stations. Here we test the idea of using machine learning algorithms for such calibration tasks using hourly-averaged co-location data for nitrogen dioxide (NO2) and particulate matter of particle sizes smaller than 10 µm (PM10) at three different locations in the urban area of London, UK. We compare the performance of ridge regression, a linear statistical learning algorithm, to two non-linear algorithms in the form of random forest regression (RFR) and Gaussian process regression (GPR). We further benchmark the performance of all three machine learning methods relative to the more common multiple linear regression (MLR). We obtain very good out-of-sample R² scores (coefficient of determination) > 0.7, frequently exceeding 0.8, for the machine learning calibrated low-cost sensors. In contrast, the performance of MLR is more dependent on random variations in the sensor hardware and co-located signals, and it is also more sensitive to the length of the co-location period. We find that, subject to certain conditions, GPR is typically the best-performing method in our calibration setting, followed by ridge regression and RFR. We also highlight several key limitations of the machine learning methods, which will be crucial to consider in any co-location calibration. In particular, all methods are fundamentally limited in how well they can reproduce pollution levels that lie outside those encountered at the training stage. We find, however, that the linear ridge regression outperforms the non-linear methods in extrapolation settings. GPR can allow for a small degree of extrapolation, whereas RFR can only predict values within the training range. This algorithm-dependent ability to extrapolate is one of the key limiting factors when the calibrated sensors are deployed away from the co-location site itself. Consequently, we find that ridge regression often performs as well as or even better than GPR after sensor relocation. Our results highlight the potential of co-location approaches paired with machine learning calibration techniques to reduce the costs of air pollution measurements, subject to careful consideration of the co-location training conditions, the choice of calibration variables and the features of the calibration algorithm.
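The extrapolation limit described in the abstract can be illustrated with a toy sketch. It is not the authors' code or data: the signal values are synthetic, `fit_line` is a plain least-squares line standing in for a linear calibration such as ridge regression, and `fit_bins` is a simple bin-averaging predictor standing in for a tree-based method such as RFR, whose predictions are likewise averages of training targets. All names here are hypothetical.

```python
# Toy illustration only: synthetic data, not the paper's sensors or code.
from statistics import mean

def fit_line(xs, ys):
    """Closed-form least-squares fit y = a*x + b (stand-in for a linear calibration)."""
    mx, my = mean(xs), mean(ys)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - a * mx
    return lambda x: a * x + b

def fit_bins(xs, ys, n_bins=5):
    """Piecewise-constant predictor: mean training target per bin of the input range.
    Stand-in for a tree-based regressor, whose leaves also predict training means."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / n_bins

    def bin_of(x):
        # Inputs outside the training range are clipped into the first or last bin.
        return min(n_bins - 1, max(0, int((x - lo) / width)))

    groups = [[] for _ in range(n_bins)]
    for x, y in zip(xs, ys):
        groups[bin_of(x)].append(y)
    means = [mean(g) for g in groups]  # assumes every bin received training data
    return lambda x: means[bin_of(x)]

# Hypothetical raw sensor signal (x) and reference concentration (y = 2x + 1).
train_x = [float(i) for i in range(0, 21)]   # training signal spans 0..20
train_y = [2.0 * x + 1.0 for x in train_x]   # training targets span 1..41

line = fit_line(train_x, train_y)
bins = fit_bins(train_x, train_y)

x_new = 40.0                 # signal level well outside the training range
line_pred = line(x_new)      # follows the fitted trend beyond the training range
bin_pred = bins(x_new)       # cannot exceed the largest training target
print(line_pred, bin_pred)
```

In this sketch the linear fit continues the trend for the unseen signal level, while the averaging predictor stays capped within the range of training targets, which mirrors the behaviour the abstract reports for RFR after relocation to a site with higher pollution levels.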
url	https://amt.copernicus.org/articles/14/5637/2021/amt-14-5637-2021.pdf


Similar Items