Error Prediction of Air Quality at Monitoring Stations Using Random Forest in a Total Error Framework

Bibliographic Details
Main Authors:	Jean-Marie Lepioufle, Leif Marsteen, Mona Johnsrud
Format:	Article
Language:	English
Published:	MDPI AG 2021-03-01
Series:	Sensors
Subjects:	air quality; quality control; Random Forest; error prediction; total error framework
Online Access:	https://www.mdpi.com/1424-8220/21/6/2160

Description
Summary:	Instead of the valid/non-valid flag usually proposed in the quality control (QC) processes of air quality (AQ), we proposed a method that predicts the p-value of each observation as a value between 0 and 1. We based our error predictions on three approaches: the one proposed by the Working Group on Guidance for the Demonstration of Equivalence (European Commission (2010)), the one proposed by Wager (Journal of Machine Learning Research, 15, 1625–1651 (2014)) and the one proposed by Lu (Journal of Machine Learning Research, 22, 1–41 (2021)). The Total Error framework makes it possible to distinguish the different types of error: input, output, structural modeling and remnant. We thus theoretically described a one-site AQ prediction based on a multi-site network using Random Forest for regression in a Total Error framework. We demonstrated the methodology with a dataset of hourly nitrogen dioxide measured by a network of monitoring stations located in Oslo, Norway, and implemented the error predictions for the three approaches. The results indicate that a simple one-site AQ prediction based on a multi-site network using Random Forest for regression provides moderate metrics for fixed stations.
According to the diagnostic based on predictive qq-plots, the approach proposed by Lu provides the best error predictions among the three approaches used in this study. Furthermore, ensuring a high precision of the error prediction requires effort in obtaining accurate inputs, outputs and prediction models, and in limiting our lack of knowledge about the "true" AQ phenomena. We put effort into quantifying each type of error involved in the error prediction in order to assess the error prediction model and further improve it in terms of performance and precision.
ISSN:	1424-8220
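
The summary states that each observation receives a p-value between 0 and 1 and that the approaches are compared with predictive qq-plots, but it does not spell out the computation. One plausible reading, assumed here and not confirmed by the record, is that the p-value is the predictive cumulative probability of the observed value (its probability integral transform). The Python sketch below, using only the standard library, represents the predictive distribution as the point prediction plus a set of error samples (for instance out-of-bag residuals of a Random Forest) and builds the points of a predictive qq-plot. The names predictive_p_value, predictive_qq and max_qq_deviation are hypothetical. This is not the authors' code and does not reproduce the GDE, Wager or Lu approaches themselves; the error samples simply stand in for whatever error distribution an approach provides.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def predictive_p_value(observation: float, prediction: float,
                       error_samples: Sequence[float]) -> float:
    """Predictive cumulative probability of one observation (hypothetical helper).

    Assumption: the predictive distribution is prediction + e, where e is
    drawn from the empirical error samples. Returns a value in [0, 1].
    """
    if not error_samples:
        raise ValueError("need at least one error sample")
    errors = sorted(error_samples)
    # Fraction of predictive samples lying at or below the observation.
    below = bisect_right(errors, observation - prediction)
    return below / len(errors)


def predictive_qq(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Points of a predictive qq-plot: (uniform quantile, sorted p-value).

    If the error prediction is well calibrated, the p-values are close to
    uniform on [0, 1] and the points lie near the 1:1 line.
    """
    n = len(p_values)
    ordered = sorted(p_values)
    # Plotting positions i / (n + 1) serve as theoretical uniform quantiles.
    return [((i + 1) / (n + 1), p) for i, p in enumerate(ordered)]


def max_qq_deviation(points: Sequence[Tuple[float, float]]) -> float:
    """Largest vertical distance from the 1:1 line (a crude calibration score)."""
    return max(abs(p - q) for q, p in points)


if __name__ == "__main__":
    # Toy error samples in µg/m³ (illustrative numbers, not data from the study).
    errors = [-4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0]
    p = predictive_p_value(observation=30.0, prediction=29.0, error_samples=errors)
    print(round(p, 3))  # 5 of 7 samples are <= 1.0, so this prints 0.714
    points = predictive_qq([0.9, 0.1, 0.5, 0.3, 0.7])
    print(max_qq_deviation(points))
```

Comparing approaches in this sketch amounts to computing p-values over a test period for each approach and checking which set of qq-plot points stays closest to the 1:1 line; a single maximum deviation is only a rough summary of what the plot shows.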
