Lag Variables in Nitrogen Oxide Concentration Modelling: A Case Study in Wrocław, Poland

Due to the unwavering interest of both residents and authorities in the air quality of urban agglomerations, we pose the following question in this paper: What impact do current and past meteorological factors and traffic flow intensity have on air quality? What is the impact of lagged variables on...


Bibliographic Details
Main Authors: Joanna A. Kamińska, Fernando Jiménez, Estrella Lucena-Sánchez, Guido Sciavicco, Tomasz Turek
Format: Article
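Language: English
Published: MDPI AG 2020-11-01
Series: Atmosphere
Subjects: air pollution; nitrogen oxides; random forest; lag variables; multi-objective optimization; traffic flow
Online Access: https://www.mdpi.com/2073-4433/11/12/1293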
id doaj-24a9bf61b1bf48d58f9604a0f9f18597
record_format Article
spelling doaj-24a9bf61b1bf48d58f9604a0f9f18597; 2020-12-01T00:01:22Z; eng; MDPI AG; Atmosphere; 2073-4433; 2020-11-01; vol. 11, no. 12, art. 1293; 10.3390/atmos11121293
Lag Variables in Nitrogen Oxide Concentration Modelling: A Case Study in Wrocław, Poland
Joanna A. Kamińska (Department of Mathematics, Wroclaw University of Environmental and Life Sciences, 50-375 Wrocław, Poland)
Fernando Jiménez (Department of Information and Communication Engineering, University of Murcia, 30100 Murcia, Spain)
Estrella Lucena-Sánchez (Department of Mathematics and Computer Science, University of Ferrara, 44121 Ferrara, Italy)
Guido Sciavicco (Department of Mathematics and Computer Science, University of Ferrara, 44121 Ferrara, Italy)
Tomasz Turek (Department of Mathematics, Wroclaw University of Environmental and Life Sciences, 50-375 Wrocław, Poland)
collection DOAJ
language English
format Article
sources DOAJ
author Joanna A. Kamińska
Fernando Jiménez
Estrella Lucena-Sánchez
Guido Sciavicco
Tomasz Turek
title Lag Variables in Nitrogen Oxide Concentration Modelling: A Case Study in Wrocław, Poland
publisher MDPI AG
series Atmosphere
issn 2073-4433
publishDate 2020-11-01
description Due to the unwavering interest of both residents and authorities in the air quality of urban agglomerations, we pose the following questions in this paper: What impact do current and past meteorological factors and traffic flow intensity have on air quality? What is the impact of lagged variables on the fit of an explanatory model, and how do they affect its ability to predict? We focused on NO₂ and NOₓ concentrations, and conducted this research using hourly data from the city of Wrocław (western Poland) from 2015 to 2017; we used multi-objective optimization to determine the optimal delays. It turned out that for both NO₂ and NOₓ, the past values of traffic flow, wind speed, and sunshine duration are more important than the current ones. We built random forest models for each of the pollutants using both the current and past values and discovered that including lagged variables increases the resulting R² from 0.51 to 0.56 for NO₂ and from 0.46 to 0.52 for NOₓ. We also analyzed the feature importance in each model, and found that for NO₂, a wind speed delay of more than three hours causes a significant decrease in importance, while the importance of relative humidity increases with a seven-hour delay; likewise, for NOₓ prediction, the importance of wind speed increases with a two-hour delay. We concluded that, in pollutant concentration modeling, the possibility of a delayed effect of the independent variables should always be considered, because it can significantly increase the performance of the model and suggest unexpected relationships or dependencies.
topic air pollution
nitrogen oxides
random forest
lag variables
multi-objective optimization
traffic flow
url https://www.mdpi.com/2073-4433/11/12/1293
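The idea of a lagged variable described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it swaps the random forest for a one-variable least-squares fit and the multi-objective optimization for a plain search over candidate lags, and it runs on synthetic data. The names `lagged_pairs`, `r_squared`, `best_lag`, and `TRUE_LAG` are hypothetical and chosen only for this illustration.

```python
import math


def lagged_pairs(x, y, lag):
    """Pair y[t] with x[t - lag]; the first `lag` hours have no lagged value and are dropped."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    return x[:len(x) - lag], y[lag:]


def r_squared(x, y):
    """R^2 of an ordinary least-squares line y ~ a + b*x (squared correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return (sxy * sxy) / (sxx * syy)


def best_lag(x, y, max_lag):
    """Score every lag from 0 to max_lag and return the best one with all scores."""
    scores = {lag: r_squared(*lagged_pairs(x, y, lag)) for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


# Synthetic hourly data (an assumption for illustration, not the Wrocław data):
# the "concentration" responds to the "traffic" value from TRUE_LAG hours earlier.
TRUE_LAG = 2
hours = range(200)
traffic = [math.sin(2 * math.pi * h / 24) + 0.3 * math.sin(2 * math.pi * h / 5)
           for h in hours]
concentration = [10 + 3 * traffic[max(h - TRUE_LAG, 0)] for h in hours]

lag, scores = best_lag(traffic, concentration, max_lag=6)
print("best lag:", lag)
for k in sorted(scores):
    print(f"lag {k}: R^2 = {scores[k]:.3f}")
```

On this synthetic series the fit with the unlagged predictor is noticeably worse than the fit at the true delay, which mirrors, in a much simplified form, the kind of R² comparison the abstract reports for models with and without lagged variables.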