Data imputation in in situ-measured particle size distributions by means of neural networks

Bibliographic Details
Main Authors: P. L. Fung, M. A. Zaidan, O. Surakhi, S. Tarkoma, T. Petäjä, T. Hussein
Format: Article
Language: English
Published: Copernicus Publications, 2021-08-01
Series: Atmospheric Measurement Techniques
Online Access: https://amt.copernicus.org/articles/14/5535/2021/amt-14-5535-2021.pdf
id doaj-286050e6405341048cd49d7ec0eda248
collection DOAJ
issn 1867-1381
1867-8548
description <p><span id="page5536"/>In air quality research, often only size-integrated particle mass concentrations are considered as indicators of aerosol particles. However, mass concentrations do not provide sufficient information to convey the full story of the fractionated size distribution, in which particles of different diameters (<span class="inline-formula"><i>D</i><sub>p</sub></span>) deposit differently in the respiratory system and cause different kinds of harm. Aerosol size distribution measurements rely on a variety of techniques to classify the aerosol size and measure the size distribution. The ambient size distribution is determined from the raw data using a suite of inversion algorithms. However, the inversion problem is quite often ill-posed and challenging to solve. Owing to instrumental insufficiencies and inversion limitations, imputation methods for fractionated particle size distributions are of great significance for filling missing gaps and replacing negative values. The study at hand involves a merged particle size distribution from a scanning mobility particle sizer (NanoSMPS) and an optical particle sizer (OPS), covering aerosol sizes from 0.01 to 0.42 <span class="inline-formula">µm</span> (electrical mobility equivalent size) and 0.3 to 10 <span class="inline-formula">µm</span> (optical equivalent size), respectively, together with meteorological parameters collected in an urban background region in Amman, Jordan, from 1 August 2016 to 31 July 2017. We develop and evaluate feed-forward neural network (FFNN) approaches to estimate the number concentration in a particular size bin using (1) meteorological parameters, (2) number concentrations in other size bins and (3) both of the above as input variables. Two layers with 10–15 neurons are found to be the optimal configuration.
Worse performance is observed at the lower edge (<span class="inline-formula">0.01&lt;<i>D</i><sub>p</sub>&lt;0.02</span> <span class="inline-formula">µm</span>), in the mid-range region (<span class="inline-formula">0.15&lt;<i>D</i><sub>p</sub>&lt;0.5</span> <span class="inline-formula">µm</span>) and at the upper edge (<span class="inline-formula">6&lt;<i>D</i><sub>p</sub>&lt;10</span> <span class="inline-formula">µm</span>). At both edges, the number of neighbouring size bins is limited, and the detection efficiency of the corresponding instruments is lower than for the other size bins. The distinct performance drop over the overlapping mid-range region is due to deficiencies in the merging algorithm. Another plausible reason for the poorer performance for finer particles is that they are removed from the atmosphere more effectively than coarser particles, so the relationships between the input variables and the small particles are more dynamic. For ultrafine particles, a noticeable overestimation is also found in the early morning, followed by a distinct underestimation before midday. In winter, the estimation performance is poorer than in the other seasons, possibly owing to sensor drift and interference artefacts. The FFNN approach based on meteorological parameters performs worse with 5 min data (<span class="inline-formula"><i>R</i><sup>2</sup>=</span> 0.22–0.58) than with data at a longer time resolution (<span class="inline-formula"><i>R</i><sup>2</sup>=</span> 0.66–0.77). Thanks to its high accuracy and reliability (<span class="inline-formula"><i>R</i><sup>2</sup>=</span> 0.97–1), the FFNN approach using the number concentrations in the other size bins can serve as an alternative way to replace negative values in the raw size distribution dataset. This negative-value filling approach can maintain a symmetric distribution of errors and complement the existing built-in inversion algorithms of particle sizer instruments, which must solve an ill-posed problem.</p>
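A minimal sketch of the negative-value filling idea (approach 2 in the abstract), assuming NumPy is available. For each size bin, a small feed-forward network with two hidden layers of 12 tanh neurons is trained on the log10 number concentrations of the other bins, using only rows in which every bin is positive; its prediction then replaces non-positive entries in that bin. The log10 transform, tanh activations, Adam optimiser, the helper names `fit_ffnn`, `impute_negative_bins` and `r_squared`, and the synthetic data are illustrative assumptions, not the authors' published configuration. Rows in which another bin is also non-positive are left untouched in this sketch. `r_squared` uses the coefficient-of-determination definition of R².

```python
import numpy as np


def fit_ffnn(X, y, hidden=(12, 12), epochs=3000, lr=0.01, seed=0):
    """Fit a feed-forward network (tanh hidden layers, linear output) to y by
    full-batch Adam on the mean squared error; return a prediction function."""
    rng = np.random.default_rng(seed)
    # Standardise inputs and target so one learning rate suits every column.
    x_mu, x_sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    y_mu, y_sd = y.mean(), y.std() + 1e-12
    Xs, ys = (X - x_mu) / x_sd, (y - y_mu) / y_sd
    sizes = [X.shape[1], *hidden, 1]
    W = [rng.normal(0.0, 1.0 / np.sqrt(a), (a, c)) for a, c in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(c) for c in sizes[1:]]
    params = W + b  # same array objects as W and b, updated in place below
    mom = [np.zeros_like(p) for p in params]
    vel = [np.zeros_like(p) for p in params]

    def forward(Z):
        acts = [Z]
        for i, (Wi, bi) in enumerate(zip(W, b)):
            z = acts[-1] @ Wi + bi
            acts.append(z if i == len(W) - 1 else np.tanh(z))
        return acts

    for t in range(1, epochs + 1):
        acts = forward(Xs)
        # Gradient of the mean squared error with respect to the network output.
        delta = 2.0 * (acts[-1][:, 0] - ys)[:, None] / len(ys)
        gW, gb = [None] * len(W), [None] * len(W)
        for i in reversed(range(len(W))):  # backpropagation
            gW[i] = acts[i].T @ delta
            gb[i] = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W[i].T) * (1.0 - acts[i] ** 2)  # tanh' = 1 - tanh^2
        for p, g, m1, m2 in zip(params, gW + gb, mom, vel):  # Adam step
            m1 *= 0.9
            m1 += 0.1 * g
            m2 *= 0.999
            m2 += 0.001 * g * g
            p -= lr * (m1 / (1 - 0.9 ** t)) / (np.sqrt(m2 / (1 - 0.999 ** t)) + 1e-8)

    return lambda X_new: forward((X_new - x_mu) / x_sd)[-1][:, 0] * y_sd + y_mu


def impute_negative_bins(N, hidden=(12, 12), **fit_kwargs):
    """Replace non-positive entries of N (samples x size bins), bin by bin,
    with FFNN estimates from the log10 concentrations of the other bins."""
    N = np.asarray(N, dtype=float)
    out = N.copy()
    clean = np.all(N > 0, axis=1)  # training rows: every bin positive
    for j in range(N.shape[1]):
        others = np.delete(np.arange(N.shape[1]), j)
        # Fill bin j only where it is non-positive and all other bins are usable.
        fill = (N[:, j] <= 0) & np.all(N[:, others] > 0, axis=1)
        if not fill.any():
            continue
        predict = fit_ffnn(np.log10(N[clean][:, others]), np.log10(N[clean, j]), hidden, **fit_kwargs)
        out[fill, j] = 10.0 ** predict(np.log10(N[fill][:, others]))
    return out


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical synthetic data: 6 size bins with a single log-normal-like mode.
rng = np.random.default_rng(1)
n_samples, n_bins = 600, 6
amplitude = 10 ** rng.uniform(2, 4, n_samples)
mode_bin = rng.uniform(1.5, 3.5, n_samples)
bin_index = np.arange(n_bins)
truth = amplitude[:, None] * np.exp(-0.5 * (bin_index - mode_bin[:, None]) ** 2)
truth *= np.exp(rng.normal(0.0, 0.02, truth.shape))  # small multiplicative noise
raw = truth.copy()
bad = rng.random(n_samples) < 0.1  # about 10 % of rows get a negative value in bin 2
raw[bad, 2] = -rng.uniform(1.0, 10.0, bad.sum())
filled = impute_negative_bins(raw)
print(f"R^2 (log10) on filled bin 2: {r_squared(np.log10(truth[bad, 2]), np.log10(filled[bad, 2])):.3f}")
```

Working in log10 space keeps every filled value positive by construction. In the synthetic example, bin 2 is almost a linear function of the other log-concentrations, so the comparison against the known true values is only a sanity check of the code, not evidence about real NanoSMPS/OPS data.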
doi 10.5194/amt-14-5535-2021
volume 14
pages 5535–5554
affiliation P. L. Fung: Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland
M. A. Zaidan: Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Joint International Research Laboratory of Atmospheric and Earth System Sciences, School of Atmospheric Sciences, Nanjing University, Nanjing 210023, China
O. Surakhi: Department of Computer Science, The University of Jordan, Amman 11942, Jordan
S. Tarkoma: Department of Computer Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland
T. Petäjä: Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Joint International Research Laboratory of Atmospheric and Earth System Sciences, School of Atmospheric Sciences, Nanjing University, Nanjing 210023, China
T. Hussein: Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Department of Physics, The University of Jordan, Amman 11942, Jordan