Data imputation in in situ-measured particle size distributions by means of neural networks

Bibliographic Details
Main Authors:	P. L. Fung, M. A. Zaidan, O. Surakhi, S. Tarkoma, T. Petäjä, T. Hussein
Format:	Article
Language:	English
Published:	Copernicus Publications 2021-08-01
Series:	Atmospheric Measurement Techniques
Online Access:	https://amt.copernicus.org/articles/14/5535/2021/amt-14-5535-2021.pdf

Abstract

In air quality research, aerosol particles are often characterised only by size-integrated particle mass concentrations. However, mass concentrations alone do not convey the full picture of the size-fractionated distribution, which matters because particles of different diameters (D_p) deposit differently in the respiratory system and cause different kinds of harm. Aerosol size distribution measurements rely on a variety of techniques to classify particles by size and measure the size distribution, and the ambient size distribution is then retrieved from the raw data with a suite of inversion algorithms. However, this inversion problem is often ill-posed and challenging to solve. Because of instrumental limitations and the shortcomings of inversion, imputation methods for size-fractionated particle distributions are of great importance for filling gaps of missing data and replacing negative values. This study uses a merged particle size distribution from a scanning mobility particle sizer (NanoSMPS), covering 0.01–0.42 µm (electrical mobility equivalent size), and an optical particle sizer (OPS), covering 0.3–10 µm (optical equivalent size), together with meteorological parameters collected at an urban background region in Amman, Jordan, from 1 August 2016 to 31 July 2017. We develop and evaluate feed-forward neural network (FFNN) approaches that estimate the number concentration in a particular size bin using as input variables (1) meteorological parameters, (2) the number concentrations in the other size bins or (3) both. Two layers with 10–15 neurons were found to be the optimal configuration.

Performance is worse at the lower edge (0.01 < D_p < 0.02 µm), in the mid-range region (0.15 < D_p < 0.5 µm) and at the upper edge (6 < D_p < 10 µm) of the size range. At both edges, only a limited number of neighbouring size bins are available, and the detection efficiency of the corresponding instruments is lower than in the other size bins. The distinct performance drop in the mid-range region, where the two instruments overlap, is attributed to deficiencies in the merging algorithm. Another plausible reason for the poorer performance for finer particles is that they are removed from the atmosphere more efficiently than coarser particles, so their relationships with the input variables are more dynamic. For ultrafine particles, a noticeable overestimation is also found in the early morning, followed by a distinct underestimation before midday. In winter, estimation performance is poorer than in the other seasons, possibly because of sensor drift and interference artefacts. The FFNN approach based on meteorological parameters performs worse with 5 min data (R² = 0.22–0.58) than with data at longer time resolution (R² = 0.66–0.77). The FFNN approach using the number concentrations in the other size bins is highly accurate and reliable (R² = 0.97–1), so it can serve as an alternative way to replace negative values in the raw size distribution dataset. This negative-value filling approach can maintain a symmetric distribution of errors and complement the ill-posed inversion performed by the built-in algorithms of particle sizer instruments.
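The estimation idea in (2) above, predicting one size bin from the others and using the prediction only where the measured value is negative, can be sketched as below. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation: `N` is assumed to be a (time steps × size bins) array of number concentrations; the network uses two tanh hidden layers of 12 neurons (within the 10–15 range reported above) trained by plain full-batch gradient descent on mean squared error; and concentrations are modelled in log10 space so that filled values are always positive. The function names `forward`, `train_ffnn`, `predict`, `impute_negative_bin` and `r2`, and all hyperparameters, are hypothetical. R² is computed here as 1 - SS_res/SS_tot, which may differ from the exact definition used in the study.

```python
import numpy as np

# Hypothetical helper names and hyperparameters; an illustrative sketch only.


def forward(W, b, X):
    """Return the activations of every layer: tanh hidden layers, linear output."""
    acts = [X]
    for i in range(len(W)):
        z = acts[-1] @ W[i] + b[i]
        acts.append(z if i == len(W) - 1 else np.tanh(z))
    return acts


def train_ffnn(X, y, hidden=(12, 12), epochs=3000, lr=0.05, seed=0):
    """Fit a small feed-forward network by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    W = [rng.normal(0.0, 1.0 / np.sqrt(m), (m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
    b = [np.zeros(n) for n in sizes[1:]]
    y = y.reshape(-1, 1)
    for _ in range(epochs):
        acts = forward(W, b, X)
        grad = 2.0 * (acts[-1] - y) / len(X)  # dMSE/d(output)
        for i in reversed(range(len(W))):
            gW, gb = acts[i].T @ grad, grad.sum(axis=0)
            if i > 0:
                # Backpropagate through W[i] (before updating it) and the tanh derivative.
                grad = (grad @ W[i].T) * (1.0 - acts[i] ** 2)
            W[i] -= lr * gW
            b[i] -= lr * gb
    return W, b


def predict(W, b, X):
    """Network output as a flat array."""
    return forward(W, b, X)[-1].ravel()


def r2(y, y_hat):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)


def impute_negative_bin(N, target, **ffnn_kwargs):
    """Replace non-positive values in column `target` of N (time x size bins)
    with FFNN estimates computed from the other size bins."""
    others = np.delete(np.arange(N.shape[1]), target)
    inputs_ok = (N[:, others] > 0).all(axis=1)  # log10 needs positive inputs
    train = inputs_ok & (N[:, target] > 0)      # rows with a valid target value
    fill = inputs_ok & (N[:, target] <= 0)      # rows to impute
    X_train = np.log10(N[train][:, others])
    y_train = np.log10(N[train, target])
    # Standardise inputs and target with training-set statistics.
    x_mu, x_sd = X_train.mean(axis=0), X_train.std(axis=0) + 1e-12
    y_mu, y_sd = y_train.mean(), y_train.std() + 1e-12
    W, b = train_ffnn((X_train - x_mu) / x_sd, (y_train - y_mu) / y_sd, **ffnn_kwargs)
    out = np.array(N, dtype=float)  # copy; only the filled entries change
    if fill.any():
        X_fill = (np.log10(N[fill][:, others]) - x_mu) / x_sd
        out[fill, target] = 10.0 ** (predict(W, b, X_fill) * y_sd + y_mu)
    return out
```

For example, `impute_negative_bin(N, target=3)` returns a copy of `N` in which non-positive values in column 3 are replaced by predictions from the other columns, while every other entry is left unchanged. Working in log10 space is a design choice made for this sketch: it keeps imputed concentrations positive and compresses number concentrations, which typically span several orders of magnitude across size bins.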

Publication Details
DOI:	10.5194/amt-14-5535-2021
Volume:	14
Pages:	5535–5554
ISSN:	1867-1381, 1867-8548

Author Affiliations
P. L. Fung:	Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland
M. A. Zaidan:	Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Helsinki Institute of Sustainability Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Joint International Research Laboratory of Atmospheric and Earth System Sciences, School of Atmospheric Sciences, Nanjing University, Nanjing 210023, China
O. Surakhi:	Department of Computer Science, The University of Jordan, Amman 11942, Jordan
S. Tarkoma:	Department of Computer Science, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland
T. Petäjä:	Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Joint International Research Laboratory of Atmospheric and Earth System Sciences, School of Atmospheric Sciences, Nanjing University, Nanjing 210023, China
T. Hussein:	Institute for Atmospheric and Earth System Research/Physics, Faculty of Science, University of Helsinki, 00140 Helsinki, Finland; Department of Physics, The University of Jordan, Amman 11942, Jordan

Similar Items