Identification of new particle formation events with deep learning

Bibliographic Details
Main Authors: J. Joutsensaari, M. Ozon, T. Nieminen, S. Mikkonen, T. Lähivaara, S. Decesari, M. C. Facchini, A. Laaksonen, K. E. J. Lehtinen
Format: Article
Language: English
Published: Copernicus Publications 2018-07-01
Series: Atmospheric Chemistry and Physics
Online Access: https://www.atmos-chem-phys.net/18/9597/2018/acp-18-9597-2018.pdf
id doaj-5ad20a0b2a0d46ddb7895dd310eab6dc
record_format Article
collection DOAJ
language English
format Article
sources DOAJ
author J. Joutsensaari
M. Ozon
T. Nieminen
S. Mikkonen
T. Lähivaara
S. Decesari
M. C. Facchini
A. Laaksonen
K. E. J. Lehtinen
publisher Copernicus Publications
series Atmospheric Chemistry and Physics
issn 1680-7316
1680-7324
publishDate 2018-07-01
description <p>New particle formation (NPF) in the atmosphere is a globally important source of climate-relevant aerosol particles. Researchers typically identify NPF events manually, day by day, from particle size distribution data, which is time-consuming and can lead to inconsistent classification of event types. To obtain more reliable and consistent results, NPF event analysis should be automated. We have developed an automatic analysis method for NPF event identification based on deep learning, a subfield of machine learning. To our knowledge, this is the first time that a deep learning method, i.e., transfer learning of a convolutional neural network (CNN), has been used successfully to classify NPF events automatically into different classes directly from particle size distribution images, similar to how researchers carry out the manual classification. The method is based on image analysis of particle size distributions using a pretrained deep CNN, AlexNet, which was retrained by transfer learning to recognize six NPF event classes. In the transfer learning, a subset of the particle size distribution images was used to train the CNN, and the remaining images were used to test the success of the training. The method was applied to a 15-year-long dataset measured at San Pietro Capofiume (SPC) in Italy. We studied the performance of the training with different ratios of training to testing images as well as with different regions of interest in the images. The results show that clear event days (i.e., classes 1 and 2) and nonevent days can be identified with an accuracy of ca. 80 % when the CNN classification is compared with that of an expert, which is a good first result for automatic NPF event analysis. In event classification, choosing between event classes is not easy even for trained researchers, and thus overlap or confusion between classes occurs. 
Hence, we cross-validated the learning results of the CNN against the expert-made classification. The results show that overlap typically occurs between adjacent or similar classes; e.g., days manually classified as Class 1 are categorized mainly into classes 1 and 2 by the CNN, indicating that the manual and CNN classifications are very consistent for most days. The classification by both humans and the CNN would be more consistent if only two classes, instead of three, were used for event days. Thus, we recommend that in future analyses, event days be categorized as <q>quantifiable</q> (i.e., clear events, classes 1 and 2) or <q>nonquantifiable</q> (i.e., weak events, Class 3). This would better describe the difference between those classes: both formation and growth rates can be determined for quantifiable days, but not both for nonquantifiable days. Furthermore, we investigated in more detail the days that were classified as clear events by experts but recognized as nonevents by the CNN, and vice versa. Clear misclassifications seem to occur more commonly in the manual analysis than in the CNN categorization, mostly due to inconsistency in the human-made classification or errors in recording the event class. In general, the automatic CNN classifier has better reliability and repeatability in NPF event classification than human-made classification; thus, transfer-learned pretrained CNNs are powerful tools for analyzing long-term datasets. The developed NPF event classifier can easily be used to analyze any long-term dataset more accurately and consistently, which helps us to understand in detail aerosol–climate interactions and the long-term effects of climate change on NPF in the atmosphere. We encourage researchers to use the model at other sites. However, we suggest that the CNN be retrained by transfer learning on data from each new site, with a minimum of ca. 
150 figures per class to obtain sufficiently good classification results, especially if the size distribution evolution differs from that in the training data. In the future, we will apply the method to data from other sites, develop it to analyze more parameters, and evaluate how successfully a CNN could be trained with synthetic NPF event data.</p>
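The comparison between expert and CNN labels described in the abstract, and the effect of merging classes 1 and 2 into a single quantifiable class, can be illustrated with a minimal Python sketch. This is not the authors' code: the label strings, the helper names (`merge_label`, `agreement`, `confusion`) and the toy label lists are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import Counter

# Hypothetical label names: the abstract only names classes 1-3 and nonevent days.
MERGE = {
    "1": "quantifiable",      # clear events
    "2": "quantifiable",      # clear events
    "3": "nonquantifiable",   # weak events
    "nonevent": "nonevent",
}


def merge_label(label):
    """Map a fine-grained class to the coarser scheme; unknown labels become 'other'."""
    return MERGE.get(label, "other")


def agreement(expert, cnn):
    """Fraction of days on which the two label sequences agree."""
    if len(expert) != len(cnn) or not expert:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(e == c for e, c in zip(expert, cnn)) / len(expert)


def confusion(expert, cnn):
    """Count (expert label, CNN label) pairs, i.e. a sparse confusion matrix."""
    return Counter(zip(expert, cnn))


# Toy labels invented for illustration only.
expert = ["1", "2", "3", "nonevent", "1", "2"]
cnn = ["2", "2", "3", "nonevent", "1", "1"]

fine = agreement(expert, cnn)
merged = agreement([merge_label(x) for x in expert],
                   [merge_label(x) for x in cnn])
print(f"agreement with separate classes: {fine:.2f}")
print(f"agreement with merged classes:   {merged:.2f}")
```

In this toy case every disagreement is between classes 1 and 2, so merging them removes all disagreement; on real data the gain would depend on how much confusion falls between adjacent classes.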
Record timestamp: 2020-11-24T21:16:10Z
Volume: 18
Pages: 9597-9615
DOI: 10.5194/acp-18-9597-2018
Author affiliations:
J. Joutsensaari, M. Ozon, T. Nieminen, S. Mikkonen, T. Lähivaara: Department of Applied Physics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland
S. Decesari, M. C. Facchini: Institute of Atmospheric Sciences and Climate of the Italian National Research Council, Bologna, Italy
A. Laaksonen: Department of Applied Physics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland; Climate Research Unit, Finnish Meteorological Institute, Helsinki, Finland
K. E. J. Lehtinen: Department of Applied Physics, University of Eastern Finland, P.O. Box 1627, 70211 Kuopio, Finland; Atmospheric Research Centre of Eastern Finland, Finnish Meteorological Institute, Kuopio, Finland