Text Mining of Hazard and Operability Analysis Reports Based on Active Learning
In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text, which can guide experts to carry out a new round of HAZOP analysis, help practitioners optimize the hidden dangers in the sy...
Main Authors: | Zhenhua Wang, Beike Zhang, Dong Gao |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-07-01 |
Series: | Processes |
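Subjects: | active learning, sampling algorithm, hazard and operability analysis, deep learning, named entity recognition |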
Online Access: | https://www.mdpi.com/2227-9717/9/7/1178 |
id |
doaj-a0880bf5cd9843d1882e99238feb35c5 |
---|---|
record_format |
Article |
spelling |
doaj-a0880bf5cd9843d1882e99238feb35c5; 2021-07-23T14:03:14Z; eng; MDPI AG; Processes; ISSN 2227-9717; 2021-07-01; vol. 9; pp. 1178-1178; DOI 10.3390/pr9071178; Text Mining of Hazard and Operability Analysis Reports Based on Active Learning; Zhenhua Wang, Beike Zhang, Dong Gao (all: College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China); https://www.mdpi.com/2227-9717/9/7/1178; active learning; sampling algorithm; hazard and operability analysis; deep learning; named entity recognition |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Zhenhua Wang, Beike Zhang, Dong Gao |
title |
Text Mining of Hazard and Operability Analysis Reports Based on Active Learning |
publisher |
MDPI AG |
series |
Processes |
issn |
2227-9717 |
publishDate |
2021-07-01 |
description |
In the field of chemical safety, a named entity recognition (NER) model based on deep learning can mine valuable information from hazard and operability analysis (HAZOP) text. That information can guide experts in carrying out a new round of HAZOP analysis and help practitioners address hidden dangers in the system, which is of great significance for improving the safety of the whole chemical system. However, because chemical safety analysis text is highly standardized and specialized, it is difficult to improve the performance of traditional models. To solve this problem, this study proposes an improved method based on active learning and designs three novel sampling algorithms, Variation of Token Entropy (VTE), HAZOP Confusion Entropy (HCE), and Amplification of Least Confidence (ALC), which improve the ability of the model to understand HAZOP text. In this method, part of the data is used to establish the initial model. The sampling algorithm is then used to select high-quality samples from the data set. Finally, these high-quality samples are used to retrain the whole model to obtain the final model. The experimental results show that the VTE, HCE, and ALC algorithms perform better than random sampling. In addition, compared with other methods, the proposed method effectively improves the performance of the traditional model, which indicates that the method is reliable and advanced. |
topic |
active learning; sampling algorithm; hazard and operability analysis; deep learning; named entity recognition |
url |
https://www.mdpi.com/2227-9717/9/7/1178 |
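The description above outlines a three-step procedure: train an initial model on part of the data, use a sampling algorithm to pick high-quality samples, then retrain. This record does not define VTE, HCE, or ALC, so the sketch below is only a generic pool-based active learning loop that uses two standard uncertainty scores (mean token entropy and least confidence) as stand-ins. All function names and parameters (`token_entropy`, `least_confidence`, `select_batch`, `active_learning`, `train`, `oracle`) are hypothetical and are not taken from the paper.

```python
import math


def token_entropy(sentence_probs):
    """Mean per-token Shannon entropy of the predicted label distributions."""
    total = 0.0
    for dist in sentence_probs:
        total += -sum(p * math.log(p) for p in dist if p > 0)
    return total / len(sentence_probs)


def least_confidence(sentence_probs):
    """1 minus the product of each token's highest label probability."""
    confidence = 1.0
    for dist in sentence_probs:
        confidence *= max(dist)
    return 1.0 - confidence


def select_batch(pool, predict_probs, score, k):
    """Return indices of the k pool sentences with the highest score."""
    scored = [(score(predict_probs(s)), i) for i, s in enumerate(pool)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


def active_learning(train, labeled, pool, oracle, score, k, rounds):
    """Generic loop: train, score the pool, label the top k, retrain.

    train(labeled) is assumed to return a callable that maps a sentence
    to a list of per-token label distributions; oracle(sentence) is
    assumed to return the gold annotation for that sentence.
    """
    labeled = list(labeled)
    pool = list(pool)
    model = train(labeled)  # step 1: initial model
    for _ in range(rounds):
        if not pool:
            break
        picked = set(select_batch(pool, model, score, k))  # step 2: sample
        labeled += [(pool[i], oracle(pool[i])) for i in sorted(picked)]
        pool = [s for i, s in enumerate(pool) if i not in picked]
        model = train(labeled)  # step 3: retrain on the enlarged set
    return model, labeled, pool
```

Passing `score=token_entropy` or `score=least_confidence` switches the selection criterion without changing the loop. A random-sampling baseline, as mentioned in the description, could be obtained by replacing `select_batch` with a random choice of `k` indices.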