Open Domain Chinese Triples Hierarchical Extraction Method

Open domain relation prediction is an important task in triples extraction. When faced with the task of constructing large-scale knowledge graph systems, with the exception of structured data, it is necessary to automatically extract triples from a large amount of unstructured text to expand entitie...

Bibliographic Details
Main Authors: Chunhui He, Zhen Tan, Haoran Wang, Chong Zhang, Yanli Hu, Bin Ge
Format: Article
Language: English
Published: MDPI AG 2020-07-01
Series: Applied Sciences
Subjects: named entity recognition; open relation prediction; information extraction; CTHE
Online Access: https://www.mdpi.com/2076-3417/10/14/4819
id doaj-c28affcf5e5947979e56dd122df2ceac
record_format Article
spelling doaj-c28affcf5e5947979e56dd122df2ceac
2020-11-25T02:17:10Z
eng
MDPI AG
Applied Sciences
2076-3417
2020-07-01
10
4819
4819
10.3390/app10144819
Open Domain Chinese Triples Hierarchical Extraction Method
Chunhui He (0)
Zhen Tan (1)
Haoran Wang (2)
Chong Zhang (3)
Yanli Hu (4)
Bin Ge (5)
(0-5) Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China
https://www.mdpi.com/2076-3417/10/14/4819
named entity recognition
open relation prediction
information extraction
CTHE
collection DOAJ
language English
format Article
sources DOAJ
author Chunhui He
Zhen Tan
Haoran Wang
Chong Zhang
Yanli Hu
Bin Ge
spellingShingle Chunhui He
Zhen Tan
Haoran Wang
Chong Zhang
Yanli Hu
Bin Ge
Open Domain Chinese Triples Hierarchical Extraction Method
Applied Sciences
named entity recognition
open relation prediction
information extraction
CTHE
author_facet Chunhui He
Zhen Tan
Haoran Wang
Chong Zhang
Yanli Hu
Bin Ge
author_sort Chunhui He
title Open Domain Chinese Triples Hierarchical Extraction Method
title_short Open Domain Chinese Triples Hierarchical Extraction Method
title_full Open Domain Chinese Triples Hierarchical Extraction Method
title_fullStr Open Domain Chinese Triples Hierarchical Extraction Method
title_full_unstemmed Open Domain Chinese Triples Hierarchical Extraction Method
title_sort open domain chinese triples hierarchical extraction method
publisher MDPI AG
series Applied Sciences
issn 2076-3417
publishDate 2020-07-01
description Open domain relation prediction is an important task in triples extraction. Constructing large-scale knowledge graph systems requires more than structured data: triples must also be extracted automatically from large amounts of unstructured text to expand entities and relations. Although many English open relation prediction methods have achieved good performance, a high-performance system for open domain Chinese triples extraction remains undeveloped, owing to the lack of large-scale annotated Chinese corpora and the difficulty of Chinese language processing. In this paper, we propose an integrated open domain Chinese triples hierarchical extraction method (CTHE) to solve this problem, drawing on the advantages of Bi-LSTM-CRF and Att-Bi-GRU models built on the pre-trained BERT encoding model. The method recognizes named entities in Chinese sentences to establish entity pairs, and implements hierarchical extraction of specific and open relations based on a user-defined schema library and an attention mechanism. The experimental results demonstrate the effectiveness of this method, which achieved stable performance on the test dataset, and better precision and F1-score than state-of-the-art Chinese open domain triples extraction methods. Furthermore, a large-scale annotated dataset for the Chinese named entity recognition (NER) task is established, which provides support for research on Chinese NER tasks.
topic named entity recognition
open relation prediction
information extraction
CTHE
url https://www.mdpi.com/2076-3417/10/14/4819
work_keys_str_mv AT chunhuihe opendomainchinesetripleshierarchicalextractionmethod
AT zhentan opendomainchinesetripleshierarchicalextractionmethod
AT haoranwang opendomainchinesetripleshierarchicalextractionmethod
AT chongzhang opendomainchinesetripleshierarchicalextractionmethod
AT yanlihu opendomainchinesetripleshierarchicalextractionmethod
AT binge opendomainchinesetripleshierarchicalextractionmethod
_version_ 1724887761288167424