Chinese Personal Name Disambiguation Based on Clustering

Personal name disambiguation is a significant problem in natural language processing and underlies many automatic information processing tasks. This research explores Chinese personal name disambiguation based on clustering. Preprocessing is applied to transform raw corpus i...


Bibliographic Details
Main Authors:	Chao Fan, Yu Li
Format:	Article
Language:	English
Published:	Hindawi-Wiley 2021-01-01
Series:	Wireless Communications and Mobile Computing
Online Access:	http://dx.doi.org/10.1155/2021/3790176

id	doaj-33148f8a18a94602bd5dd177ee4d07d7
record_format	Article
spelling	doaj-33148f8a18a94602bd5dd177ee4d07d7 | 2021-05-24T00:15:01Z | eng | Hindawi-Wiley | Wireless Communications and Mobile Computing | 1530-8677 | 2021-01-01 | 2021 | 10.1155/2021/3790176 | Chinese Personal Name Disambiguation Based on Clustering | Chao Fan 0 | Yu Li 1 | The School of Artificial Intelligence and Computer Science | The School of Artificial Intelligence and Computer Science | Personal name disambiguation is a significant problem in natural language processing and underlies many automatic information processing tasks. This research explores Chinese personal name disambiguation based on clustering. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. We then extract features that better disambiguate Chinese personal names and create rules for identifying target personal names to improve the experimental results. Several feature-weighting methods are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For clustering, agglomerative hierarchical clustering is selected after comparison with other clustering methods. Finally, a labeling approach is used to surface feature words that represent each cluster. The experiment achieves good results on five groups of Chinese personal names. | http://dx.doi.org/10.1155/2021/3790176
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Chao Fan, Yu Li
spellingShingle	Chao Fan Yu Li Chinese Personal Name Disambiguation Based on Clustering Wireless Communications and Mobile Computing
author_facet	Chao Fan, Yu Li
author_sort	Chao Fan
title	Chinese Personal Name Disambiguation Based on Clustering
title_sort	chinese personal name disambiguation based on clustering
publisher	Hindawi-Wiley
series	Wireless Communications and Mobile Computing
issn	1530-8677
publishDate	2021-01-01
description	Personal name disambiguation is a significant problem in natural language processing and underlies many automatic information processing tasks. This research explores Chinese personal name disambiguation based on clustering. First, preprocessing transforms the raw corpus into a standardized format. Lexical analysis then performs Chinese word segmentation, part-of-speech tagging, and named entity recognition. We then extract features that better disambiguate Chinese personal names and create rules for identifying target personal names to improve the experimental results. Several feature-weighting methods are implemented, including Boolean weight, absolute frequency weight, tf-idf weight, and entropy weight. For clustering, agglomerative hierarchical clustering is selected after comparison with other clustering methods. Finally, a labeling approach is used to surface feature words that represent each cluster. The experiment achieves good results on five groups of Chinese personal names.
url	http://dx.doi.org/10.1155/2021/3790176
work_keys_str_mv	AT chaofan chinesepersonalnamedisambiguationbasedonclustering AT yuli chinesepersonalnamedisambiguationbasedonclustering
_version_	1721429195293196288
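
The description above names tf-idf feature weighting and agglomerative hierarchical clustering as parts of the pipeline. The following is a minimal sketch of how those two steps could fit together, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the toy documents (English tokens standing in for segmented Chinese context words), cosine similarity, average linkage, and the stopping threshold of 0.2 do not come from the record.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into sparse {term: tf-idf weight} dicts."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors (0.0 if either is empty)."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def agglomerative(vectors, threshold):
    """Average-linkage agglomerative clustering.

    Starts with one cluster per document and repeatedly merges the two
    most similar clusters until the best similarity drops below
    `threshold` (an assumed stopping rule).
    """
    clusters = [[i] for i in range(len(vectors))]

    def linkage(c1, c2):
        total = sum(cosine(vectors[i], vectors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best, (a, b) = max(
            (linkage(clusters[a], clusters[b]), (a, b))
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical contexts mentioning the same ambiguous name "li_ming".
docs = [
    ["li_ming", "professor", "university", "physics"],
    ["li_ming", "physics", "research", "professor"],
    ["li_ming", "striker", "football", "club"],
    ["li_ming", "football", "goal", "striker"],
]
clusters = agglomerative(tfidf_vectors(docs), threshold=0.2)
print(clusters)
```

In this toy run the two academic contexts and the two football contexts land in separate clusters, because the shared name token appears in every document and therefore receives a tf-idf weight of zero. In practice the threshold and linkage would need tuning against labeled data.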
