Learning Subword Embedding to Improve Uyghur Named-Entity Recognition

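Uyghur is a morphologically rich and typical agglutinative language, and morphological segmentation affects the performance of Uyghur named-entity recognition (NER). Common Uyghur NER systems take the word sequence as input and rely heavily on feature engineering. However, when only the word sequence is considered, semantic information cannot be fully learned, and the model easily suffers from data sparsity arising from morphological processes. To solve this problem, we propose a neural network architecture that combines subword embedding with character embedding, based on a bidirectional long short-term memory (BiLSTM) network with a conditional random field (CRF) layer. Our experiments show that subword embedding can effectively improve the performance of Uyghur NER, and that the proposed method outperforms the word-sequence-based model.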
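The paper combines subword embedding with character embedding as the input to a BiLSTM network with a CRF layer. This record does not say how words are split into subwords or how the two embeddings are merged. The Python below is a minimal sketch under stated assumptions. It uses byte-pair-encoding-style (BPE) segmentation, which is a swapped-in technique and not necessarily the authors' method. The sketch learns merges from a toy word list, segments words into subwords, and builds one input vector per token. Each vector concatenates the averaged subword embedding with the averaged character embedding. All function names, the toy Latin-script word forms, the dimensions, and the averaging step are hypothetical. The random tables stand in for trained weights, and the BiLSTM-CRF tagger that would consume these vectors is not shown.

```python
import random
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one joined symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn BPE merges from a {word: frequency} dict (hypothetical helper)."""
    # Every word starts as its characters plus an end-of-word marker.
    vocab = {tuple(w) + ("</w>",): c for w, c in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += count
        if not pairs:
            break
        # Most frequent pair; ties are broken deterministically by the pair itself.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(s, best): c for s, c in vocab.items()}
    return merges


def segment(word, merges):
    """Split a word into subwords by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def make_table(symbols, dim, seed):
    """Random embedding table with an <unk> row (stand-in for trained weights)."""
    rng = random.Random(seed)
    keys = sorted(set(symbols)) + ["<unk>"]
    return {k: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for k in keys}


def average(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def token_vector(word, merges, sub_emb, char_emb):
    """Concatenate the mean subword embedding and the mean character embedding."""
    if not word:
        raise ValueError("empty token")
    subs = [sub_emb.get(s, sub_emb["<unk>"]) for s in segment(word, merges)]
    chars = [char_emb.get(c, char_emb["<unk>"]) for c in word]
    return average(subs) + average(chars)


if __name__ == "__main__":
    # Toy, illustrative forms of one place-name stem with suffixes attached.
    counts = {"qeshqer": 5, "qeshqerge": 3, "qeshqerde": 3, "qeshqerdin": 2}
    merges = learn_bpe(counts, num_merges=6)
    sub_emb = make_table((s for w in counts for s in segment(w, merges)), dim=4, seed=0)
    char_emb = make_table((c for w in counts for c in w), dim=3, seed=1)
    print(segment("qeshqerning", merges))  # ['qeshqer', 'n', 'i', 'n', 'g', '</w>']
    print(len(token_vector("qeshqerning", merges, sub_emb, char_emb)))  # 7
```

With six merges on this toy list, every suffixed form shares the subword `qeshqer`. This includes `qeshqerning`, which never appears in the list. The sharing illustrates the sparsity argument: inflected forms of one entity reuse a common subword vector instead of each surface word needing its own.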

Bibliographic Details
Main Authors: Alimu Saimaiti, Lulu Wang, Tuergen Yibulayin
Affiliation: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China (all authors)
Format: Article
Language: English
Published: MDPI AG, 2019-04-01
Series: Information, vol. 10, no. 4, article 139
ISSN: 2078-2489
DOI: 10.3390/info10040139
Subjects: subword embedding; Uyghur; named-entity recognition; morphological processing; word sequence; natural language processing; deep learning; word-based neural model
Online Access: https://www.mdpi.com/2078-2489/10/4/139
Source: DOAJ