Learning Subword Embedding to Improve Uyghur Named-Entity Recognition
Uyghur is a morphologically rich and typical agglutinating language, and morphological segmentation affects the performance of Uyghur named-entity recognition (NER). Common Uyghur NER systems use the word sequence as input and rely heavily on feature engineering. However, semantic information cannot...
Main Authors: | Alimu Saimaiti, Lulu Wang, Tuergen Yibulayin |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2019-04-01 |
Series: | Information |
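Subjects: | subword embedding; Uyghur; named-entity recognition; morphological processing; word sequence; natural language processing; deep learning; word-based neural model |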
Online Access: | https://www.mdpi.com/2078-2489/10/4/139 |
id |
doaj-0974f45d5f8943c0bfc4e673bae290f2 |
---|---|
record_format |
Article |
spelling |
doaj-0974f45d5f8943c0bfc4e673bae290f2; 2020-11-25T02:18:27Z; eng; MDPI AG; Information; ISSN 2078-2489; 2019-04-01; vol. 10, no. 4, art. 139; doi:10.3390/info10040139; info10040139; Learning Subword Embedding to Improve Uyghur Named-Entity Recognition; Alimu Saimaiti, Lulu Wang, Tuergen Yibulayin (all: College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China); Uyghur is a morphologically rich and typical agglutinating language, and morphological segmentation affects the performance of Uyghur named-entity recognition (NER). Common Uyghur NER systems use the word sequence as input and rely heavily on feature engineering. However, semantic information cannot be fully learned, and the model easily suffers from data sparsity arising from morphological processes, when only the word sequence is considered. To solve this problem, we propose a neural network architecture that combines subword embedding with character embedding, based on a bidirectional long short-term memory network with a conditional random field layer. Our experiments show that subword embedding effectively improves Uyghur NER performance, and the proposed method outperforms the word-sequence-based model.; https://www.mdpi.com/2078-2489/10/4/139; subword embedding, Uyghur, named-entity recognition, morphological processing, word sequence, natural language processing, deep learning, word-based neural model |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Alimu Saimaiti, Lulu Wang, Tuergen Yibulayin |
title |
Learning Subword Embedding to Improve Uyghur Named-Entity Recognition |
publisher |
MDPI AG |
series |
Information |
issn |
2078-2489 |
publishDate |
2019-04-01 |
description |
Uyghur is a morphologically rich and typical agglutinating language, and morphological segmentation affects the performance of Uyghur named-entity recognition (NER). Common Uyghur NER systems use the word sequence as input and rely heavily on feature engineering. However, semantic information cannot be fully learned, and the model easily suffers from data sparsity arising from morphological processes, when only the word sequence is considered. To solve this problem, we propose a neural network architecture that combines subword embedding with character embedding, based on a bidirectional long short-term memory network with a conditional random field layer. Our experiments show that subword embedding effectively improves Uyghur NER performance, and the proposed method outperforms the word-sequence-based model. |
topic |
subword embedding; Uyghur; named-entity recognition; morphological processing; word sequence; natural language processing; deep learning; word-based neural model |
url |
https://www.mdpi.com/2078-2489/10/4/139 |
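
The abstract in this record mentions a conditional random field (CRF) layer on top of a bidirectional LSTM. As a rough illustration of what such a layer does at prediction time, the following is a minimal, generic Viterbi decoding sketch in plain Python. It is not taken from the article: the function name `viterbi_decode`, the three-label tag set, and all scores are hypothetical toy values. In a real system the per-token emission scores would presumably come from the BiLSTM over subword and character embeddings, and the transition scores would be learned.

```python
from typing import Dict, List


def viterbi_decode(
    emissions: List[Dict[str, float]],
    transitions: Dict[str, Dict[str, float]],
    labels: List[str],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][y]   : score of label y at position t (toy values here)
    transitions[p][y] : score of moving from label p to label y
    """
    if not emissions:
        return []
    # best[t][y] holds the best score of any path that ends in label y at t.
    best = [{y: emissions[0][y] for y in labels}]
    back: List[Dict[str, str]] = []
    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for y in labels:
            # Pick the best previous label for reaching y at position t.
            prev = max(labels, key=lambda p: best[-1][p] + transitions[p][y])
            scores[y] = best[-1][prev] + transitions[prev][y] + emissions[t][y]
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(labels, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical tag set and scores, for illustration only.
LABELS = ["O", "B-PER", "I-PER"]
TRANSITIONS = {p: {y: 0.0 for y in LABELS} for p in LABELS}
TRANSITIONS["O"]["I-PER"] = -10.0  # discourage I-PER directly after O

# Taken alone, the emissions prefer O then I-PER, which is an invalid sequence.
TOY_EMISSIONS = [
    {"O": 2.0, "B-PER": 1.0, "I-PER": 0.0},
    {"O": 0.0, "B-PER": 0.5, "I-PER": 2.0},
]

print(viterbi_decode(TOY_EMISSIONS, TRANSITIONS, LABELS))
```

The toy transition penalty shows why a CRF layer can help sequence labeling: decoding scores the whole label sequence, so a locally attractive but inconsistent tag pair can lose to a consistent one.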