COST-SENSITIVE STRUCTURED PERCEPTRON INCORPORATING CATEGORY HIERARCHY FOR NAMED ENTITY RECOGNITION
Named Entity Recognition (NER) is a fundamental natural language processing task for the identification and classification of expressions into predefined categories, such as person and organization. Existing NER systems usually target about 10 categories and do not incorporate analysis of categor...
Main Authors: | Shohei Higashiyama, Blondel Mathieu, Kazuhiro Seki, Kuniaki Uehara |
---|---|
Format: | Article |
Language: | English |
Published: | UUM Press, 2015-03-01 |
Series: | Journal of ICT |
Online Access: | https://www.scienceopen.com/document?vid=ce201f64-ec80-452a-a18d-5eb8bd38b978 |
id |
doaj-4a4e26f8819848a29ba5322fa3a489dc |
---|---|
record_format |
Article |
spelling |
doaj-4a4e26f8819848a29ba5322fa3a489dc; 2021-08-03T00:25:12Z; eng; UUM Press; Journal of ICT; 1675-414X; 2015-03-01; 10.32890/jict.14.2015.8153; COST-SENSITIVE STRUCTURED PERCEPTRON INCORPORATING CATEGORY HIERARCHY FOR NAMED ENTITY RECOGNITION; Shohei Higashiyama; Blondel Mathieu; Kazuhiro Seki; Kuniaki Uehara; Named Entity Recognition (NER) is a fundamental natural language processing task for the identification and classification of expressions into predefined categories, such as person and organization. Existing NER systems usually target about 10 categories and do not incorporate analysis of category relations. However, categories often belong naturally to some predefined hierarchy. In such cases, the distance between categories in the hierarchy becomes a rich source of information that can be exploited. This is intuitively useful, particularly when the categories are numerous. Accordingly, this paper proposes an NER approach that leverages category hierarchy information by introducing, in the structured perceptron framework, a cost function that more strongly penalizes category predictions that are more distant from the correct category in the hierarchy. Experimental results on the GENIA biomedical text corpus indicate the effectiveness of the proposed approach compared with the case where no cost function is used. In addition, the proposed approach demonstrates superior performance over a representative work using multi-class support vector machines on the same corpus. A possible direction for further improving the proposed approach is to investigate more elaborate cost functions than the simple additive cost adopted in this work.; https://www.scienceopen.com/document?vid=ce201f64-ec80-452a-a18d-5eb8bd38b978 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Shohei Higashiyama; Blondel Mathieu; Kazuhiro Seki; Kuniaki Uehara |
title |
COST-SENSITIVE STRUCTURED PERCEPTRON INCORPORATING CATEGORY HIERARCHY FOR NAMED ENTITY RECOGNITION |
publisher |
UUM Press |
series |
Journal of ICT |
issn |
1675-414X |
publishDate |
2015-03-01 |
description |
Named Entity Recognition (NER) is a fundamental natural language processing task for the identification and classification of expressions into predefined categories, such as person and organization. Existing NER systems usually target about 10 categories and do not incorporate analysis of category relations. However, categories often belong naturally to some predefined hierarchy. In such cases, the distance between categories in the hierarchy becomes a rich source of information that can be exploited. This is intuitively useful, particularly when the categories are numerous. Accordingly, this paper proposes an NER approach that leverages category hierarchy information by introducing, in the structured perceptron framework, a cost function that more strongly penalizes category predictions that are more distant from the correct category in the hierarchy. Experimental results on the GENIA biomedical text corpus indicate the effectiveness of the proposed approach compared with the case where no cost function is used. In addition, the proposed approach demonstrates superior performance over a representative work using multi-class support vector machines on the same corpus. A possible direction for further improving the proposed approach is to investigate more elaborate cost functions than the simple additive cost adopted in this work. |
url |
https://www.scienceopen.com/document?vid=ce201f64-ec80-452a-a18d-5eb8bd38b978 |
_version_ |
1721224945403428864 |
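
The description in this record mentions a cost that grows with the distance between the predicted and the correct category in a hierarchy, combined additively. The following is a minimal Python sketch of that idea only, not the authors' implementation. Everything in it is an assumption made for illustration: the toy hierarchy in `PARENT` (it is not the GENIA ontology), the choice of path length in the tree as the distance, the function names, and the use of cost-augmented decoding over independent per-token scores with no transition features. The abstract does not state how the paper plugs the cost into the perceptron update.

```python
from typing import Dict, List, Optional

# Hypothetical toy hierarchy (child -> parent); NOT the GENIA ontology.
PARENT: Dict[str, Optional[str]] = {
    "substance": None,
    "amino_acid": "substance",
    "nucleic_acid": "substance",
    "protein": "amino_acid",
    "peptide": "amino_acid",
    "DNA": "nucleic_acid",
    "RNA": "nucleic_acid",
}


def path_to_root(category: str) -> List[str]:
    """Return the nodes from `category` up to the root, inclusive."""
    path: List[str] = []
    node: Optional[str] = category
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path


def tree_distance(a: str, b: str) -> int:
    """Number of edges on the path between two categories (assumed distance)."""
    steps_from_b = {node: i for i, node in enumerate(path_to_root(b))}
    for steps_from_a, node in enumerate(path_to_root(a)):
        if node in steps_from_b:
            # `node` is the lowest common ancestor of a and b.
            return steps_from_a + steps_from_b[node]
    raise ValueError("categories do not share a root")


def sequence_cost(gold: List[str], pred: List[str]) -> int:
    """Additive cost: sum of per-token hierarchy distances."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    return sum(tree_distance(g, p) for g, p in zip(gold, pred))


def cost_augmented_decode(
    token_scores: List[Dict[str, float]], gold: List[str]
) -> List[str]:
    """Per token, pick the label maximizing model score + cost.

    Assumes independent token scores (no transition features), so the
    additive cost decomposes and each position can be decoded on its own.
    """
    decoded = []
    for scores, g in zip(token_scores, gold):
        best = max(scores, key=lambda label: scores[label] + tree_distance(g, label))
        decoded.append(best)
    return decoded


if __name__ == "__main__":
    print(tree_distance("protein", "peptide"))  # siblings
    print(tree_distance("protein", "DNA"))      # different subtrees
    scores = [{"protein": 1.0, "peptide": 0.5, "DNA": -2.0}]
    print(cost_augmented_decode(scores, ["protein"]))
```

In this sketch, a prediction in a different subtree (`protein` vs. `DNA`) costs more than a confusion between siblings (`protein` vs. `peptide`), which is the qualitative behaviour the description attributes to the cost function. During training, the label sequence returned by the cost-augmented decoder would typically stand in for the plain argmax when computing the perceptron update; whether the paper does exactly this is not stated in the record.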