Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper a...
Main Authors: | SaiKiranmai Gorla, Lalita Bhanu Murthy Neti, Aruna Malapati |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2020-02-01 |
Series: | Information |
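Subjects: | information extraction; named entity recognition; telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm |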
Online Access: | https://www.mdpi.com/2078-2489/11/2/82 |
id |
doaj-d6ae07b19393401b894d43ecceeba9f4 |
---|---|
record_format |
Article |
spelling |
doaj-d6ae07b19393401b894d43ecceeba9f4; 2020-11-25T03:32:57Z; eng; MDPI AG; Information; ISSN 2078-2489; 2020-02-01; vol. 11, no. 2, art. 82; DOI 10.3390/info11020082; info11020082; Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features; SaiKiranmai Gorla (0), Lalita Bhanu Murthy Neti (1), Aruna Malapati (2); affiliation (0, 1, 2): Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Telangana 500078, India; Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithms (MIRA). The gazetteer features are shown to improve the performance, and the MIRA-based NER model fared better than its counterparts SVM and CRF.; https://www.mdpi.com/2078-2489/11/2/82; information extraction; named entity recognition; telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
SaiKiranmai Gorla Lalita Bhanu Murthy Neti Aruna Malapati |
author_sort |
SaiKiranmai Gorla |
title |
Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features |
publisher |
MDPI AG |
series |
Information |
issn |
2078-2489 |
publishDate |
2020-02-01 |
description |
Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithms (MIRA). The gazetteer features are shown to improve the performance, and the MIRA-based NER model fared better than its counterparts SVM and CRF. |
topic |
information extraction; named entity recognition; telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm |
url |
https://www.mdpi.com/2078-2489/11/2/82 |