Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features

Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper a...

Bibliographic Details
Main Authors: SaiKiranmai Gorla, Lalita Bhanu Murthy Neti, Aruna Malapati
Format: Article
Language: English
Published: MDPI AG 2020-02-01
Series: Information
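Subjects: information extraction; named entity recognition; telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm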
Online Access: https://www.mdpi.com/2078-2489/11/2/82
id doaj-d6ae07b19393401b894d43ecceeba9f4
record_format Article
spelling doaj-d6ae07b19393401b894d43ecceeba9f4; 2020-11-25T03:32:57Z; eng; MDPI AG; Information; 2078-2489; 2020-02-01; 11; 2; 82; 10.3390/info11020082; info11020082; Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features; SaiKiranmai Gorla (0); Lalita Bhanu Murthy Neti (1); Aruna Malapati (2); (0) Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Telangana 500078, India; (1) Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Telangana 500078, India; (2) Department of Computer Science and Information Systems, Birla Institute of Technology and Science Pilani, Hyderabad Campus, Telangana 500078, India; Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve the performance, and the MIRA-based NER model fared better than its counterparts SVM and CRF.; https://www.mdpi.com/2078-2489/11/2/82; information extraction; named entity recognition; telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm
collection DOAJ
language English
format Article
sources DOAJ
author SaiKiranmai Gorla
Lalita Bhanu Murthy Neti
Aruna Malapati
spellingShingle SaiKiranmai Gorla
Lalita Bhanu Murthy Neti
Aruna Malapati
Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
Information
information extraction
named entity recognition
telugu language
gazetteer
support vector machine
conditional random field
margin infused relaxed algorithm
author_facet SaiKiranmai Gorla
Lalita Bhanu Murthy Neti
Aruna Malapati
author_sort SaiKiranmai Gorla
title Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
title_short Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
title_full Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
title_fullStr Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
title_full_unstemmed Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features
title_sort enhancing the performance of telugu named entity recognition using gazetteer features
publisher MDPI AG
series Information
issn 2078-2489
publishDate 2020-02-01
description Named entity recognition (NER) is a fundamental step for many natural language processing tasks and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve the performance, and the MIRA-based NER model fared better than its counterparts SVM and CRF.
topic information extraction
named entity recognition
telugu language
gazetteer
support vector machine
conditional random field
margin infused relaxed algorithm
url https://www.mdpi.com/2078-2489/11/2/82
work_keys_str_mv AT saikiranmaigorla enhancingtheperformanceoftelugunamedentityrecognitionusinggazetteerfeatures
AT lalitabhanumurthyneti enhancingtheperformanceoftelugunamedentityrecognitionusinggazetteerfeatures
AT arunamalapati enhancingtheperformanceoftelugunamedentityrecognitionusinggazetteerfeatures
_version_ 1724565709974929408