Enhancing the Performance of Telugu Named Entity Recognition Using Gazetteer Features

Bibliographic Details
Main Authors:	SaiKiranmai Gorla, Lalita Bhanu Murthy Neti, Aruna Malapati
Format:	Article
Language:	English
Published:	MDPI AG 2020-02-01
Series:	Information
Subjects:	information extraction; named entity recognition; Telugu language; gazetteer; support vector machine; conditional random field; margin infused relaxed algorithm
Online Access:	https://www.mdpi.com/2078-2489/11/2/82

Description
Summary:	Named entity recognition (NER) is a fundamental step for many natural language processing tasks, and hence enhancing the performance of NER models is always appreciated. With limited resources being available, NER for South-East Asian languages like Telugu is quite a challenging problem. This paper attempts to improve the NER performance for Telugu using gazetteer-related features, which are automatically generated using Wikipedia pages. We make use of these gazetteer features along with other well-known features like contextual, word-level, and corpus features to build NER models. NER models are developed using three well-known classifiers: conditional random field (CRF), support vector machine (SVM), and margin infused relaxed algorithm (MIRA). The gazetteer features are shown to improve the performance, and the MIRA-based NER model fared better than its SVM and CRF counterparts.
ISSN:	2078-2489
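
Illustrative sketch (not from the article): the summary describes combining gazetteer features with contextual and word-level features. The Python below shows one way such per-token features might be assembled before they are passed to a classifier. The gazetteer entries, the transliterated example tokens, the feature names, and the functions `token_features` and `sentence_features` are all hypothetical; the article builds its gazetteers automatically from Wikipedia pages, and its actual feature set may differ.

```python
from typing import Dict, List, Set

# Hypothetical toy gazetteers (assumption). The article derives its
# gazetteer lists automatically from Wikipedia pages.
GAZETTEERS: Dict[str, Set[str]] = {
    "PERSON": {"ramu", "sita"},
    "LOCATION": {"hyderabad", "guntur"},
}


def token_features(tokens: List[str], i: int) -> Dict[str, object]:
    """Build a feature dictionary for the token at position i."""
    word = tokens[i]
    lowered = word.lower()
    feats: Dict[str, object] = {
        # Word-level features: the token itself and short affixes.
        "word": lowered,
        "prefix2": word[:2],
        "suffix2": word[-2:],
        # Contextual features: neighbouring tokens, with boundary markers.
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }
    # Gazetteer features: one binary flag per entity list.
    for label, entries in GAZETTEERS.items():
        feats[f"in_gaz_{label}"] = lowered in entries
    return feats


def sentence_features(tokens: List[str]) -> List[Dict[str, object]]:
    """Return one feature dictionary per token in the sentence."""
    return [token_features(tokens, i) for i in range(len(tokens))]


features = sentence_features(["Ramu", "lives", "in", "Hyderabad"])
```

Feature dictionaries of this shape are a common input format for sequence labellers such as CRFs; whether the article's models consume features in this form is an assumption.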
