The fast vocabulary-based algorithm for natural language word form analysis

Bibliographic Details
Main Author: Rozanov Alexey
Format: Article
Language: English
Published: EDP Sciences 2016-01-01
Series: ITM Web of Conferences
Online Access: http://dx.doi.org/10.1051/itmconf/20160603013
Description
Summary: In the field of Natural Language Processing, identifying word forms and, more precisely, identifying part-of-speech and grammatical information for each of the words in the input text usually comprises the very first level of text processing (or immediately follows splitting the text into words, should such a task be non-trivial); therefore, the development of approaches to speed up word form analysis is of significant interest. In this work, using the work [1] as a basis, we present an approach to the analysis of word forms for natural languages with postfix inflection, following the work done in [3]. We propose a way of representing the postfix inflection rules associated with a natural language and an algorithm for word form analysis based on it. In conclusion, we provide benchmark data indicating the increase in speed compared to known analysis methods.
ISSN: 2271-2097