Simultaneous Removal of Prefix and Suffix

Bibliographic Details
Main Authors: Pawan Tamta, B. P. Pande
Format: Article
Language: English
Published: World Scientific Publishing 2020-05-01
Series: Vietnam Journal of Computer Science
Online Access: http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074
Description
Summary: This work attempts to devise a stemmer that removes both the prefix and the suffix of a given English word simultaneously. For a given input word, our method considers all possible internal N-grams as potential stems. We adopt the hypothesis that the stem is the candidate whose length is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each having both a prefix and a suffix. We also compare the proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers.
ISSN: 2196-8888, 2196-8896
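
The approach outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the minimum N-gram length, the tie-breaking rule, and the tiny `TOY_DICTIONARY` are all assumptions made for illustration, and "internal N-grams" is interpreted here as all contiguous substrings of the word, since the summary does not define the term precisely.

```python
# Hypothetical stand-in for the "standard English dictionary" in the summary.
TOY_DICTIONARY = {
    "help", "helpful", "comfort", "comfortable", "fort",
    "table", "able", "agree", "agreement", "disagree",
}


def internal_ngrams(word, min_len=3):
    """Yield every contiguous substring of `word` with at least `min_len` characters.

    Assumption: the minimum length of 3 is illustrative only.
    """
    n = len(word)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            yield word[start:end]


def stem(word, dictionary, min_len=3):
    """Return the dictionary N-gram whose length is closest to half of len(word).

    Assumption: ties are broken in favour of the longer candidate; the
    summary does not say how ties are resolved. If no N-gram is found in
    the dictionary, the word is returned unchanged.
    """
    word = word.lower()
    target = len(word) / 2
    candidates = [
        gram
        for gram in internal_ngrams(word, min_len)
        if gram in dictionary and gram != word
    ]
    if not candidates:
        return word
    return min(candidates, key=lambda g: (abs(len(g) - target), -len(g)))


if __name__ == "__main__":
    for w in ("unhelpful", "uncomfortable", "disagreement"):
        print(w, "->", stem(w, TOY_DICTIONARY))
```

With this toy dictionary, "unhelpful" (9 letters, target length 4.5) yields "help" rather than "helpful", because 4 is closer to 4.5 than 7 is. A real dictionary would contain many more short words, so the choice of minimum N-gram length and tie-breaking rule would matter far more than it does in this sketch.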