Simultaneous Removal of Prefix and Suffix

This work attempts to devise a stemmer that removes both the prefix and the suffix from a given English word. For a given input word, our method considers all possible internal N-grams to detect potential stems. We hypothesize that the stem length is closest to th...


Bibliographic Details
Main Authors: Pawan Tamta, B. P. Pande
Format: Article
Language: English
Published: World Scientific Publishing 2020-05-01
Series: Vietnam Journal of Computer Science
Subjects: information retrieval (IR); stemmer; conflation; n-gram; potential-stem
Online Access: http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074
id doaj-fe8d16a901ab486da5b813401777be2b
record_format Article
spelling doaj-fe8d16a901ab486da5b813401777be2b
2020-11-25T03:06:12Z
eng
World Scientific Publishing
Vietnam Journal of Computer Science
ISSN 2196-8888, 2196-8896
2020-05-01
Volume 7, Issue 2, pages 129-144
DOI 10.1142/S2196888820500074
Simultaneous Removal of Prefix and Suffix
Pawan Tamta (Department of Mathematics, Government P.G. College, Berinag, Pithoragarh, Uttarakhand 262531, India)
B. P. Pande (Department of Computer Science, Kumaun University, SSJ Campus Almora 263601, India)
http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074
information retrieval (ir); stemmer; conflation; n-gram; potential-stem
collection DOAJ
language English
format Article
sources DOAJ
author Pawan Tamta
B. P. Pande
title Simultaneous Removal of Prefix and Suffix
publisher World Scientific Publishing
series Vietnam Journal of Computer Science
issn 2196-8888
2196-8896
publishDate 2020-05-01
description This work attempts to devise a stemmer that removes both the prefix and the suffix from a given English word. For a given input word, our method considers all possible internal N-grams to detect potential stems. We hypothesize that the stem is the candidate whose length is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare the proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique outperforms the other stemmers.
topic information retrieval (ir)
stemmer
conflation
n-gram
potential-stem
url http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074
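
The idea summarized in the description (enumerate internal N-grams, keep those found in a dictionary, prefer the one whose length is closest to half the word length) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy dictionary, the minimum N-gram length of 3, and the tie-breaking rule (prefer the longer candidate) are all assumptions made for illustration, and the record does not say which dictionary or tie-breaking the paper uses.

```python
def internal_ngrams(word, min_len=3):
    """Yield every contiguous substring of `word` that has at least
    `min_len` characters and is shorter than the word itself.
    (min_len=3 is an assumption, not taken from the paper.)"""
    n = len(word)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            if end - start < n:  # skip the whole word
                yield word[start:end]


def pick_stem(word, dictionary, min_len=3):
    """Return the dictionary N-gram whose length is closest to half the
    length of `word`. Ties go to the longer candidate (an assumption).
    If no N-gram is in the dictionary, the word is returned unchanged."""
    word = word.lower()
    candidates = [g for g in internal_ngrams(word, min_len) if g in dictionary]
    if not candidates:
        return word
    half = len(word) / 2
    return min(candidates, key=lambda g: (abs(len(g) - half), -len(g)))


# Hypothetical toy dictionary standing in for a standard English dictionary.
TOY_DICTIONARY = {
    "help", "fort", "comfort", "able", "table",
    "agree", "disagree", "agreement",
}

print(pick_stem("unhelpful", TOY_DICTIONARY))      # help
print(pick_stem("uncomfortable", TOY_DICTIONARY))  # comfort
print(pick_stem("disagreement", TOY_DICTIONARY))   # agree
```

In "uncomfortable" (13 letters, half length 6.5) the dictionary candidates are "fort", "able", "table" and "comfort"; "comfort" has length 7, the closest to 6.5, so it is selected. With a real dictionary the candidate set would be much larger, so the result depends heavily on the word list and on the tie-breaking rule chosen.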