Simultaneous Removal of Prefix and Suffix

This work is an attempt to devise a stemmer that can remove both the prefix and the suffix from a given word in the English language. For a given input word, our method considers all possible internal N-grams to detect potential stems. We frame a hypothesis where the stem length is closest to th...
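
The selection rule described in the abstract (enumerate the internal N-grams of a word, keep those found in a dictionary, and prefer the one whose length is closest to half the word length) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the `min_len` parameter, the reading of "internal" as "drops at least one leading and one trailing character", and the tie-break toward the longer candidate are all assumptions made for illustration, and the tiny word sets stand in for a standard English dictionary.

```python
def internal_ngrams(word, min_len=3):
    """Yield substrings that drop at least one leading and one trailing character.

    Assumption: this is one plausible reading of "internal N-grams";
    the record does not define the term precisely.
    """
    n = len(word)
    for start in range(1, n - 1):
        for end in range(start + min_len, n):
            yield word[start:end]


def stem(word, dictionary, min_len=3):
    """Return the dictionary N-gram whose length is closest to half of len(word).

    Falls back to the lowercased word when no internal N-gram is in the dictionary.
    """
    word = word.lower()
    target = len(word) / 2
    candidates = [g for g in internal_ngrams(word, min_len) if g in dictionary]
    if not candidates:
        return word
    # Assumed tie-break: among equally close candidates, prefer the longer one.
    return min(candidates, key=lambda g: (abs(len(g) - target), -len(g)))


# Hypothetical toy dictionaries, for illustration only.
print(stem("unhelpful", {"help", "helpful", "full"}))
print(stem("disagreement", {"agree", "agreement", "disagree"}))
```

In the first call, "helpful" is not a candidate because it keeps the final letter of the input, so only "help" survives the internal N-gram filter.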

Bibliographic Details
Main Authors:	Pawan Tamta, B. P. Pande
Format:	Article
Language:	English
Published:	World Scientific Publishing 2020-05-01
Series:	Vietnam Journal of Computer Science
Subjects:	information retrieval (IR); stemmer; conflation; n-gram; potential-stem
Online Access:	http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074

id	doaj-fe8d16a901ab486da5b813401777be2b
record_format	Article
spelling	doaj-fe8d16a901ab486da5b813401777be2b | 2020-11-25T03:06:12Z | eng | World Scientific Publishing | Vietnam Journal of Computer Science | 2196-8888 | 2196-8896 | 2020-05-01 | 7 | 2 | 129 | 144 | 10.1142/S2196888820500074 | 10.1142/S2196888820500074 | Simultaneous Removal of Prefix and Suffix | Pawan Tamta (0) | B. P. Pande (1) | Department of Mathematics, Government P.G. College, Berinag, Pithoragarh, Uttarakhand 262531, India | Department of Computer Science, Kumaun University, SSJ Campus Almora 263601, India | This work is an attempt to devise a stemmer that can remove both the prefix and the suffix from a given word in the English language. For a given input word, our method considers all possible internal N-grams to detect potential stems. We hypothesize that the stem length is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare our proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers. | http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074 | information retrieval (ir) | stemmer | conflation | n-gram | potential-stem
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Pawan Tamta; B. P. Pande
author_sort	Pawan Tamta
title	Simultaneous Removal of Prefix and Suffix
title_sort	simultaneous removal of prefix and suffix
publisher	World Scientific Publishing
series	Vietnam Journal of Computer Science
issn	2196-8888; 2196-8896
publishDate	2020-05-01
description	This work is an attempt to devise a stemmer that can remove both the prefix and the suffix from a given word in the English language. For a given input word, our method considers all possible internal N-grams to detect potential stems. We hypothesize that the stem length is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare our proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers.
topic	information retrieval (IR); stemmer; conflation; n-gram; potential-stem
url	http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074

Similar Items