Simultaneous Removal of Prefix and Suffix
This work attempts to devise a stemmer that removes both the prefix and the suffix from a given English word. For a given input word, our method considers all possible internal N-grams to detect potential stems. We frame a hypothesis where the stem length is closest to th...
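The idea summarized above (enumerate the N-grams inside a word, keep those found in a dictionary, and prefer the candidate whose length is closest to half the word length, as the full abstract in this record states) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `internal_ngrams` and `pick_stem`, the `min_len` cutoff, the toy dictionary, and the tie-breaking rule (prefer the longer candidate) are all assumptions made for illustration, and "internal N-gram" is read here as any contiguous substring of the word.

```python
def internal_ngrams(word, min_len=3):
    """Yield every contiguous substring of `word` with at least `min_len` characters.

    Assumption: "internal N-grams" is read as all contiguous substrings,
    including those that touch the start or end of the word.
    """
    n = len(word)
    for length in range(min_len, n + 1):
        for start in range(n - length + 1):
            yield word[start:start + length]


def pick_stem(word, dictionary, min_len=3):
    """Return the dictionary N-gram whose length is closest to half of len(word).

    Hypothetical helper: ties are broken in favour of the longer candidate,
    and the word itself is returned when no candidate is found.
    """
    word = word.lower()
    target = len(word) / 2
    candidates = [
        gram for gram in internal_ngrams(word, min_len)
        if gram in dictionary and gram != word
    ]
    if not candidates:
        return word
    return min(candidates, key=lambda g: (abs(len(g) - target), -len(g)))


# Toy dictionary standing in for a standard English word list (assumption).
toy_dictionary = {"help", "ful", "agree", "agreement", "disagree", "men"}

print(pick_stem("unhelpful", toy_dictionary))     # help
print(pick_stem("disagreement", toy_dictionary))  # agree
```

In the second call the candidates are "agree" (5 letters), "disagree" (8), "agreement" (9), and "men" (3); with a target of 6, "agree" is closest, so both the prefix "dis" and the suffix "ment" are dropped in one step.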
Main Authors: | Pawan Tamta, B. P. Pande |
---|---|
Format: | Article |
Language: | English |
Published: | World Scientific Publishing, 2020-05-01 |
Series: | Vietnam Journal of Computer Science |
Subjects: | information retrieval (IR), stemmer, conflation, n-gram, potential-stem |
Online Access: | http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074 |
id |
doaj-fe8d16a901ab486da5b813401777be2b |
---|---|
record_format |
Article |
spelling |
doaj-fe8d16a901ab486da5b813401777be2b; 2020-11-25T03:06:12Z; eng; World Scientific Publishing; Vietnam Journal of Computer Science; ISSN 2196-8888, 2196-8896; 2020-05-01; vol. 7, no. 2, pp. 129-144; doi:10.1142/S2196888820500074; Simultaneous Removal of Prefix and Suffix; Pawan Tamta (Department of Mathematics, Government P.G. College, Berinag, Pithoragarh, Uttarakhand 262531, India); B. P. Pande (Department of Computer Science, Kumaun University, SSJ Campus Almora 263601, India); http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074; information retrieval (IR); stemmer; conflation; n-gram; potential-stem |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Pawan Tamta; B. P. Pande |
title |
Simultaneous Removal of Prefix and Suffix |
publisher |
World Scientific Publishing |
series |
Vietnam Journal of Computer Science |
issn |
2196-8888; 2196-8896 |
publishDate |
2020-05-01 |
description |
This work attempts to devise a stemmer that removes both the prefix and the suffix from a given English word. For a given input word, our method considers all possible internal N-grams to detect potential stems. We hypothesize that the length of the stem is closest to half the length of the input word. A standard English dictionary is used to identify morphologically correct N-grams in the process. We apply our technique to a random sample of 100 English words, each possessing both a prefix and a suffix. We also compare our proposed stemmer with three standard algorithms from the literature. Empirical results show that our technique performs better than the other stemmers. |
topic |
information retrieval (IR); stemmer; conflation; n-gram; potential-stem |
url |
http://www.worldscientific.com/doi/pdf/10.1142/S2196888820500074 |