Features of Distributional Method for Indonesian Word Clustering
We describe a study to determine the best features for the EWSB (Extended Word Similarity Based) algorithm. EWSB is a word clustering algorithm that can be applied to any language using a common set of features. We propose four alternative features for word similarity computation...
Main Author: | Herry Sujaini |
---|---|
Format: | Article |
Language: | Indonesian |
Published: | Universitas Tanjungpura, 2019-08-01 |
Series: | JEPIN (Jurnal Edukasi dan Penelitian Informatika) |
Subjects: | n-gram, word clustering, word similarity, ewsb |
Online Access: | http://jurnal.untan.ac.id/index.php/jepin/article/view/33049 |
id | doaj-3fbc07128b8b4299b66e88419daa00ff |
---|---|
record_format | Article |
last_updated | 2020-11-25T02:53:06Z |
affiliation | Universitas Tanjungpura |
volume | 5 |
issue | 2 |
pages | 164-170 |
doi | 10.26418/jp.v5i2.33049 |
collection | DOAJ |
language | Indonesian |
format | Article |
sources | DOAJ |
author | Herry Sujaini |
title | Features of Distributional Method for Indonesian Word Clustering |
publisher | Universitas Tanjungpura |
series | JEPIN (Jurnal Edukasi dan Penelitian Informatika) |
issn | 2460-0741, 2548-9364 |
publishDate | 2019-08-01 |
description | We describe a study to determine the best features for the EWSB (Extended Word Similarity Based) algorithm. EWSB is a word clustering algorithm that can be applied to any language using a common set of features. We propose four alternative features for word similarity computation and evaluate them on Indonesian to determine the best feature format for the language. The best feature for EWSB on Indonesian is the t w w' format (3-gram) with a word relation of 0 (zero). Moreover, 3-grams outperform 4-grams for all of the proposed features: average recall is 83.50% for 3-grams versus 57.25% for 4-grams. |
topic | n-gram, word clustering, word similarity, ewsb |
url | http://jurnal.untan.ac.id/index.php/jepin/article/view/33049 |
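The abstract compares 3-gram and 4-gram context features for distributional word similarity but does not say how EWSB encodes them. The sketch below is a generic illustration of n-gram context features, not the EWSB algorithm or the paper's t w w' format. It assumes that each word is represented by the n-grams it occurs in, with the word itself replaced by a placeholder, and that words are compared by cosine similarity. The function names (`ngram_context_features`, `cosine`) and the toy Indonesian sentences are hypothetical.

```python
from collections import Counter
from math import sqrt


def ngram_context_features(tokens, n=3):
    """Map each word to a Counter of n-gram context features.

    Assumed encoding (illustrative only, not EWSB's): for every n-gram
    that contains the word, the feature is that n-gram with the word's
    position replaced by the placeholder "_".
    """
    features = {}
    for i in range(len(tokens) - n + 1):
        gram = tokens[i:i + n]
        for j, word in enumerate(gram):
            context = tuple(gram[:j]) + ("_",) + tuple(gram[j + 1:])
            features.setdefault(word, Counter())[context] += 1
    return features


def cosine(a, b):
    """Cosine similarity between two sparse feature Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hypothetical toy corpus: "nasi" and "roti" share one 3-gram context
# ("saya makan _"), while "teh" shares none with "nasi".
tokens = "saya makan nasi goreng . saya makan roti bakar . dia minum teh manis .".split()
tri = ngram_context_features(tokens, n=3)
quad = ngram_context_features(tokens, n=4)
print(round(cosine(tri["nasi"], tri["roti"]), 3))  # 0.333
print(cosine(tri["nasi"], tri["teh"]))             # 0.0
print(cosine(quad["nasi"], quad["roti"]))          # 0.0
```

In this toy corpus the 4-gram contexts of "nasi" and "roti" no longer overlap, which shows how longer windows yield sparser features. This is only an illustration of the general idea; it is not the paper's explanation of its 3-gram versus 4-gram recall results.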