Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data (Application of the Text Mining Clustering Method for Grouping News in Unstructured Textual Data)
Good governance means a government whose programs are known to, and benefit, the people. In the Bali Provincial Government, the unit responsible for disseminating information is the Bureau of Public Relations of the Bali Regional Secretariat, which does so through the media it owns. Because at the time of news input to the media, in this case...
Main Authors: | Nyoman Gede Yudiarta, Made Sudarma, Wayan Gede Ariastina |
---|---|
Format: | Article |
Language: | English |
Published: | Universitas Udayana, 2018-12-01 |
Series: | Majalah Ilmiah Teknologi Elektro |
Online Access: | https://ojs.unud.ac.id/index.php/JTE/article/view/41047 |
id |
doaj-a64c9a4fecda4bc18fdfa0268e4db181 |
---|---|
record_format |
Article |
spelling |
doaj-a64c9a4fecda4bc18fdfa0268e4db181; 2020-11-25T02:36:57Z; eng; Universitas Udayana; Majalah Ilmiah Teknologi Elektro; 1693-2951; 2503-2372; 2018-12-01; 17; 3; 339; 344; 10.24843/MITE.2018.v17i03.P06; 41047; Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data; Nyoman Gede Yudiarta; Made Sudarma; Wayan Gede Ariastina; Good governance means a government whose programs are known to, and benefit, the people. In the Bali Provincial Government, the unit responsible for disseminating information is the Bureau of Public Relations of the Bali Regional Secretariat, which does so through the media it owns. Because news entered into this media, in this case the Public Relations Bureau website, was not assigned a category at input time, it is difficult to know which news items fall into which categories. Clustering is one method for solving this problem, and K-Means is one of the algorithms used for clustering. This study focuses on a design for grouping news data into categories using K-Means. To make clustering easier, the documents are preprocessed first. Preprocessing consists of case folding, tokenization, filtering, and stemming. TF-IDF is then used to weight the terms obtained from the preprocessed documents. Experiments with data sets of 50, 100, 200, 300, 400, and 500 documents show that the K-Means algorithm can cluster the news with satisfactory results: across all test data, average precision is 73.11%, recall is 69.65%, and purity is 0.80. Comparing the individual test sets, the test on 50 documents has the highest average precision (76.92%) and recall (79.58%), while purity is highest on the test with 300 documents, at 0.83.; https://ojs.unud.ac.id/index.php/JTE/article/view/41047 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Nyoman Gede Yudiarta, Made Sudarma, Wayan Gede Ariastina |
spellingShingle |
Nyoman Gede Yudiarta Made Sudarma Wayan Gede Ariastina Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data Majalah Ilmiah Teknologi Elektro |
author_facet |
Nyoman Gede Yudiarta, Made Sudarma, Wayan Gede Ariastina |
author_sort |
Nyoman Gede Yudiarta |
title |
Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data |
title_short |
Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data |
title_full |
Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data |
title_fullStr |
Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data |
title_full_unstemmed |
Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data |
title_sort |
penerapan metode clustering text mining untuk pengelompokan berita pada unstructured textual data |
publisher |
Universitas Udayana |
series |
Majalah Ilmiah Teknologi Elektro |
issn |
1693-2951 2503-2372 |
publishDate |
2018-12-01 |
description |
Good governance means a government whose programs are known to, and benefit, the people. In the Bali Provincial Government, the unit responsible for disseminating information is the Bureau of Public Relations of the Bali Regional Secretariat, which does so through the media it owns. Because news entered into this media, in this case the Public Relations Bureau website, was not assigned a category at input time, it is difficult to know which news items fall into which categories. Clustering is one method for solving this problem, and K-Means is one of the algorithms used for clustering. This study focuses on a design for grouping news data into categories using K-Means. To make clustering easier, the documents are preprocessed first. Preprocessing consists of case folding, tokenization, filtering, and stemming. TF-IDF is then used to weight the terms obtained from the preprocessed documents. Experiments with data sets of 50, 100, 200, 300, 400, and 500 documents show that the K-Means algorithm can cluster the news with satisfactory results: across all test data, average precision is 73.11%, recall is 69.65%, and purity is 0.80. Comparing the individual test sets, the test on 50 documents has the highest average precision (76.92%) and recall (79.58%), while purity is highest on the test with 300 documents, at 0.83. |
url |
https://ojs.unud.ac.id/index.php/JTE/article/view/41047 |
work_keys_str_mv |
AT nyomangedeyudiarta penerapanmetodeclusteringtextmininguntukpengelompokanberitapadaunstructuredtextualdata AT madesudarma penerapanmetodeclusteringtextmininguntukpengelompokanberitapadaunstructuredtextualdata AT wayangedeariastina penerapanmetodeclusteringtextmininguntukpengelompokanberitapadaunstructuredtextualdata |
_version_ |
1724797848147460096 |
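
The pipeline named in the description (case folding, tokenization, filtering, stemming, TF-IDF weighting, K-Means, and purity) can be illustrated with a minimal sketch. This is not the authors' implementation. The stop-word list, the suffix-stripping "stemmer", the `log(N/df)` form of IDF, the random centroid initialisation, and the four English toy documents are all assumptions made for illustration; the article works on Indonesian news and does not state these details in the record above.

```python
import math
import random
import re
from collections import Counter

# Hypothetical stop words and suffix rules; a real system would use
# language-specific resources (the article's news data is Indonesian).
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is"}
SUFFIXES = ("ing", "ed", "s")


def preprocess(text):
    """Case folding, tokenization, filtering, then crude suffix stripping."""
    tokens = re.findall(r"[a-z]+", text.lower())         # case folding + tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # filtering (stop-word removal)
    stemmed = []
    for t in tokens:
        for suffix in SUFFIXES:
            if t.endswith(suffix) and len(t) - len(suffix) >= 3:
                t = t[: -len(suffix)]
                break
        stemmed.append(t)
    return stemmed


def tfidf(token_lists):
    """Return one L2-normalised TF-IDF vector (a dict) per tokenized document."""
    n = len(token_lists)
    df = Counter(term for doc in token_lists for term in set(doc))
    vectors = []
    for doc in token_lists:
        vec = {t: c * math.log(n / df[t]) for t, c in Counter(doc).items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two sparse vectors."""
    return sum((a.get(t, 0.0) - b.get(t, 0.0)) ** 2 for t in set(a) | set(b))


def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-Means: assign each vector to its nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = [dict(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if not members:
                continue  # keep the old centroid if a cluster ends up empty
            total = {}
            for v in members:
                for t, w in v.items():
                    total[t] = total.get(t, 0.0) + w
            centroids[c] = {t: w / len(members) for t, w in total.items()}
    return labels


def purity(labels, truth):
    """Fraction of documents that belong to the majority class of their cluster."""
    clusters = {}
    for lab, cls in zip(labels, truth):
        clusters.setdefault(lab, Counter())[cls] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(labels)


if __name__ == "__main__":
    docs = [
        "Governor opens new school building",
        "Students visiting the school library",
        "Hospital adds new doctors",
        "Doctors visited the village hospital",
    ]
    truth = ["education", "education", "health", "health"]
    labels = kmeans(tfidf([preprocess(d) for d in docs]), k=2)
    print(labels, purity(labels, truth))
```

Purity is computed against manually assigned categories, so it needs labelled test data, as in the experiments reported in the description. The clustering result depends on the initial centroids, so different seeds can give different purity values on the same data.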