Penerapan Metode Clustering Text Mining Untuk Pengelompokan Berita Pada Unstructured Textual Data [Applying a Text Mining Clustering Method to Group News in Unstructured Textual Data]

Bibliographic Details
Main Authors: Nyoman Gede Yudiarta, Made Sudarma, Wayan Gede Ariastina
Format: Article
Language: English
Published: Universitas Udayana 2018-12-01
Series: Majalah Ilmiah Teknologi Elektro
Online Access: https://ojs.unud.ac.id/index.php/JTE/article/view/41047
Description
Summary: Good governance means a government whose programs are known to and benefit the people. In the Bali Provincial Government, the body responsible for disseminating information is the Bureau of Public Relations of the Bali Regional Secretariat, which does so through the media it owns. Because no category was recorded when news was entered into this medium, in this case the Public Relations Bureau website, it was difficult to tell which news items belonged to which category. Clustering is one method for solving this problem, and K-Means is one of the algorithms used for clustering. This study focuses on designing a system that groups news data into categories using K-Means. To make clustering easier, the documents are preprocessed first. Preprocessing consists of case folding, tokenization, filtering, and stemming. TF-IDF is then used to weight the terms obtained from the preprocessed documents. Experiments on data sets of 50, 100, 200, 300, 400, and 500 items show that K-Means can cluster the news with satisfactory accuracy: an average precision of 73.11%, recall of 69.65%, and purity of 0.80 across all test data. Comparing the individual test sets, the 50-item test has the highest average precision (76.92%) and recall (79.58%), while purity is highest on the 300-item test, at 0.83.
ISSN: 1693-2951, 2503-2372
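
The pipeline named in the summary (case folding, tokenization, filtering, stemming, TF-IDF weighting, K-Means, and a purity score) can be illustrated with a minimal sketch. This is not the authors' implementation. The stop-word list, the suffix-stripping stemmer, the idf formula log(N / df), the Euclidean distance, the seeding of centroids with the first k documents, and the four sample sentences are all assumptions made for illustration; a real system would use a stop list and stemmer suited to the language of the news.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word and suffix lists, chosen only for this toy example.
STOP_WORDS = {"the", "a", "of", "in", "and", "to", "is", "for"}
SUFFIXES = ("ing", "es", "s")

def preprocess(text):
    """Case folding, tokenization, filtering, then naive suffix stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())         # case folding + tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]  # filtering
    stemmed = []
    for t in tokens:
        for suf in SUFFIXES:
            # Strip at most one suffix, keeping a stem of at least 3 letters.
            if t.endswith(suf) and len(t) - len(suf) >= 3:
                t = t[: -len(suf)]
                break
        stemmed.append(t)
    return stemmed

def tfidf(docs_tokens):
    """L2-normalized TF-IDF vectors over a shared vocabulary (idf = log(N / df))."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n = len(docs_tokens)
    df = Counter(t for doc in docs_tokens for t in set(doc))
    vectors = []
    for doc in docs_tokens:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain K-Means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def purity(labels, truth):
    """Share of documents that belong to the majority true class of their cluster."""
    total = 0
    for c in set(labels):
        classes = [t for lab, t in zip(labels, truth) if lab == c]
        total += Counter(classes).most_common(1)[0][1]
    return total / len(labels)

# Toy usage with made-up sentences and made-up reference categories.
docs = [
    "The team wins the football match",
    "The governor signs the budget",
    "Football players training for the match",
    "Budget meeting of the governor",
]
truth = ["sport", "government", "sport", "government"]
labels = kmeans(tfidf([preprocess(d) for d in docs]), k=2)
score = purity(labels, truth)
print(labels, score)
```

Purity only needs the majority class per cluster, so it is the simplest of the three reported measures to reproduce. Precision and recall would additionally require mapping each cluster to a reference category, which the summary does not describe.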