Refinement After Density-based Clustering on Dirty Data

Bibliographic Details
Main Authors: Kuei-Hsin Liang, 梁珪信
Other Authors: Bi-Ru Dai
Format: Others
Language: en_US
Published: 2018
Online Access: http://ndltd.ncl.edu.tw/handle/epsyrq
Description
Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Computer Science and Information Engineering, academic year 106. Clustering algorithms are effective for class identification in spatial databases. Points labeled as noise after clustering are sometimes meaningful: they may have been misclassified because of inappropriate parameter settings or environmental factors during data collection. We call such points "dirty data". Methods that simply remove this noise lose considerable information, because they discard dirty data that is partly meaningful. In this thesis, we present a method to refine the result of density-based clustering, in which two parameters complete our proposed definition. We combine this refinement with DBSCAN, the best-known density-based clustering algorithm and one used in many applications, and call the result RaC-DBSCAN. We evaluated its effectiveness experimentally on synthetic datasets, UCI datasets, and a real-world dataset. The results show that RaC-DBSCAN not only improves the precision of identifying each cluster but also creates further potential by making use of dirty data.
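To illustrate how DBSCAN ends up labeling points as noise, here is a minimal Python sketch of standard DBSCAN with its usual parameters `eps` and `min_pts`. It is followed by a deliberately simple, hypothetical post-processing step, `reattach_noise`, with an assumed parameter `relaxed_eps`. This step is NOT RaC-DBSCAN. The abstract does not specify the refinement or its two parameters, so the function only illustrates the general idea of reusing points that DBSCAN discards as noise.

```python
import math
from collections import deque

NOISE = -1
UNVISITED = None


def region_query(points, i, eps):
    """Return indices of all points within distance eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Standard DBSCAN: returns one cluster label per point; NOISE (-1) marks noise."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may still become a border point of a later cluster
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = deque(neighbors)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # reached from a core point: border point, not expanded
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels


def reattach_noise(points, labels, relaxed_eps):
    """HYPOTHETICAL refinement (not the thesis's RaC-DBSCAN): give each noise point
    the label of its nearest clustered point if that point lies within relaxed_eps."""
    refined = list(labels)
    clustered = [j for j, lab in enumerate(labels) if lab != NOISE]
    for i, lab in enumerate(labels):
        if lab != NOISE or not clustered:
            continue
        nearest = min(clustered, key=lambda j: math.dist(points[i], points[j]))
        if math.dist(points[i], points[nearest]) <= relaxed_eps:
            refined[i] = labels[nearest]
    return refined


# Two dense groups, one point just outside the first group, and one far outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (2.5, 0.5), (30, 30)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)   # [0, 0, 0, 0, 1, 1, 1, 1, -1, -1]
refined = reattach_noise(points, labels, relaxed_eps=2.0)
print(refined)  # [0, 0, 0, 0, 1, 1, 1, 1, 0, -1]
```

With `eps=1.5`, the point `(2.5, 0.5)` lies about 1.58 from the first group and is discarded as noise. The relaxed radius recovers it, while the distant outlier `(30, 30)` stays noise. A real refinement would need a principled way to choose such thresholds, which is the kind of question the thesis's two parameters address.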