Improving DPC with Hierarchical Clustering

Bibliographic Details
Main Authors: Chun-Chieh Hsu, 徐俊傑
Other Authors: Jun-Lin Lin
Format: Others
Language: zh-TW
Published: 2019
Online Access: http://ndltd.ncl.edu.tw/handle/58n594
Description
Summary: Master's === Yuan Ze University === Department of Information Management === 107 === 【Background】From big data to machine learning, cluster analysis plays a very important role. Although it has been developed for decades, many open challenges remain.【Objective】In 2014 a new algorithm, DPC, was released. It has the advantages of simplicity and speed, but it also has two shortcomings. First, low-density clusters must be attached to the nearest high-density cluster, which can cause clustering errors. Second, the cluster centers must be determined by the user, so results vary with subjective judgment.【Methods】This study proposes two improvements. First, a set of candidates is pre-selected and the resulting clusters are re-merged with a hierarchical method. Second, the number of clusters is given directly by the user as a parameter. The improved algorithm is called HDPC.【Results】DPC and HDPC were compared on widely used two-dimensional synthetic datasets, and the performance of both was observed under two different definitions of neighborhood: a cut-off threshold and a Gaussian kernel. HDPC performs well on datasets with bridge-like connections, clusters of varying density, and linear spirals.【Conclusions】Although HDPC is not suitable for the bridge dataset, it can handle datasets of varying density with a cut-off neighborhood. Setting the number of candidates too large or too small can result in poor performance. Finally, because HDPC incorporates the hierarchical method, it requires a lot of processing time; reducing this is the focus of future improvements.
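
For readers unfamiliar with the baseline, the following is a minimal sketch of plain DPC, assuming "DPC" refers to the density peaks clustering algorithm published in 2014. It is not the thesis's HDPC, whose candidate pre-selection and hierarchical re-merging steps are not specified in this record. The function names `dpc_quantities` and `dpc_labels`, the parameter `d_c`, and the choice to pick centers by the largest product of density and distance are illustrative assumptions. The sketch covers both neighborhood definitions named in the summary (cut-off and Gaussian kernel) and takes the number of clusters as a parameter.

```python
import numpy as np

def dpc_quantities(X, d_c, kernel="cutoff"):
    """Return (rho, delta, nearest_higher) for points X of shape (n, dim).

    rho: local density; delta: distance to the nearest point of higher density.
    Names and signature are hypothetical, chosen for this sketch only.
    """
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    if kernel == "cutoff":
        # Count neighbors closer than d_c (subtract 1 to exclude the point itself).
        rho = (dist < d_c).sum(axis=1) - 1.0
    else:
        # Gaussian kernel density (subtract the self-term exp(0) = 1).
        rho = np.exp(-(dist / d_c) ** 2).sum(axis=1) - 1.0
    n = len(X)
    order = np.argsort(-rho, kind="stable")  # decreasing density
    delta = np.zeros(n)
    nearest_higher = np.full(n, -1)
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, delta is its largest distance.
            delta[i] = dist[i].max()
            continue
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        nearest_higher[i] = j
    return rho, delta, nearest_higher

def dpc_labels(X, d_c, n_centers, kernel="cutoff"):
    """Assign cluster labels using the n_centers points with largest rho * delta."""
    rho, delta, nearest_higher = dpc_quantities(X, d_c, kernel)
    centers = np.argsort(-(rho * delta), kind="stable")[:n_centers]
    labels = np.full(len(X), -1)
    labels[centers] = np.arange(n_centers)
    # Visit points from dense to sparse so each point's denser neighbor
    # is already labeled when the point inherits its label.
    for i in np.argsort(-rho, kind="stable"):
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels

# Two small, well-separated groups of five points each (made-up data).
blob = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [0.05, 0.05]])
X = np.vstack([blob, blob + 10.0])
labels_cutoff = dpc_labels(X, d_c=0.5, n_centers=2, kernel="cutoff")
labels_gauss = dpc_labels(X, d_c=0.5, n_centers=2, kernel="gaussian")
```

The label-inheritance loop is where the first shortcoming described in the summary arises: every non-center point follows its nearest denser neighbor, so a low-density cluster without its own center is absorbed into a nearby high-density one.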