A Density-based Multistage Clustering Algorithm

Bibliographic Details
Main Authors: Chun-hao Chuang, 莊峻豪
Other Authors: Chiun-chieh Hsu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/37994188459093532207
Description
Summary: Master's === National Taiwan University of Science and Technology === Department of Information Management === 98 ===   With the growth of e-commerce in recent years, a large number of enterprises have started to invest in computerization and thus generate a tremendous amount of data. For managers, it is a great benefit to the enterprise if useful information can be extracted from these raw data. Therefore, data mining has become an important and popular research domain.   Clustering algorithms can recognize and partition data according to the characteristics of their attributes without any categorization information being defined in advance. Clustering algorithms therefore play an important role in data mining, where the goal is to maximize the homogeneity of objects within each cluster while also maximizing the heterogeneity between clusters. Fuzzy C-Means (FCM) is one of the most popular clustering algorithms, and it requires the number of clusters to be given in advance. Even when the number of clusters is specified, the clustering result may be trapped in a local optimum. In addition, because the initial cluster centers are chosen at random, the result of each execution may differ. Furthermore, if the data contain noise, the noise has a more significant impact on the results, so it is important to select the initial cluster centers suitably. Although the influence of randomly selected centers can be reduced by adopting the subtractive method, it is still difficult to deal with non-spherical clusters. Combining the FCM algorithm with a hierarchical algorithm can deal with non-spherical clusters, but it cannot easily handle noise.   Hence, this paper proposes a density-based multistage clustering algorithm that combines subtractive clustering, fuzzy clustering, and hierarchical clustering methods to solve the problems mentioned above. There are three stages in the approach. The first stage suitably selects the initial cluster centers; the second stage modifies the distribution of data points; and the third stage merges clusters until an appropriate number of clusters is reached. The experimental results show that the proposed method can improve clustering performance.
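
Note: the summary does not spell out how the first stage picks its initial centers. As a rough illustration of the subtractive method it mentions, the sketch below follows the commonly cited potential-based formulation of subtractive clustering, with a simplified single stopping threshold. The function name `subtractive_centers`, the parameters `r_a`, `squash`, `stop_ratio`, and `max_centers`, and the toy data are all assumptions made for this example and are not taken from the thesis.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def subtractive_centers(points, r_a=1.0, squash=1.5, stop_ratio=0.5, max_centers=10):
    """Pick initial cluster centers by density (potential), not at random.

    r_a        : neighbourhood radius used to measure density (assumed value)
    squash     : factor widening the radius used to suppress density near a
                 chosen center (assumed value)
    stop_ratio : stop once the best remaining potential drops below this
                 fraction of the first peak (simplified stopping rule)
    """
    alpha = 4.0 / r_a ** 2
    beta = 4.0 / (squash * r_a) ** 2

    # Potential of a point: large when many other points lie close to it.
    potential = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
        for p in points
    ]

    centers = []
    first_peak = None
    while len(centers) < max_centers:
        idx = max(range(len(points)), key=potential.__getitem__)
        peak = potential[idx]
        if first_peak is None:
            first_peak = peak
        elif peak < stop_ratio * first_peak:
            break  # remaining points are too sparse to be centers
        center = points[idx]
        centers.append(center)
        # Subtract the chosen center's influence so that the next pick
        # comes from a different dense region.
        potential = [
            pot - peak * math.exp(-beta * sq_dist(x, center))
            for pot, x in zip(potential, points)
        ]
    return centers

# Toy data: two dense groups and one isolated (noisy) point.
points = [
    (0.0, 0.0), (0.25, 0.0), (-0.25, 0.0), (0.0, 0.25), (0.0, -0.25),
    (4.0, 4.0), (4.25, 4.0), (3.75, 4.0), (4.0, 4.25), (4.0, 3.75),
    (8.0, 0.0),
]
centers = subtractive_centers(points)
print(centers)
```

On this toy data the procedure selects the middle point of each dense group and skips the isolated point, because its potential falls below the stopping threshold. Centers chosen this way could then seed FCM instead of random initial centers, which is the kind of sensitivity to initialization and noise that the summary describes.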