A Density-based Multistage Clustering Algorithm

Master's === National Taiwan University of Science and Technology === Department of Information Management === 98 === With the growth of e-commerce in recent years, many enterprises have invested in computerization and consequently generate tremendous amounts of data. Managers stand to benefit greatly if useful information can be extr...

Bibliographic Details
Main Authors: Chun-hao Chuang, 莊峻豪
Other Authors: Chiun-chieh Hsu
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/37994188459093532207
id ndltd-TW-098NTUS5396042
record_format oai_dc
spelling ndltd-TW-098NTUS5396042 2016-04-22T04:23:45Z http://ndltd.ncl.edu.tw/handle/37994188459093532207 A Density-based Multistage Clustering Algorithm 以資料密度為基礎的多階段分群演算法 Chun-hao Chuang 莊峻豪 Master's National Taiwan University of Science and Technology Department of Information Management 98 Chiun-chieh Hsu 徐俊傑 2010 學位論文 ; thesis 54 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Information Management === 98 === With the growth of e-commerce in recent years, many enterprises have invested in computerization and consequently generate tremendous amounts of data. Managers stand to benefit greatly if useful information can be extracted from these raw data, so data mining has become an important and popular research domain.
Clustering algorithms can recognize and partition data according to the characteristics of their attributes without any category information being defined in advance. They therefore play an important role in data mining, where the goal is to maximize the homogeneity of objects within each cluster while also maximizing the heterogeneity between clusters. Fuzzy C-Means (FCM) is one of the most popular clustering algorithms, but it requires the number of clusters to be given in advance. Even when the number of clusters is specified, the clustering result may fall into a local optimum. In addition, because the initial cluster centers are chosen at random, the result may differ from one execution to the next. Furthermore, noise in the data has an even more significant impact on the results, so selecting suitable initial cluster centers is important. Adopting the subtractive method reduces the influence of random center selection, but non-spherical clusters remain difficult to handle. Combining the FCM algorithm with a hierarchical algorithm can deal with non-spherical clusters, but it cannot easily handle noise.
Hence, this thesis proposes a density-based multistage clustering algorithm that combines subtractive clustering, fuzzy clustering and hierarchical clustering to solve the problems mentioned above. The approach has three stages: the first selects suitable initial cluster centers; the second modifies the distribution of data points; and the third merges clusters until an appropriate number of clusters is reached. The experimental results show that the proposed method can improve clustering performance.
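The abstract's point about random initialization can be illustrated with a small sketch. The code below is not the thesis's algorithm: it only pairs a simplified subtractive-clustering potential (to choose initial centers deterministically from dense regions) with the standard Fuzzy C-Means updates. The function names, the radius `ra`, the fixed number of centers `k`, and the toy data are all assumptions made for illustration, and the thesis's second and third stages (redistributing data points and merging clusters) are not reproduced.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def subtractive_centers(points, k, ra=1.0):
    """Pick k initial centers by density potential (subtractive clustering).

    Simplification (assumption): a fixed k replaces the usual accept/reject
    thresholds; ra is the neighbourhood radius, 1.5 * ra the penalty radius.
    """
    alpha = 4.0 / ra ** 2
    beta = 4.0 / (1.5 * ra) ** 2
    potential = [sum(math.exp(-alpha * sq_dist(p, q)) for q in points)
                 for p in points]
    centers = []
    for _ in range(k):
        best = max(range(len(points)), key=potential.__getitem__)
        peak, chosen = potential[best], points[best]
        centers.append(chosen)
        # Lower the potential around the chosen point so the next center
        # comes from a different dense region.
        potential = [pot - peak * math.exp(-beta * sq_dist(p, chosen))
                     for pot, p in zip(potential, points)]
    return centers

def memberships(points, centers, m=2.0):
    """FCM membership matrix: u[i][j] is the degree of point i in cluster j."""
    u = []
    for p in points:
        # Clamp tiny distances to avoid division by zero at a center.
        inv = [max(sq_dist(p, c), 1e-12) ** (-1.0 / (m - 1.0)) for c in centers]
        total = sum(inv)
        u.append([v / total for v in inv])
    return u

def fuzzy_c_means(points, centers, m=2.0, max_iter=100, tol=1e-9):
    """Alternate membership and center updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    dim = len(points[0])
    for _ in range(max_iter):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]
            total = sum(w)
            new_centers.append(tuple(
                sum(wi * p[t] for wi, p in zip(w, points)) / total
                for t in range(dim)))
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(points, centers, m)

# Toy data (hypothetical): two well-separated groups of three points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
          (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
init = subtractive_centers(points, k=2, ra=1.0)
centers, u = fuzzy_c_means(points, init)
labels = [row.index(max(row)) for row in u]  # hard label = largest membership
```

Because the initial centers come from the density potential rather than a random draw, repeated runs on the same data give the same result, which is the motivation the abstract gives for the first stage.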
author2 Chiun-chieh Hsu
author_facet Chiun-chieh Hsu
Chun-hao Chuang
莊峻豪
author Chun-hao Chuang
莊峻豪
title A Density-based Multistage Clustering Algorithm
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/37994188459093532207