Summary: Doctoral dissertation, Yuan Ze University, Department of Industrial Engineering and Management, academic year 98.
Given a dataset consisting of numerous objects, clustering aims to partition the dataset into several homogeneous clusters based on the similarities or dissimilarities among objects. Partitional clustering methods are the most popular and widely used clustering methods owing to their computational efficiency and ease of use. However, users of partitional clustering methods still face several unavoidable challenges: specifying a proper dissimilarity measure, setting burdensome parameters, and coping with results that are sensitive to the initialization of cluster centers.
In this dissertation, a novel partitional clustering method called the Weighting Agglomerative All-means clustering algorithm (WAA-means) is proposed to address these challenges simultaneously. The basic idea of WAA-means, inspired by the concept of gravitational force, is to move objects to new locations so that they cluster effectively. Each object in a dataset is regarded as an independent individual. Between any two objects there is an attraction that draws them toward each other. At any moment, each object is subject to attractions from all other objects simultaneously, and it moves to a new location determined by the combination of these attractions. The new location of an object is strongly influenced by the location of any object similar to it, because the attraction between them is large. Over time, objects that are similar to each other progressively move closer together and eventually agglomerate at a single location. Consequently, through the WAA-means algorithm, all objects that were initially scattered end up at a few agglomerated locations. At that point, no object moves any further, because it has maximal attractions with the objects at the same agglomerated location and only extremely weak attractions with the objects at other agglomerated locations. Therefore, objects that agglomerate at the same location can be assigned to the same cluster. The number of clusters can also be determined by counting the final agglomerated locations, rather than being specified before running the WAA-means algorithm.
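The agglomeration process described above can be illustrated with a minimal sketch. The abstract does not give the actual attraction formula or the influence-weight mechanism, so this sketch assumes a Gaussian attraction exp(-d^2 / (2 sigma^2)) and uses the normalized attractions as the weights. Under those assumptions the update behaves like blurring mean shift, which is a stand-in technique and not necessarily the WAA-means procedure itself. The function names `attraction`, `agglomerate`, and `label_clusters` and the parameters `sigma`, `tol`, and `merge_tol` are hypothetical.

```python
import math


def attraction(p, q, sigma):
    """ASSUMED attraction: Gaussian kernel of the squared Euclidean distance."""
    d2 = sum((a - b) ** 2 for a, b in zip(p, q))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def agglomerate(points, sigma=1.0, tol=1e-6, max_iter=500):
    """Move every object, all at once, to the attraction-weighted mean of all
    current locations (itself included) until no object moves more than tol.
    The normalized attractions act as ASSUMED influence weights."""
    locs = [list(p) for p in points]
    for _ in range(max_iter):
        new_locs = []
        for p in locs:
            weights = [attraction(p, q, sigma) for q in locs]
            total = sum(weights)
            new_locs.append([
                sum(w * q[k] for w, q in zip(weights, locs)) / total
                for k in range(len(p))
            ])
        # Largest displacement in this round decides convergence.
        shift = max(math.dist(p, q) for p, q in zip(locs, new_locs))
        locs = new_locs
        if shift < tol:
            break
    return locs


def label_clusters(locs, merge_tol=1e-3):
    """Objects whose final locations coincide (within merge_tol) share a
    cluster; the number of distinct locations is the number of clusters."""
    centers, labels = [], []
    for p in locs:
        for idx, c in enumerate(centers):
            if math.dist(p, c) < merge_tol:
                labels.append(idx)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels, len(centers)


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.5, 0.2), (0.3, 0.6), (0.1, 0.4),
           (10.0, 10.0), (10.4, 9.8), (9.7, 10.3)]
    labels, k = label_clusters(agglomerate(pts, sigma=1.0))
    print(k, labels)
```

Here the two well-separated groups collapse to two locations, so `label_clusters` reports two clusters without the number of clusters being supplied in advance; the assumed scale `sigma` takes over the role of deciding how far apart objects can be and still merge. Each iteration evaluates the attraction for every pair of objects, an O(n^2) cost per round, which is one reason a data summarization step matters for large datasets.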
The details of the WAA-means algorithm are elucidated, including the definition of the attraction between two objects, a mechanism for measuring the influence weights of the attractions acting on an object, the complete computational procedure of WAA-means, and analyses of its convergence and robustness. Furthermore, with advances in information technology, rapidly growing volumes of data have been collected in many application domains over the past several decades. To maintain the efficiency of WAA-means on such large datasets, a data summarization procedure is presented to reduce the size of the input to the algorithm. Finally, WAA-means is applied to two practical applications, case-based reasoning and gene microarray biclustering, to improve their performance.