Summary: | Master's === National Cheng Kung University === Department of Computer Science and Information Engineering (Master's and Doctoral Program) === 90 === This research explores correlation-based clustering validation methods suited to gene expression analysis. In biological analysis, clustering algorithms are typically applied first to partition genes into groups that exhibit similar patterns of variation in expression level; clustering validation methods are then applied to evaluate the validity of the clustering results. However, most similarity measures used in existing clustering analyses are distance-based. A biologist, by contrast, usually aims to cluster together genes that have similar expression tendencies rather than the same expression values. This motivates the use of correlation-based clustering and validation indices in this study.
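The contrast between distance-based and correlation-based similarity can be illustrated with a minimal sketch. It assumes Pearson correlation as the correlation measure and Euclidean distance as the distance measure; the abstract names neither, and the three expression profiles and all function names below are hypothetical values chosen only for illustration.

```python
import math

def euclidean_distance(x, y):
    """Distance-based dissimilarity: sensitive to absolute expression values."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def correlation_distance(x, y):
    """Correlation-based dissimilarity: 0 when the profiles share the same trend."""
    return 1.0 - pearson_correlation(x, y)

# Hypothetical expression profiles over five conditions (illustrative only).
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0]
gene_b = [110.0, 120.0, 130.0, 140.0, 150.0]  # same trend as gene_a, different scale
gene_c = [5.0, 1.0, 4.0, 2.0, 3.0]            # similar values to gene_a, different trend

for name, other in (("gene_b", gene_b), ("gene_c", gene_c)):
    print(name,
          "euclidean:", round(euclidean_distance(gene_a, other), 3),
          "correlation distance:", round(correlation_distance(gene_a, other), 3))
```

Under these assumptions, Euclidean distance places `gene_a` closer to `gene_c`, while the correlation distance places it closer to `gene_b`, which shares its expression tendency.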
In this thesis, we present an automatic clustering validation system that guides the user in choosing a suitable validation index for cluster analysis. We developed a generator of volumetric-cloud-type clusters to synthesize various datasets, and evaluated a number of correlation-based validation indices on how well they measure the quality of clustering results. Based on this evaluation, the system can suggest the best validation index for the type of dataset a user supplies.
|