Summary: | Master's === National Taiwan University of Science and Technology === Department of Industrial Management === 101 === Because of their model assumptions, traditional statistical methods such as multivariate analysis of variance (MANOVA) and canonical correlation analysis (CCA) are limited in their ability to analyze the complicated datasets found in the real world today. Data mining techniques such as clustering and classification algorithms are promising tools for revealing and analyzing the structure of multiple-attribute datasets. This research proposes a framework that integrates clustering and classification, applied to two different datasets: numerical measures (Q dataset) and categorical features (X dataset), respectively. The clustering method is expected to help rapidly analyze or identify the numerical measures (Q dataset). The clustering results, in the form of cluster labels, are then combined with the X dataset as the inputs of the classification model, which predicts the cluster labels from the X dataset. Hierarchical clustering and Classification and Regression Tree (CART) are used as the clustering and classification methods, respectively, because both have a tree structure. To keep clustering and classification performance balanced, the Clustering Classification Evaluation (CCE) plot is proposed, which shows the performance measures of both the clustering and the classification results together. Clustering quality is measured by the complementary sum of squared errors (SSE_com), and classification performance is measured by prediction accuracy. Several real-life datasets are used to evaluate the proposed framework. The results show that CCE plots can be used to determine the number of clusters, an important parameter affecting the performance of the proposed framework.
|
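The two quantities that a CCE plot places side by side can be illustrated with a minimal sketch. The abstract does not define SSE_com, so the code below assumes one plausible reading, 1 - SSE/SST, where SST is the sum of squared errors when all points form a single cluster. The function names, the toy Q data, and the `predicted` labels (standing in for the output of a classifier trained on the X dataset) are all hypothetical and are not taken from the thesis.

```python
from statistics import mean


def sse(points, labels):
    """Within-cluster sum of squared errors around each cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        centroid = [mean(column) for column in zip(*members)]
        total += sum(
            (value - center) ** 2
            for point in members
            for value, center in zip(point, centroid)
        )
    return total


def complementary_sse(points, labels):
    """ASSUMPTION: SSE_com = 1 - SSE/SST (the thesis may define it differently)."""
    sst = sse(points, [0] * len(points))  # all points in a single cluster
    return 1.0 - sse(points, labels) / sst


def accuracy(cluster_labels, predicted_labels):
    """Fraction of cluster labels that the classifier reproduces."""
    hits = sum(c == p for c, p in zip(cluster_labels, predicted_labels))
    return hits / len(cluster_labels)


# Hypothetical toy data: four numerical measurements (Q dataset).
Q = [(1.0, 1.0), (1.0, 2.0), (5.0, 5.0), (5.0, 6.0)]
labels_k2 = [0, 0, 1, 1]   # cluster labels for a 2-cluster cut
predicted = [0, 0, 1, 0]   # hypothetical classifier output based on X

# One point of a CCE-style plot: (clustering quality, classification accuracy).
cce_point = (complementary_sse(Q, labels_k2), accuracy(labels_k2, predicted))
print(cce_point)
```

Repeating this calculation for each candidate number of clusters would give the pairs of values that a CCE-style plot compares. Under the assumed definition, clustering quality rises as clusters are added, while accuracy typically becomes harder to maintain, which is why the two are read together.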