Integrating Clustering Analysis with Granular Computing for Imbalanced Data Classification Problem─A Case Study on Prostate Cancer Prognosis

Bibliographic Details
Main Authors: PO-YU SU, 蘇柏瑜
Other Authors: Ren-Jieh Kuo
Format: Others
Language: en_US
Published: 2015
Online Access: http://ndltd.ncl.edu.tw/handle/ps3nj5
Description
Summary: Master's thesis, National Taiwan University of Science and Technology, Department of Industrial Management, academic year 103 (ROC calendar). This study addresses the class imbalance problem using the concept of Information Granulation (IG). Majority-class data are assembled into granules to balance the class ratio within the data. This process reduces the risk of critical information being diluted by large amounts of relatively unimportant data and noise. Three clustering techniques, dynamic clustering using particle swarm optimization (DCPSO), genetic algorithm K-means (GA K-means), and artificial bee colony K-means (ABC K-means), are implemented to construct the information granules, yielding three granular computing (GrC) models for the class imbalance problem. At the end of the procedure, classifiers are applied to construct a classification model for each data set. The effectiveness of the proposed GrC models is evaluated on benchmark data sets from the UCI Machine Learning Repository. Because the proposed models produce solid classification results, real-world data on the survival length of patients with prostate cancer were then used to construct a prognosis system, and the classification results are also very promising. The results indicate that the proposed GrC models reduce the difficulty of classifying imbalanced data. Furthermore, they raise the accuracy on minority classes and, in most cases, the overall accuracy. The computational results for prostate cancer prognosis give doctors better information and analysis of patients' survival conditions.
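The granulation step described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: plain Lloyd's K-means is swapped in for DCPSO, GA K-means, and ABC K-means, each granule is represented by its cluster centroid, and the number of granules defaults to the number of non-majority samples. All three choices are assumptions made here for illustration, as are the function names `kmeans` and `granulate_majority` and the synthetic data.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's K-means (a stand-in, not DCPSO/GA/ABC K-means)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cluster's centroid unchanged
                centroids[j] = members.mean(axis=0)
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)


def granulate_majority(X, y, majority_label, n_granules=None, seed=0):
    """Replace majority-class samples with cluster centroids ("granules").

    Assumption: one centroid per granule; by default the number of
    granules equals the number of non-majority samples.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rest = y != majority_label
    majority = X[~rest]
    if n_granules is None:
        n_granules = int(rest.sum())
    n_granules = max(1, min(n_granules, len(majority)))
    centroids, labels = kmeans(majority, n_granules, seed=seed)
    granules = centroids[np.unique(labels)]  # keep non-empty granules only
    X_bal = np.vstack([granules, X[rest]])
    y_bal = np.concatenate([np.full(len(granules), majority_label), y[rest]])
    return X_bal, y_bal


# Synthetic imbalanced data (hypothetical): 200 majority vs. 20 minority samples.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (200, 2)), rng.normal(3.0, 1.0, (20, 2))])
y = np.array([0] * 200 + [1] * 20)

X_bal, y_bal = granulate_majority(X, y, majority_label=0)
print("before:", np.bincount(y), "after:", np.bincount(y_bal))
```

Any classifier can then be trained on `X_bal` and `y_bal` in place of the original data. How the thesis actually describes each granule and chooses the number of granules is not stated in this record.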