A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies

博士 === 國立清華大學 === 工業工程與工程管理學系 === 95 === Support vector machine (SVM), rough set theory (RST) and decision tree (DT) are methodologies applied to various data mining problems, especially for classification prediction tasks. Studies have shown the ability of RST for feature selection while SVM and DT...

Full description

Bibliographic Details
Main Authors: Li-Fei Chen, 陳麗妃
Other Authors: Chen-Fu Chien
Format: Others
Language:en_US
Published: 2007
Online Access:http://ndltd.ncl.edu.tw/handle/30869955008789719497
Description
Summary:博士 === 國立清華大學 === 工業工程與工程管理學系 === 95 === Support vector machine (SVM), rough set theory (RST) and decision tree (DT) are methodologies applied to various data mining problems, especially for classification prediction tasks. Studies have shown the ability of RST for feature selection while SVM and DT are significantly on their predictive power. This research aims to integrate the advantages of SVM, RST and DT approaches to develop a hybrid framework to enhance the quality of class prediction as well as rule generation. In addition to build up a classification model with acceptable accuracy, the capability to explain and explore how the decision made with simple, understandable and useful rules is a critical issue for human resource management. DT and RST can generate such rules, however, SVM can not offer such function. The major concept consists of four main stages. The first stage is to select most important attributes. RST is applied to eliminate the redundant and irrelative attributes without loss of any information about classification. The second stage is to reduce noisy objects, which can be accomplished by cross validation through using SVM. If the new data set would induce data imbalance problem, the rules generated by RST would be used to adjust the class distribution (stage 3). Through the stages described above, a data set with fewer dimensions and higher degree of purity could be screened out with similar class distribution and is used to generate rules by using DT which complete the last stage. In addition, the decisions concern with personnel selection prediction always involve handling data with highly dimensions, uncertainty and complexity, which cause traditional statistical methods suffering from low power of test. For validation, real cases of personnel selection of two high-tech companies containing direct and indirect labors in Hsinchu, Taiwan are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and has a better performance than that of traditional SVM, RST and DT.