A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies

Ph.D. === National Tsing Hua University === Department of Industrial Engineering and Engineering Management === 95 === Support vector machine (SVM), rough set theory (RST), and decision tree (DT) are methodologies applied to various data mining problems, especially class prediction tasks. Studies have shown the ability of RST for feature selection while SVM and DT...

Bibliographic Details
Main Authors: Li-Fei Chen, 陳麗妃
Other Authors: Chen-Fu Chien
Format: Others
Language: en_US
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/30869955008789719497
id ndltd-TW-095NTHU5031009
record_format oai_dc
spelling ndltd-TW-095NTHU5031009 2016-05-25T04:14:03Z http://ndltd.ncl.edu.tw/handle/30869955008789719497 A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies 整合約略集合論、支援向量機與決策樹之資料挖礦架構及其個案研究 Li-Fei Chen 陳麗妃 Ph.D. National Tsing Hua University Department of Industrial Engineering and Engineering Management 95 Chen-Fu Chien 簡禎富 2007 學位論文 ; thesis 136 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Ph.D. === National Tsing Hua University === Department of Industrial Engineering and Engineering Management === 95 === Support vector machine (SVM), rough set theory (RST), and decision tree (DT) are methodologies applied to various data mining problems, especially class prediction tasks. Studies have shown the ability of RST for feature selection, while SVM and DT are noted for their predictive power. This research aims to integrate the advantages of the SVM, RST, and DT approaches into a hybrid framework that enhances the quality of class prediction as well as rule generation. Beyond building a classification model with acceptable accuracy, the ability to explain and explore how a decision is made, using simple, understandable, and useful rules, is a critical issue for human resource management. DT and RST can generate such rules; SVM cannot. The framework consists of four main stages. The first stage selects the most important attributes: RST is applied to eliminate redundant and irrelevant attributes without losing any information about the classification. The second stage reduces noisy objects, which is accomplished by cross-validation using SVM. If the resulting data set has a data imbalance problem, the rules generated by RST are used to adjust the class distribution (stage 3). Through these stages, a data set with fewer dimensions, a higher degree of purity, and a similar class distribution is screened out, and it is used to generate rules with DT, which completes the last stage. In addition, decisions concerning personnel selection prediction always involve handling data with high dimensionality, uncertainty, and complexity, which causes traditional statistical methods to suffer from low statistical power. For validation, real cases of personnel selection for direct and indirect labor at two high-tech companies in Hsinchu, Taiwan, are studied using the proposed hybrid data mining framework. Implementation results show that the proposed approach is effective and performs better than traditional SVM, RST, and DT.
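The attribute-selection idea in stage 1 can be illustrated with a small sketch. The code below is not taken from the thesis; it assumes the standard rough set notion of a positive region (objects whose indiscernibility class carries a single decision label) and uses a simple greedy pass to find one reduct. The function names, the attribute names (`degree`, `experience`, `shift`), and the toy records are all hypothetical.

```python
from collections import defaultdict


def partition(rows, attrs):
    """Group row indices into indiscernibility classes over the given attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def positive_region(rows, labels, attrs):
    """Return indices of rows whose indiscernibility class has exactly one label."""
    pos = set()
    for block in partition(rows, attrs):
        if len({labels[i] for i in block}) == 1:
            pos.update(block)
    return pos


def greedy_reduct(rows, labels, attrs):
    """Drop attributes one at a time as long as the positive region is unchanged."""
    full = positive_region(rows, labels, attrs)
    kept = list(attrs)
    for a in attrs:
        trial = [b for b in kept if b != a]
        if positive_region(rows, labels, trial) == full:
            kept = trial  # attribute a is redundant for classification
    return kept


# Hypothetical toy decision table: the label depends only on "experience".
rows = [
    {"degree": "BS", "experience": "low", "shift": "day"},
    {"degree": "BS", "experience": "high", "shift": "day"},
    {"degree": "MS", "experience": "low", "shift": "night"},
    {"degree": "MS", "experience": "high", "shift": "night"},
]
labels = ["reject", "hire", "reject", "hire"]
attrs = ["degree", "experience", "shift"]

reduct = greedy_reduct(rows, labels, attrs)
print(reduct)  # ['experience']
```

On this toy table only `experience` survives, because it alone separates the two labels. A greedy pass returns one reduct that depends on the order in which attributes are tried, so it is not guaranteed to be the smallest one.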
author2 Chen-Fu Chien
author_facet Chen-Fu Chien
Li-Fei Chen
陳麗妃
author Li-Fei Chen
陳麗妃
author_sort Li-Fei Chen
title A Hybrid Data Mining Framework with Rough Set Theory, Support Vector Machine, and Decision Tree and its Case Studies
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/30869955008789719497