MDL-based Model Trees for Classification of Hybrid Type Data
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 95 === We propose a model selection method for datasets of hybrid type, that is, datasets that include both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines on numeric...
Main Authors: | Hsin-Chih Chung, 鍾興志 |
---|---|
Other Authors: | Hsing-Kuo Pao, 鮑興國 |
Format: | Others |
Language: | en_US |
Published: | 2007 |
Online Access: | http://ndltd.ncl.edu.tw/handle/9x7xgp |
id |
ndltd-TW-095NTUS5392074 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-095NTUS5392074 2019-05-15T19:47:45Z http://ndltd.ncl.edu.tw/handle/9x7xgp MDL-based Model Trees for Classification of Hybrid Type Data MDL-basedModelTreesforClassificationofHybridTypeData Hsin-Chih Chung 鍾興志 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 95 We propose a model selection method for datasets of hybrid type, that is, datasets that include both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines (SVMs) on numeric data, we propose a model tree that combines both models. We derive a synthesized Boolean attribute from the classification output of an SVM applied only to the numeric attributes. The SVM-synthesized attribute and all of the nominal attributes are then used for decision tree induction, specifically the ID3 algorithm, which selects the "best" attribute according to a goodness criterion. The concept of a model tree is not new. Unlike the model tree proposed by Chang et al. in 2004, we aim to improve performance with a Minimum Description Length (MDL) approach. The MDL principle is adopted to balance the choice between the SVM-synthesized attribute and a discrete attribute by also taking their model complexity into account. That is, an SVM is considered a more complex model than a simple discrete classifier (such as "education = Master or Ph.D."). Therefore, an SVM classifier should incur a larger penalty than a discrete classifier when the best attribute is selected during decision tree induction. The penalty is given in a form whose 1-D case coincides with the one proposed by Quinlan in 1996 for a simple numeric classifier (such as "age>=22"). Our experiments show that the modification improves prediction accuracy on many real-world datasets. Hsing-Kuo Pao 鮑興國 2007 Degree thesis ; thesis 55 en_US |
collection |
NDLTD |
language |
en_US |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 95 === We propose a model selection method for datasets of hybrid type, that is, datasets that include both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines (SVMs) on numeric data, we propose a model tree that combines both models. We derive a synthesized Boolean attribute from the classification output of an SVM applied only to the numeric attributes. The SVM-synthesized attribute and all of the nominal attributes are then used for decision tree induction, specifically the ID3 algorithm, which selects the "best" attribute according to a goodness criterion. The concept of a model tree is not new. Unlike the model tree proposed by Chang et al. in 2004, we aim to improve performance with a Minimum Description Length (MDL) approach. The MDL principle is adopted to balance the choice between the SVM-synthesized attribute and a discrete attribute by also taking their model complexity into account. That is, an SVM is considered a more complex model than a simple discrete classifier (such as "education = Master or Ph.D."). Therefore, an SVM classifier should incur a larger penalty than a discrete classifier when the best attribute is selected during decision tree induction. The penalty is given in a form whose 1-D case coincides with the one proposed by Quinlan in 1996 for a simple numeric classifier (such as "age>=22"). Our experiments show that the modification improves prediction accuracy on many real-world datasets. |
author2 |
Hsing-Kuo Pao |
author_facet |
Hsing-Kuo Pao Hsin-Chih Chung 鍾興志 |
author |
Hsin-Chih Chung 鍾興志 |
spellingShingle |
Hsin-Chih Chung 鍾興志 MDL-based Model Trees for Classification of Hybrid Type Data |
author_sort |
Hsin-Chih Chung |
title |
MDL-based Model Trees for Classification of Hybrid Type Data |
title_short |
MDL-based Model Trees for Classification of Hybrid Type Data |
title_full |
MDL-based Model Trees for Classification of Hybrid Type Data |
title_fullStr |
MDL-based Model Trees for Classification of Hybrid Type Data |
title_full_unstemmed |
MDL-based Model Trees for Classification of Hybrid Type Data |
title_sort |
mdl-based model trees for classification of hybrid type data |
publishDate |
2007 |
url |
http://ndltd.ncl.edu.tw/handle/9x7xgp |
work_keys_str_mv |
AT hsinchihchung mdlbasedmodeltreesforclassificationofhybridtypedata AT zhōngxìngzhì mdlbasedmodeltreesforclassificationofhybridtypedata |
_version_ |
1719093930286383104 |
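
The attribute selection described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy data, the fixed linear rule standing in for a trained SVM, and the `penalty_bits` values are all assumptions made for illustration. The only penalty form taken from the literature is the `log2(N - 1)` bits for a threshold test on a numeric attribute with `N` distinct values, which is how Quinlan's 1996 correction is usually stated; the penalty the thesis assigns to an SVM is not given in this record, so it is left as a free parameter.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(labels, values):
    """ID3 information gain of splitting `labels` by the attribute `values`."""
    n = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def penalized_gain(labels, values, penalty_bits=0.0):
    """Information gain minus a per-example model-complexity penalty."""
    return info_gain(labels, values) - penalty_bits / len(labels)


def threshold_penalty_bits(num_distinct_values):
    """Penalty for a 1-D threshold test such as "age>=22" (assumed form)."""
    return math.log2(num_distinct_values - 1)


def linear_boolean_attribute(rows, weights, bias):
    """Stand-in for an SVM: the sign of a fixed linear decision function."""
    return [sum(w * x for w, x in zip(weights, row)) + bias >= 0 for row in rows]


def select_attribute(labels, candidates):
    """Pick the candidate name with the highest penalized gain.

    `candidates` maps a name to a (values, penalty_bits) pair.
    """
    return max(candidates, key=lambda name: penalized_gain(labels, *candidates[name]))


# Toy hybrid dataset (hypothetical): one nominal attribute, two numeric ones.
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
education = ["MS", "MS", "MS", "BS", "MS", "BS", "BS", "BS"]
numeric_rows = [(6, 5), (7, 4), (5, 6), (8, 3), (2, 3), (1, 4), (3, 2), (4, 1)]

# Synthesized Boolean attribute from the numeric columns only.
svm_attr = linear_boolean_attribute(numeric_rows, weights=(1, 1), bias=-10)

# Without a penalty the synthesized attribute wins on raw gain alone.
chosen_unpenalized = select_attribute(
    labels, {"education": (education, 0.0), "svm": (svm_attr, 0.0)}
)

# With a large (hypothetical) penalty the simpler nominal test is preferred.
chosen_penalized = select_attribute(
    labels, {"education": (education, 0.0), "svm": (svm_attr, 8.0)}
)
```

Dividing the penalty by the number of examples means the same model cost matters less as a node holds more data, so a complex test is favoured near the root and simple discrete tests are favoured at small nodes deeper in the tree.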