MDL-based Model Trees for Classification of Hybrid Type Data

Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 95 === We propose a model selection method for datasets of hybrid type, that is, datasets that include both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines on numeric...


Bibliographic Details
Main Authors: Hsin-Chih Chung, 鍾興志
Other Authors: Hsing-Kuo Pao
Format: Others
Language: en_US
Published: 2007
Online Access: http://ndltd.ncl.edu.tw/handle/9x7xgp
id ndltd-TW-095NTUS5392074
record_format oai_dc
spelling ndltd-TW-095NTUS5392074 2019-05-15T19:47:45Z http://ndltd.ncl.edu.tw/handle/9x7xgp MDL-based Model Trees for Classification of Hybrid Type Data Hsin-Chih Chung 鍾興志 Master's National Taiwan University of Science and Technology Department of Computer Science and Information Engineering 95 Hsing-Kuo Pao 鮑興國 2007 degree thesis ; thesis 55 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University of Science and Technology === Department of Computer Science and Information Engineering === 95 === We propose a model selection method for datasets of hybrid type, that is, datasets that include both nominal and numeric attributes. Motivated by the effectiveness of decision trees on nominal data and the success of support vector machines (SVMs) on numeric data, we propose a model tree that combines both models. We derive a synthesized Boolean attribute from the classification produced by an SVM applied only to the numeric attributes. The SVM-synthesized attribute and all of the nominal attributes are then passed to decision tree induction, specifically the ID3 algorithm, which selects the "best" attribute according to a goodness criterion. The concept of a model tree is not new. Unlike the model tree proposed by Chang et al. in 2004, we aim to improve performance with a Minimum Description Length (MDL) approach. The MDL principle is adopted to balance the choice between the SVM-synthesized attribute and a discrete attribute by also considering their model complexity. That is, an SVM is considered a more complex model than a simple discrete classifier (such as "education = Master or Ph.D."). Therefore, a larger penalty should be paid for an SVM classifier than for a discrete classifier when selecting the best attribute in decision tree induction. The penalty is given in a form whose 1-D case coincides with the one proposed by Quinlan in 1996 for a simple numeric classifier (such as "age>=22"). Our experiments show that the modification improves prediction accuracy on many real-world datasets.
author2 Hsing-Kuo Pao
author_facet Hsing-Kuo Pao
Hsin-Chih Chung
鍾興志
author Hsin-Chih Chung
鍾興志
spellingShingle Hsin-Chih Chung
鍾興志
MDL-based Model Trees for Classification of Hybrid Type Data
author_sort Hsin-Chih Chung
title MDL-based Model Trees for Classification of Hybrid Type Data
title_short MDL-based Model Trees for Classification of Hybrid Type Data
title_full MDL-based Model Trees for Classification of Hybrid Type Data
title_fullStr MDL-based Model Trees for Classification of Hybrid Type Data
title_full_unstemmed MDL-based Model Trees for Classification of Hybrid Type Data
title_sort mdl-based model trees for classification of hybrid type data
publishDate 2007
url http://ndltd.ncl.edu.tw/handle/9x7xgp
work_keys_str_mv AT hsinchihchung mdlbasedmodeltreesforclassificationofhybridtypedata
AT zhōngxìngzhì mdlbasedmodeltreesforclassificationofhybridtypedata
_version_ 1719093930286383104
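
Illustrative note: the attribute-selection idea in the description can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation. The function names (entropy, info_gain, quinlan_penalty, select_attribute) are hypothetical, the SVM-synthesized Boolean column is assumed to be precomputed rather than trained here, and the SVM penalty value is a placeholder because the record does not give the thesis's formula. Only the 1-D threshold penalty, log2(N - 1) / |D|, follows Quinlan's 1996 correction for numeric tests.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def info_gain(values, labels):
    """ID3-style information gain of splitting `labels` by `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def quinlan_penalty(n_distinct, n_samples):
    """Quinlan's (1996) penalty for a threshold test on a numeric attribute:
    log2(N - 1) / |D|, where N is the number of distinct values in D."""
    if n_distinct <= 1:
        return 0.0
    return math.log2(n_distinct - 1) / n_samples


def select_attribute(columns, labels, penalties):
    """Pick the attribute with the highest penalized gain.

    `penalties` maps attribute name -> complexity penalty (0 if absent).
    The SVM penalty passed in is an assumed placeholder value.
    """
    scores = {
        name: info_gain(col, labels) - penalties.get(name, 0.0)
        for name, col in columns.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data (invented for illustration only).
labels = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
columns = {
    "education": ["MS", "MS", "MS", "BS", "BS", "BS", "BS", "MS"],
    "svm_bool": [True, True, True, True, False, False, False, False],
}

# With no penalty the SVM-synthesized attribute wins; a large enough
# (hypothetical) complexity penalty shifts the choice to the nominal one.
print(select_attribute(columns, labels, {}))
print(select_attribute(columns, labels, {"svm_bool": 0.9}))
```

The penalty is subtracted from the gain so that a more complex test, such as an SVM over several numeric attributes, must reduce class uncertainty by more than a simple discrete test before it is selected.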