Improving Selected split by Stratified Bootstrap Methods

Master's thesis, National Taiwan University, Graduate Institute of Industrial Engineering, academic year 100 (2011–2012).


Bibliographic Details
Main Author: Po-Hsun Wang (王柏勛)
Other Authors: Argon Chen
Format: Others
Language: zh-TW
Published: 2012
Online Access: http://ndltd.ncl.edu.tw/handle/76399410078005300947
Chinese Title: 以分層拔靴抽樣法改善分類樹之切割能力 (Improving the Splitting Capability of Classification Trees with Stratified Bootstrap Sampling)
Advisor: Argon Chen (陳正剛)
Extent: 68 pages
Record ID: ndltd-TW-100NTU05030046
Abstract: It is generally believed that traditional classification trees, such as Classification and Regression Trees (CART), can classify certain types of data distributions effectively. In fact, because of the split-selection criterion and the procedure used by traditional classification trees, we show that they are not always as efficient as expected. Selecting an unsuitable split leads to problems such as sample-size depletion and overfitting. Without a sufficient sample size, splits at the lower levels of the tree become extremely unreliable, and attributes are more likely to be selected incorrectly. To improve CART performance, we use the Variation Reduction criterion to select the split that divides a node into two child nodes at the next level. In this research, we propose a new method to improve split selection. We use stratified sampling to divide the data into multiple sub-samples and the bootstrap method to resample instances within each sub-sample. A split is then selected on each bootstrap sample by the variation reduction criterion. Finally, we take the mean of the splits obtained from the bootstrap samples as the “stratified bootstrap split”. For certain types of sample distributions, the stratified bootstrap split reduces the variability of the selected split and yields a more stable split, which helps avoid incorrect splits and attribute selection. According to the simulation results in this research, the density of the sample distribution is the most important factor affecting the performance of the “original split” and the “stratified bootstrap split”. We therefore propose a “weighted split” that integrates the original CART split and the proposed stratified bootstrap split. The weighted split is shown to be robust and thus to avoid incorrect splits and attribute selection. Examples are used throughout the thesis to illustrate the proposed method. Finally, a hypothetical tree is used to demonstrate how the proposed weighted split can improve the performance of CART.
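The abstract describes the procedure only at a high level, so the following Python sketch shows one possible reading of it for a single numeric attribute and a categorical class label. Several details are assumptions that this record does not state: node variation is measured as n times the Gini impurity, the strata are the class labels, each stratum is resampled to its original size, and the weighted split is a convex combination of the two thresholds with a user-chosen weight `w`. The function names `gini_variation`, `best_split`, `stratified_bootstrap_split`, and `weighted_split` are hypothetical and do not come from the thesis.

```python
import numpy as np


def gini_variation(y: np.ndarray) -> float:
    """Node variation, ASSUMED here to be n * Gini impurity of the class labels."""
    n = len(y)
    if n == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / n
    return float(n * (1.0 - np.sum(p ** 2)))


def best_split(x: np.ndarray, y: np.ndarray) -> float:
    """Threshold on x with the largest variation reduction (CART-style search).

    Candidate thresholds are midpoints between consecutive distinct sorted x values.
    """
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    parent = gini_variation(ys)
    best_gain, best_t = -np.inf, float(xs[0])
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # tied values cannot be separated by a threshold
        gain = parent - gini_variation(ys[:i]) - gini_variation(ys[i:])
        if gain > best_gain:
            best_gain, best_t = gain, (xs[i - 1] + xs[i]) / 2.0
    return float(best_t)


def stratified_bootstrap_split(x: np.ndarray, y: np.ndarray,
                               n_boot: int = 200, seed: int = 0) -> float:
    """Mean of the best thresholds over stratified bootstrap samples.

    ASSUMPTION: each class label is one stratum, resampled with replacement
    to its original size, so every bootstrap sample keeps the class counts.
    """
    rng = np.random.default_rng(seed)
    strata = [np.flatnonzero(y == c) for c in np.unique(y)]
    splits = []
    for _ in range(n_boot):
        idx = np.concatenate(
            [rng.choice(s, size=len(s), replace=True) for s in strata])
        splits.append(best_split(x[idx], y[idx]))
    return float(np.mean(splits))


def weighted_split(x: np.ndarray, y: np.ndarray, w: float = 0.5,
                   n_boot: int = 200, seed: int = 0) -> float:
    """HYPOTHETICAL weighting: w * original split + (1 - w) * stratified bootstrap split."""
    original = best_split(x, y)
    boot = stratified_bootstrap_split(x, y, n_boot=n_boot, seed=seed)
    return w * original + (1.0 - w) * boot


if __name__ == "__main__":
    # Two overlapping classes on one attribute (illustrative data only).
    rng = np.random.default_rng(42)
    x = np.concatenate([rng.normal(0.0, 1.0, 30), rng.normal(2.0, 1.0, 30)])
    y = np.array([0] * 30 + [1] * 30)
    print("original split:", best_split(x, y))
    print("stratified bootstrap split:", stratified_bootstrap_split(x, y))
    print("weighted split (w=0.5):", weighted_split(x, y, w=0.5))
```

Averaging the bootstrap thresholds makes the chosen split depend less on a few observations near the class boundary, which is the stability argument the abstract makes. This record does not say how the thesis sets the weight (the abstract only notes that sample density is the dominant factor), so `w` is left as a free parameter in this sketch.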