Improving Selected split by Stratified Bootstrap Methods
Master's thesis === National Taiwan University === Graduate Institute of Industrial Engineering === Academic Year 100 === It is generally believed that traditional classification trees, such as Classification and Regression Trees (CART), can classify certain types of data distributions effectively and clearly. In fact, because of the split selection criterion and the procedure used b...
Main Authors: | Po-Hsun Wang, 王柏勛 |
---|---|
Other Authors: | Argon Chen |
Format: | Others |
Language: | zh-TW |
Published: | 2012 |
Online Access: | http://ndltd.ncl.edu.tw/handle/76399410078005300947 |
id |
ndltd-TW-100NTU05030046 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-100NTU05030046 2015-10-13T21:50:17Z http://ndltd.ncl.edu.tw/handle/76399410078005300947 Improving Selected split by Stratified Bootstrap Methods 以分層拔靴抽樣法改善分類樹之切割能力 (Improving the Splitting Ability of Classification Trees by Stratified Bootstrap Sampling) Po-Hsun Wang 王柏勛 Master's thesis National Taiwan University Graduate Institute of Industrial Engineering Academic Year 100 Argon Chen 陳正剛 2012 thesis 68 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's thesis === National Taiwan University === Graduate Institute of Industrial Engineering === Academic Year 100 === It is generally believed that traditional classification trees, such as Classification and Regression Trees (CART), can classify certain types of data distributions effectively and clearly. In fact, because of the split selection criterion and the procedure used by traditional classification trees, we can show that they are not always as efficient as expected. Selecting an unsuitable split leads to problems such as sample size depletion and overfitting. Without a sufficient sample size, split and attribute selection at the lower levels of the tree become extremely unreliable.
To improve CART's performance, we use the variation reduction criterion to select the split that divides a node into two child nodes at the next level. In this research, we propose a new method to improve split selection. We use stratified sampling to divide the data into multiple sub-samples and apply the bootstrap method to resample instances within each sub-sample. A split is then selected for each bootstrap sample by the variation reduction criterion. Finally, we take the mean of the bootstrap-sample splits as the “stratified bootstrap split”. For certain types of sample distributions, the stratified bootstrap split reduces the variability of the selected split and yields a more stable split, avoiding incorrect splits and incorrect attribute selection.
According to the simulation results in this research, the density of the sample distribution is the most important factor affecting the performance of the “Original split” and the “Stratified Bootstrap split”. We propose a “Weighted split” that integrates the original CART split and the proposed “Stratified Bootstrap split”. The weighted split is shown to be robust and thus avoids incorrect split and attribute selection. Throughout this thesis, examples are used to illustrate the proposed method. Finally, a hypothetical tree is used to demonstrate how the proposed weighted split can improve the performance of CART.
|
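The split-selection procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: it assumes a single numeric attribute and a 0/1 class label, measures "variation reduction" as the drop in the sum of squared deviations of the label (for 0/1 labels this ranks splits the same way as Gini impurity reduction), stratifies by class label, and combines the two splits with a fixed weight `w` because the abstract does not say how the thesis chooses its weights. The names `best_split`, `stratified_bootstrap_split` and `weighted_split` and the parameters `n_boot`, `seed` and `w` are hypothetical.

```python
import random
from statistics import fmean


def sse(values):
    """Sum of squared deviations from the mean (n times the population variance)."""
    if not values:
        return 0.0
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Threshold on x that maximizes the variation (SSE) reduction of y.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None if x has fewer than two distinct values.
    """
    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best_gain, best_t = -1.0, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            best_gain, best_t = gain, (lo + hi) / 2
    return best_t


def stratified_bootstrap_split(x, y, strata, n_boot=200, seed=0):
    """Mean of best_split thresholds over n_boot stratified bootstrap resamples.

    Each stratum is resampled with replacement at its original size, so every
    resample keeps the stratum proportions. Raises statistics.StatisticsError
    if no resample yields a split.
    """
    rng = random.Random(seed)
    groups = {}
    for i, s in enumerate(strata):
        groups.setdefault(s, []).append(i)
    splits = []
    for _ in range(n_boot):
        idx = []
        for members in groups.values():
            idx.extend(rng.choices(members, k=len(members)))
        t = best_split([x[i] for i in idx], [y[i] for i in idx])
        if t is not None:
            splits.append(t)
    return fmean(splits)


def weighted_split(original, stratified, w):
    """Hypothetical convex combination of the original and stratified bootstrap splits."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * original + (1 - w) * stratified


# Usage on a toy, perfectly separable data set (two classes, one attribute).
x = [1.0, 2.0, 3.0, 4.0, 6.0, 7.0, 8.0, 9.0]
y = [0, 0, 0, 0, 1, 1, 1, 1]
original = best_split(x, y)  # 5.0, midway between 4.0 and 6.0
stratified = stratified_bootstrap_split(x, y, strata=y)  # lies in [3.5, 6.5]
print(original, stratified, weighted_split(original, stratified, w=0.5))
```

Stratifying by class guarantees that every bootstrap sample contains both classes, so a split always exists; averaging the resampled thresholds is what smooths out the variability of a single-sample split. In a full tree, the same steps would be repeated for every attribute and node.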
author2 |
Argon Chen |
author_facet |
Argon Chen Po-Hsun Wang 王柏勛 |
author |
Po-Hsun Wang 王柏勛 |
title |
Improving Selected split by Stratified Bootstrap Methods |
publishDate |
2012 |
url |
http://ndltd.ncl.edu.tw/handle/76399410078005300947 |