A Unified Framework for Decision Tree on Continuous Attributes
Main Authors: | Jianjian Yan, Zhongnan Zhang, Lingwei Xie, Zhantu Zhu |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2019-01-01 |
Series: | IEEE Access |
Subjects: | Decision tree, classification, unified framework, split criteria |
Online Access: | https://ieeexplore.ieee.org/document/8610001/ |
id | doaj-b2455dfebe7048a292f14d51a58bc61d |
---|---|
doi | 10.1109/ACCESS.2019.2892083 |
volume | 7 |
pages | 11924–11933 |
affiliation | Software School, Xiamen University, Xiamen, China (all four authors) |
orcid | Zhongnan Zhang: https://orcid.org/0000-0002-7227-3943 |
collection | DOAJ |
author | Jianjian Yan, Zhongnan Zhang, Lingwei Xie, Zhantu Zhu |
issn | 2169-3536 |
description | The standard algorithms of decision trees and their derived methods are usually constructed on the basis of frequency information. However, they still suffer from a dilemma or multichotomous question on continuous attributes when two or more candidate cut points have the same or similar splitting performance as the optimal value, such as the maximal information gain ratio or the minimal Gini index. In this paper, we propose a unified framework model to deal with this question. We then design two algorithms based on Splitting Performance and the number of Expected Segments, called SPES1 and SPES2, which determine the optimal cut point as follows. First, several candidate cut points are selected because their splitting performance is closest to the optimal value. Second, we compute the number of expected segments for each candidate cut point. Finally, we combine these two measures by introducing a weighting factor $\alpha$ to determine the optimal cut point among the candidates. To validate the effectiveness of our methods, we evaluate them on 25 benchmark datasets. The experimental results demonstrate that the classification accuracies of the proposed algorithms are superior to those of current state-of-the-art methods in tackling the multichotomous question, by about 5% in some cases. In particular, with the proposed methods, the number of candidate cut points converges to a certain extent. |
topic | Decision tree, classification, unified framework, split criteria |
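The abstract describes the cut-point selection only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' SPES1 or SPES2. It assumes binary splits on a single continuous attribute, candidate cut points at midpoints between adjacent distinct sorted values, and the weighted Gini index as the splitting measure (the abstract also mentions the information gain ratio). It first collects every cut point whose score is within a tolerance `tol` of the best one, which is the multichotomous situation, and then applies an α-weighted choice. The record does not say how the number of expected segments is computed, so the sketch takes it as an input; the names `split_scores`, `near_optimal_cuts`, `pick_cut`, and `expected_segments`, and the max-normalization of both terms, are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_scores(values, labels):
    """Weighted Gini index of every binary split of one continuous attribute.

    Candidate cut points are midpoints between adjacent distinct sorted values.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    ys = [y for _, y in pairs]
    n = len(pairs)
    scores = {}
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between identical attribute values
        left, right = ys[:i], ys[i:]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        scores[(lo + hi) / 2] = weighted
    return scores


def near_optimal_cuts(scores, tol=1e-9):
    """Cut points whose weighted Gini is within tol of the minimum (ties or near-ties)."""
    best = min(scores.values())
    return sorted(c for c, s in scores.items() if s - best <= tol)


def pick_cut(candidates, scores, expected_segments, alpha=0.5):
    """Hypothetical alpha-weighted choice among near-optimal cut points.

    Assumes lower is better for both the Gini score and the supplied
    expected_segments value; each is divided by its maximum over the
    candidates so the two terms are on a comparable scale.
    """
    max_s = max(scores[c] for c in candidates) or 1.0
    max_e = max(expected_segments[c] for c in candidates) or 1.0

    def combined(c):
        return alpha * scores[c] / max_s + (1 - alpha) * expected_segments[c] / max_e

    # min() keeps the first (smallest) cut point when combined scores tie
    return min(candidates, key=combined)


if __name__ == "__main__":
    scores = split_scores([1, 2, 3, 4, 5, 6], list("aabbaa"))
    cands = near_optimal_cuts(scores)
    print(cands)  # [2.5, 4.5]: two cut points with the same minimal Gini index
    # Illustrative (made-up) expected-segment counts, not values from the paper
    print(pick_cut(cands, scores, {2.5: 3, 4.5: 2}, alpha=0.5))  # 4.5
```

On this toy attribute the cut points 2.5 and 4.5 tie exactly on the Gini index, so the splitting measure alone cannot choose between them; the α-weighted step resolves the tie only once some second measure distinguishes the candidates, and with `alpha=1.0` it falls back to the first candidate.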