A Unified Framework for Decision Tree on Continuous Attributes

The standard algorithms of decision trees and their derived methods are usually constructed on the basis of the frequency information. However, they still suffer from a dilemma or multichotomous question for continuous attributes when two or more candidate cut points have the same or similar splitti...


Bibliographic Details
Main Authors: Jianjian Yan, Zhongnan Zhang, Lingwei Xie, Zhantu Zhu
Format: Article
Language: English
Published: IEEE 2019-01-01
Series: IEEE Access
Subjects: Decision tree; classification; unified framework; split criteria
Online Access: https://ieeexplore.ieee.org/document/8610001/
id doaj-b2455dfebe7048a292f14d51a58bc61d
record_format Article
spelling doaj-b2455dfebe7048a292f14d51a58bc61d 2021-03-29T22:02:58Z eng IEEE
IEEE Access, ISSN 2169-3536, 2019-01-01, vol. 7, pp. 11924-11933
doi:10.1109/ACCESS.2019.2892083 8610001
A Unified Framework for Decision Tree on Continuous Attributes
Jianjian Yan (0), Zhongnan Zhang (1, https://orcid.org/0000-0002-7227-3943), Lingwei Xie (2), Zhantu Zhu (3)
(0-3) Software School, Xiamen University, Xiamen, China
https://ieeexplore.ieee.org/document/8610001/
Decision tree; classification; unified framework; split criteria
collection DOAJ
language English
format Article
sources DOAJ
author Jianjian Yan
Zhongnan Zhang
Lingwei Xie
Zhantu Zhu
title A Unified Framework for Decision Tree on Continuous Attributes
publisher IEEE
series IEEE Access
issn 2169-3536
publishDate 2019-01-01
description Standard decision tree algorithms and the methods derived from them are usually built on frequency information. However, for continuous attributes they face a dilemma, or multichotomous question, when two or more candidate cut points have the same or nearly the same splitting performance as the optimal value, such as the maximal information gain ratio or the minimal Gini index. In this paper, we propose a unified framework model to address this question. We then design two algorithms based on Splitting Performance and the number of Expected Segments, called SPES1 and SPES2, which determine the optimal cut point in three steps. First, several candidate cut points are selected whose splitting performance is closest to the optimal. Second, the number of expected segments is computed for each candidate cut point. Finally, the two measures are combined through a weighting factor $\alpha$ to select the optimal cut point among the candidates. To validate the effectiveness of our methods, we evaluate them on 25 benchmark datasets. The experimental results show that the proposed algorithms achieve higher classification accuracy than current state-of-the-art methods for the multichotomous question, by about 5% in some cases. In particular, under the proposed methods, the number of candidate cut points converges to a certain extent.
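The three steps in the description can be illustrated with a minimal Python sketch. It is not the authors' SPES1 or SPES2: the abstract does not say how the number of expected segments is computed, how the two measures are scaled, or in which direction they are combined. The sketch therefore assumes the Gini index as the splitting performance (lower is better), takes the segment measure as a caller-supplied hook named `segment_fn` (hypothetical, fewer assumed better), and assumes min-max normalisation with a linear weighting by `alpha`. All function names and the parameter `k` are illustrative.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical hook type: (values, labels, cut) -> segment measure.
SegmentFn = Callable[[Sequence[float], Sequence[str], float], float]


def gini(labels: Sequence[str]) -> float:
    """Gini impurity of a collection of labels (0.0 if empty or pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_gini(values: Sequence[float], labels: Sequence[str], cut: float) -> float:
    """Weighted Gini index of the binary split value <= cut / value > cut."""
    left = [y for x, y in zip(values, labels) if x <= cut]
    right = [y for x, y in zip(values, labels) if x > cut]
    n = len(labels)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


def normalise(xs: List[float]) -> List[float]:
    """Min-max scale to [0, 1]; all zeros when every entry is equal."""
    lo, hi = min(xs), max(xs)
    return [0.0 if hi == lo else (x - lo) / (hi - lo) for x in xs]


def choose_cut(values: Sequence[float], labels: Sequence[str],
               segment_fn: SegmentFn, alpha: float = 0.5, k: int = 3) -> float:
    """Pick a cut point by weighting splitting performance against segments."""
    distinct = sorted(set(values))
    cuts = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not cuts:
        raise ValueError("need at least two distinct attribute values")
    # Step 1: keep the k candidates whose Gini index is closest to the best.
    ranked = sorted(cuts, key=lambda c: split_gini(values, labels, c))[:k]
    perf = normalise([split_gini(values, labels, c) for c in ranked])
    # Step 2: evaluate the (assumed) segment measure for each candidate.
    segs = normalise([segment_fn(values, labels, c) for c in ranked])
    # Step 3: combine both measures with alpha; the lowest score wins.
    scores = [alpha * p + (1 - alpha) * s for p, s in zip(perf, segs)]
    return ranked[scores.index(min(scores))]
```

For example, with values `[1, 2, 3, 4]` and labels `['a', 'b', 'b', 'a']`, the cuts 1.5 and 3.5 have the same Gini index, so the choice between them is made entirely by whatever `segment_fn` returns. That is the tie situation the description calls the multichotomous question.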
topic Decision tree
classification
unified framework
split criteria
url https://ieeexplore.ieee.org/document/8610001/