Semi-Supervised Self-Training Feature Weighted Clustering Decision Tree and Random Forest

A self-training algorithm is an iterative method for semi-supervised learning that wraps around a base learner and uses the learner's own predictions to assign labels to unlabeled data. For a self-training algorithm, the classification ability of the base learner and the estimation of prediction confidence are both critical. A classical decision tree is not an effective base learner for self-training because it cannot correctly estimate the confidence of its own predictions. In this paper, we propose a novel node-split method for decision trees that uses weighted features to cluster instances; the method can combine multiple numerical and categorical features to split a node. The decision tree and random forest constructed with this method are called FWCDT and FWCRF, respectively. When training instances are few, FWCDT and FWCRF classify better than classical decision trees and forests based on univariate splits, which makes them more suitable as base classifiers in self-training. Moreover, building on the proposed node-split method, we explore suitable prediction-confidence measures for FWCDT and FWCRF, respectively. Finally, experiments on UCI datasets show that the self-training feature weighted clustering decision tree (ST-FWCDT) and random forest (ST-FWCRF) can effectively exploit unlabeled data, and the resulting classifiers have better generalization ability.
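
The first sentences of the abstract describe the standard self-training wrapper. Below is a minimal sketch of that loop in Python, with scikit-learn's DecisionTreeClassifier standing in for the paper's FWCDT (whose implementation is not reproduced here); the 0.9 confidence threshold and the iteration cap are illustrative assumptions, not values from the paper.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def self_train(X_labeled, y_labeled, X_unlabeled,
               base_learner=None, threshold=0.9, max_iter=10):
    """Generic self-training loop: fit on labeled data, pseudo-label
    high-confidence unlabeled instances, and refit until no instance
    clears the confidence threshold."""
    model = base_learner or DecisionTreeClassifier()
    X_lab, y_lab = np.asarray(X_labeled), np.asarray(y_labeled)
    X_unl = np.asarray(X_unlabeled)

    for _ in range(max_iter):
        if len(X_unl) == 0:
            break
        model.fit(X_lab, y_lab)
        proba = model.predict_proba(X_unl)        # prediction confidence
        conf = proba.max(axis=1)
        picked = conf >= threshold                # keep confident predictions only
        if not picked.any():
            break                                 # nothing left to pseudo-label
        pseudo = model.classes_[proba[picked].argmax(axis=1)]
        X_lab = np.vstack([X_lab, X_unl[picked]]) # absorb pseudo-labeled instances
        y_lab = np.concatenate([y_lab, pseudo])
        X_unl = X_unl[~picked]

    model.fit(X_lab, y_lab)                       # final refit on all absorbed data
    return model
```

The paper's criticism of the classical tree applies directly to the `predict_proba` call above: a plain decision tree reports the class frequencies of a leaf as probabilities, which are poorly calibrated estimates of its own accuracy, so wrongly pseudo-labeled instances can clear the threshold. This is what motivates FWCDT/FWCRF and their dedicated confidence measures.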
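The abstract also states that the node split weights the features and clusters the instances, combining numerical and categorical features in a single multivariate split. The paper's exact weighting and clustering scheme is not given here; the sketch below substitutes plausible stand-ins (mutual-information weights, one-hot encoding for categorical features, 2-means clustering) purely to illustrate the shape of such a split.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.feature_selection import mutual_info_classif

def weighted_cluster_split(X_num, X_cat, y, random_state=0):
    """Assign the instances at a node (2-D arrays X_num, X_cat) to two
    children by clustering feature-weighted data. The weights (mutual
    information with the label) and the clustering algorithm (2-means)
    are illustrative assumptions, not the paper's actual scheme."""
    # Encode categorical features numerically so both kinds can be combined.
    blocks = []
    if X_num.shape[1] > 0:
        blocks.append(StandardScaler().fit_transform(X_num))
    if X_cat.shape[1] > 0:
        enc = OneHotEncoder(handle_unknown="ignore")
        blocks.append(enc.fit_transform(X_cat).toarray())
    Z = np.hstack(blocks)

    # Weight each encoded column by its relevance to the class label.
    w = mutual_info_classif(Z, y, random_state=random_state)
    w = w / (w.sum() + 1e-12)

    # Cluster the weighted instances into two child nodes.
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    return km.fit_predict(Z * w)   # 0 -> left child, 1 -> right child
```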

Bibliographic Details
Main Authors: Zhenyu Liu (ORCID: 0000-0002-9705-8377), Tao Wen, Wei Sun, Qilong Zhang
Affiliations: Zhenyu Liu, Tao Wen, and Qilong Zhang: College of Computer Science and Engineering, Northeastern University, Shenyang, China; Wei Sun: Department of Computer Science and Technology, Dalian Neusoft University of Information, Dalian, China
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access, vol. 8, pp. 128337-128348
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2020.3008951
Subjects: Semi-supervised learning; self-training; decision tree; random forest; node splits
Online Access: https://ieeexplore.ieee.org/document/9139499/