Semi-Supervised Self-Training Feature Weighted Clustering Decision Tree and Random Forest
A self-training algorithm is an iterative method for semi-supervised learning that wraps around a base learner and uses the learner's own predictions to assign labels to unlabeled data. For a self-training algorithm, the classification ability of the base learner and the estimation of prediction confidence are both critical. The classical decision tree cannot serve as an effective base learner in self-training because it cannot correctly estimate the confidence of its own predictions. In this paper, we propose a novel node-split method for decision trees that uses weighted features to cluster instances, which makes it possible to combine multiple numerical and categorical features when splitting a node. The decision tree and random forest constructed by this method are called FWCDT and FWCRF, respectively. FWCDT and FWCRF classify better than classical decision trees and forests based on univariate splits when labeled training instances are scarce, which makes them more suitable as base classifiers in self-training. Building on the proposed node-split method, we also explore suitable prediction-confidence measures for FWCDT and FWCRF. Finally, experiments on UCI datasets show that the self-training feature weighted clustering decision tree (ST-FWCDT) and random forest (ST-FWCRF) can effectively exploit unlabeled data, and the resulting classifiers have better generalization ability.
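To make the wrapper idea concrete, here is a minimal sketch of the generic self-training loop the abstract describes: fit the base learner on the labeled set, pseudo-label the most confident unlabeled instances, and repeat. The function name `self_train`, the scikit-learn `DecisionTreeClassifier`, and the 0.9 confidence threshold are illustrative stand-ins, not the paper's FWCDT/FWCRF classifiers or its confidence measures.

```python
# Minimal self-training sketch: wrap any probabilistic base learner and
# iteratively grow the labeled set with its own confident predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def self_train(X_labeled, y_labeled, X_unlabeled,
               base_learner=None, threshold=0.9, max_iter=20):
    clf = base_learner if base_learner is not None else DecisionTreeClassifier()
    X_l = np.asarray(X_labeled, dtype=float)
    y_l = np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled, dtype=float)
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(X_u)        # per-class confidence estimates
        conf = proba.max(axis=1)
        keep = conf >= threshold              # only trust confident predictions
        if not keep.any():
            break                             # nothing confident left to add
        pseudo = clf.classes_[proba[keep].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[keep]])     # grow the labeled set
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~keep]                      # shrink the unlabeled pool
    return clf.fit(X_l, y_l)
```

Note the abstract's caveat about this loop: a classical decision tree tends to report near-0/1 leaf probabilities, so its confidence estimates are unreliable here, which is precisely the weakness FWCDT and FWCRF are designed to address.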
Main Authors: | Zhenyu Liu, Tao Wen, Wei Sun, Qilong Zhang |
---|---|
Format: | Article |
Language: | English |
Published: | IEEE, 2020-01-01 |
Series: | IEEE Access |
Subjects: | Semi-supervised learning; self-training; decision tree; random forest; node splits |
Online Access: | https://ieeexplore.ieee.org/document/9139499/ |
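The abstract's core technical idea, a node split that clusters instances on several weighted features at once instead of thresholding a single feature, can be sketched roughly as below. The one-hot encoding of categorical features, the mutual-information feature weights, and the two-cluster k-means are all assumptions made for illustration; this record does not specify the paper's exact weighting or clustering scheme.

```python
# Rough sketch of a feature-weighted clustering node split: instances at a
# node are partitioned by clustering on all weighted features jointly,
# giving a multivariate split over mixed numerical/categorical data.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.feature_selection import mutual_info_classif

def weighted_clustering_split(X_num, X_cat, y, random_state=0):
    blocks = [StandardScaler().fit_transform(X_num)]
    if X_cat is not None and X_cat.shape[1] > 0:
        # One-hot encode categoricals so they mix with numeric columns
        # (an assumption; the paper may handle them differently).
        blocks.append(OneHotEncoder(sparse_output=False).fit_transform(X_cat))
    X = np.hstack(blocks)
    # Assumed weighting: each column weighted by its mutual information
    # with the class label, so label-relevant features drive the clustering.
    w = mutual_info_classif(X, y, random_state=random_state)
    Xw = X * w
    # A two-cluster k-means plays the role of the multivariate split.
    km = KMeans(n_clusters=2, n_init=10, random_state=random_state)
    return km.fit_predict(Xw)  # 0/1 child-node assignment per instance
```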
id | doaj-2edc5885a86d400286c015cd6addbef5 |
---|---|
record_format | Article |
spelling | Zhenyu Liu (https://orcid.org/0000-0002-9705-8377), Tao Wen, Wei Sun, and Qilong Zhang, "Semi-Supervised Self-Training Feature Weighted Clustering Decision Tree and Random Forest," IEEE Access, vol. 8, pp. 128337-128348, 2020, doi: 10.1109/ACCESS.2020.3008951 (IEEE article 9139499). Affiliations: Zhenyu Liu, Tao Wen, and Qilong Zhang, College of Computer Science and Engineering, Northeastern University, Shenyang, China; Wei Sun, Department of Computer Science and Technology, Dalian Neusoft University of Information, Dalian, China. Record updated 2021-03-30T04:40:06Z. |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Zhenyu Liu, Tao Wen, Wei Sun, Qilong Zhang |
title | Semi-Supervised Self-Training Feature Weighted Clustering Decision Tree and Random Forest |
publisher | IEEE |
series | IEEE Access |
issn | 2169-3536 |
publishDate | 2020-01-01 |
description | A self-training algorithm is an iterative method for semi-supervised learning that wraps around a base learner and uses the learner's own predictions to assign labels to unlabeled data. For a self-training algorithm, the classification ability of the base learner and the estimation of prediction confidence are both critical. The classical decision tree cannot serve as an effective base learner in self-training because it cannot correctly estimate the confidence of its own predictions. In this paper, we propose a novel node-split method for decision trees that uses weighted features to cluster instances, which makes it possible to combine multiple numerical and categorical features when splitting a node. The decision tree and random forest constructed by this method are called FWCDT and FWCRF, respectively. FWCDT and FWCRF classify better than classical decision trees and forests based on univariate splits when labeled training instances are scarce, which makes them more suitable as base classifiers in self-training. Building on the proposed node-split method, we also explore suitable prediction-confidence measures for FWCDT and FWCRF. Finally, experiments on UCI datasets show that the self-training feature weighted clustering decision tree (ST-FWCDT) and random forest (ST-FWCRF) can effectively exploit unlabeled data, and the resulting classifiers have better generalization ability. |
topic | Semi-supervised learning; self-training; decision tree; random forest; node splits |
url | https://ieeexplore.ieee.org/document/9139499/ |