Summary: | Classical random forest (RF) is well suited to classification and regression on high-dimensional data. However, its performance may be unsatisfactory when the data have few features, because univariate splitting cannot produce sufficiently diverse individual trees. In this paper, we propose a novel node-splitting method for decision trees that uses feature weighting and clustering. The method can combine multiple numerical features, multiple categorical features, or a mixture of both. Within the RF framework, we use this split method to construct decision trees, and we call the resulting ensemble Feature-Weighting and Clustering Random Forest (FWCRF). Experiments show that, on low-dimensional data, FWCRF achieves better ensemble accuracy than classical RF built on univariate decision trees, because its individual trees are more accurate and less similar to one another. Meanwhile, on high-dimensional data, the empirical performance of FWCRF is not inferior to that of classical RF and AdaBoost. Furthermore, compared with other multivariate RFs, FWCRF has the advantage of handling categorical features directly, without first converting them to numerical features.
|