Summary: | Master's thesis === Fu Jen Catholic University (輔仁大學) === Department of Computer Science and Information Engineering === 98 === Abstract
Feature selection is an important method in machine learning and data mining. It reduces the dimensionality of the data and improves performance in classification and clustering. Zhang et al. (2006) developed a feature selection algorithm, the recursive support vector machine (R-SVM), to select important biomarkers from biological data. R-SVM is built on the support vector machine, but it works only with linear kernels. To overcome this limitation, we propose a distance-based cumulative distribution function (DCDF) algorithm that works with both linear and nonlinear kernels. In this study, we implement DCDF and compare it with R-SVM. The experiments cover eight cancer datasets and four UCI datasets. The results show that DCDF outperforms R-SVM whether DCDF uses a linear or a nonlinear kernel. On some datasets, DCDF with a nonlinear kernel achieves much better results and significantly outperforms R-SVM.
Keywords: Feature selection, Support vector machine, Cumulative distribution function, Recursive-SVM
|
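The linear-kernel restriction mentioned in the abstract can be illustrated with a small sketch of weight-based recursive feature selection. This is not the thesis's code and it does not implement DCDF, whose details the abstract does not give. It assumes the R-SVM ranking criterion is s_j = w_j (m_j+ - m_j-), where w is the linear SVM weight vector and m_j+ and m_j- are the class means of feature j. The function names (`train_linear_svm`, `class_mean`, `rsvm_select`), the one-feature-per-round schedule, the subgradient trainer, and the toy data are all hypothetical choices made for illustration.

```python
def train_linear_svm(X, y, epochs=50, lr=0.01, lam=0.01):
    """Fit a linear SVM (hinge loss + L2 penalty) by subgradient descent.

    Illustrative trainer only; a real study would use an SVM library.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # L2 shrinkage is applied on every step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # Sample violates the margin: move w and b toward it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def class_mean(X, y, label, j):
    """Mean of feature j over the samples carrying the given label."""
    vals = [xi[j] for xi, yi in zip(X, y) if yi == label]
    return sum(vals) / len(vals)


def rsvm_select(X, y, n_keep):
    """Recursively drop the lowest-scoring feature until n_keep remain.

    Assumed score: s_j = w_j * (mean_plus_j - mean_minus_j). It needs one
    explicit weight per input feature, which only a linear kernel provides.
    """
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        Xa = [[xi[j] for j in active] for xi in X]
        w, _ = train_linear_svm(Xa, y)
        scores = [
            w[k] * (class_mean(Xa, y, 1, k) - class_mean(Xa, y, -1, k))
            for k in range(len(active))
        ]
        order = sorted(range(len(active)), key=lambda k: scores[k], reverse=True)
        # Hypothetical schedule: remove one feature per round.
        active = sorted(active[k] for k in order[:len(active) - 1])
    return active


# Toy data (made up): feature 0 separates the classes; features 1 and 2
# have identical class means and therefore score zero.
X = [
    [2.0, 1.0, 0.5], [3.0, -1.0, 0.5], [2.5, 1.0, -0.5], [3.5, -1.0, -0.5],
    [-2.0, 1.0, 0.5], [-3.0, -1.0, 0.5], [-2.5, 1.0, -0.5], [-3.5, -1.0, -0.5],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]

print(rsvm_select(X, y, n_keep=1))
```

With a nonlinear kernel the separating hyperplane lives in an implicit feature space, so there is no weight w_j attached to each original feature and this score cannot be computed directly. That is the gap the abstract says DCDF is designed to close.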