Feature Selection Method Based on Support Vector Machine and Cumulative Distribution Function

Bibliographic Details
Main Authors: Chih-En Liu, 劉志恩
Other Authors: Jen-Ing G. Hwang
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/06692637254680399272
Description
Summary: Master's === Fu Jen Catholic University === Department of Computer Science and Information Engineering === 98 === Abstract Feature selection is an important method in machine learning and data mining. It reduces the dimensionality of data and improves performance in classification and clustering. Zhang et al. (2006) developed a feature selection algorithm, named recursive support vector machine (R-SVM), to select important biomarkers from biological data. R-SVM builds on the support vector machine, but it works only with linear kernels. To overcome this limitation, we propose a distance-based cumulative distribution function (DCDF) algorithm that works with both linear and nonlinear kernels. In this study, we implement DCDF and compare it with R-SVM. The experiments cover eight different types of cancer data and four UCI datasets. The results show that DCDF outperforms R-SVM with either linear or nonlinear kernels. On some datasets, DCDF with nonlinear kernels achieves much better results and significantly outperforms R-SVM.
Keywords: Feature selection, Support vector machine, Cumulative distribution function, Recursive-SVM
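
To illustrate why a recursive, weight-based selector is tied to linear kernels, the following is a minimal sketch of recursive feature selection with a linear SVM. It is not the thesis's code and not a reproduction of R-SVM or DCDF. The function names, the full-batch subgradient trainer, the hyperparameters, the toy data, and the scoring rule (weight times the difference in class means) are all assumptions made for illustration. The key point is that a linear model exposes one weight per input feature, which a nonlinear kernel does not provide directly.

```python
# Hypothetical sketch: recursive feature selection driven by linear SVM weights.
# All names, hyperparameters, and the scoring rule are illustrative assumptions.

def train_linear_svm(X, y, features, epochs=200, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularized hinge loss,
    restricted to the given feature indices. Labels must be +1 or -1."""
    w = {j: 0.0 for j in features}
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = {j: lam * w[j] for j in features}
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(w[j] * xi[j] for j in features) + b)
            if margin < 1:  # sample violates the margin, so it contributes
                for j in features:
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        for j in features:
            w[j] -= lr * grad_w[j]
        b -= lr * grad_b
    return w, b


def recursive_linear_selection(X, y, n_keep, drop_fraction=0.5):
    """Repeatedly train a linear SVM and discard the lowest-scoring features
    until only n_keep remain. Returns the surviving feature indices."""
    features = list(range(len(X[0])))
    pos = [x for x, yi in zip(X, y) if yi == 1]
    neg = [x for x, yi in zip(X, y) if yi == -1]

    while len(features) > n_keep:
        w, _ = train_linear_svm(X, y, features)

        def score(j):
            # Assumed scoring rule: weight times difference in class means.
            mean_pos = sum(x[j] for x in pos) / len(pos)
            mean_neg = sum(x[j] for x in neg) / len(neg)
            return w[j] * (mean_pos - mean_neg)

        ranked = sorted(features, key=score, reverse=True)
        n_next = max(n_keep, int(len(features) * (1 - drop_fraction)))
        features = sorted(ranked[:n_next])
    return features


# Toy data (made up): feature 0 separates the classes, features 1 and 2 are noise.
X = [[2.0, 0.1, -0.3], [1.5, -0.2, 0.4], [2.5, 0.3, 0.1],
     [-2.0, 0.2, -0.1], [-1.5, -0.1, 0.3], [-2.5, 0.0, -0.4]]
y = [1, 1, 1, -1, -1, -1]

selected = recursive_linear_selection(X, y, n_keep=1)
print(selected)
```

The score relies on the explicit weight `w[j]` for each feature. With a nonlinear kernel the decision function is expressed through kernel evaluations against support vectors, so no such per-feature weight is available in the input space; the abstract states that DCDF addresses this case but does not give enough detail to sketch it here.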