Feature Selection for High Dimensional Data Using Weighted K-Nearest Neighbors and Genetic Algorithm

Too many input features may lead to over-fitting and reduce the performance of a learning algorithm. Moreover, in most cases each feature carries different information content and therefore has a different effect on the prediction target. This paper proposes a feature selection method, called WKNNGAFS, that calculates the importance of each feature. A genetic algorithm (GA) searches for the optimal weight vector, whose ith component corresponds to the contribution of the ith feature to classification from a global perspective. A weighted K-nearest neighbors algorithm (WKNN), which accounts for both the different contributions of the nearest neighbors and the different classification ability of each feature, is then used to determine the target label. To evaluate the proposed method, it is compared with nine existing feature selection methods on 13 real datasets, including 6 high-dimensional microarray datasets. Experimental results demonstrate that the method is more effective and improves classification performance.
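
The sketch below is an illustrative, hypothetical reconstruction of the idea described in the abstract, not the authors' WKNNGAFS implementation: a real-coded genetic algorithm evolves a per-feature weight vector, and a distance-weighted KNN classifier scores each candidate vector by validation accuracy. The GA operators (tournament selection, arithmetic crossover, Gaussian mutation), the fitness definition, and all hyperparameters are assumptions.

# Minimal sketch, assuming a validation split is available for fitness evaluation.
# Not the authors' code: operators and hyperparameters are illustrative choices.
import numpy as np

def wknn_predict(X_train, y_train, X_test, w, k=5):
    """Distance-weighted KNN using one non-negative weight per feature."""
    preds = []
    for x in X_test:
        # Feature-weighted Euclidean distance to every training point.
        d = np.sqrt(((X_train - x) ** 2 * w).sum(axis=1))
        nn = np.argsort(d)[:k]
        # Closer neighbors get a larger vote (inverse-distance weighting).
        votes = {}
        for i in nn:
            votes[y_train[i]] = votes.get(y_train[i], 0.0) + 1.0 / (d[i] + 1e-12)
        preds.append(max(votes, key=votes.get))
    return np.array(preds)

def fitness(w, X_train, y_train, X_val, y_val, k=5):
    """Fitness of a weight vector = accuracy of the weighted KNN on a validation split."""
    return np.mean(wknn_predict(X_train, y_train, X_val, w, k) == y_val)

def ga_feature_weights(X_train, y_train, X_val, y_val,
                       pop_size=30, generations=50, mut_rate=0.1, seed=0):
    """Real-coded GA: tournament selection, arithmetic crossover, Gaussian mutation."""
    rng = np.random.default_rng(seed)
    n_features = X_train.shape[1]
    pop = rng.random((pop_size, n_features))           # weights in [0, 1]
    for _ in range(generations):
        fit = np.array([fitness(w, X_train, y_train, X_val, y_val) for w in pop])
        new_pop = [pop[fit.argmax()].copy()]            # elitism: keep the best individual
        while len(new_pop) < pop_size:
            # Tournament selection (size 2) of two parents.
            i, j = rng.integers(pop_size, size=2), rng.integers(pop_size, size=2)
            p1 = pop[i[np.argmax(fit[i])]]
            p2 = pop[j[np.argmax(fit[j])]]
            alpha = rng.random()
            child = alpha * p1 + (1 - alpha) * p2       # arithmetic crossover
            mask = rng.random(n_features) < mut_rate    # Gaussian mutation on a few genes
            child[mask] = np.clip(child[mask] + rng.normal(0, 0.1, mask.sum()), 0, 1)
            new_pop.append(child)
        pop = np.array(new_pop)
    fit = np.array([fitness(w, X_train, y_train, X_val, y_val) for w in pop])
    return pop[fit.argmax()]                            # best weight vector found

In this sketch, a weight near zero marks a feature as uninformative, so thresholding the returned vector gives a selected feature subset, while the weights themselves rank feature importance.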

Bibliographic Details
Main Authors: Shuangjie Li, Kaixiang Zhang, Qianru Chen, Shuqin Wang, Shaoqiang Zhang
Format: Article
Language: English
Published: IEEE, 2020-01-01
Series: IEEE Access
Subjects: Feature selection; weighted K-nearest neighbors; genetic algorithm; real coding
Online Access: https://ieeexplore.ieee.org/document/9151875/
DOAJ Record ID: doaj-22f37717c0b343b186197a03203f0271 (record timestamp 2021-03-30T04:37:53Z)
DOI: 10.1109/ACCESS.2020.3012768
ISSN: 2169-3536
Citation: IEEE Access, vol. 8, pp. 139512-139528, 2020 (document 9151875)
Author Identifiers: Shuangjie Li (ORCID 0000-0001-6667-8330); Shaoqiang Zhang (ORCID 0000-0002-4127-0539)
Affiliation (all authors): College of Computer and Information Engineering, Tianjin Normal University, Tianjin, China