Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm

This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to c...

Bibliographic Details
Main Authors: Jin Hee Bae, Minwoo Kim, J.S. Lim, Zong Woo Geem
Format: Article
Language: English
Published: MDPI AG 2021-03-01
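Series: Mathematics
Subjects: feature selection; colorectal cancer; gene expression; K-means clustering; modified harmony search
Online Access: https://www.mdpi.com/2227-7390/9/5/570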
id doaj-6f2bbcbb57b2478c96fdb98e2c99a3a4
record_format Article
spelling doaj-6f2bbcbb57b2478c96fdb98e2c99a3a4; 2021-03-08T00:02:30Z; eng; MDPI AG; Mathematics; 2227-7390; 2021-03-01; 9; 5; 570; 570; 10.3390/math9050570
Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
Jin Hee Bae [0]; Minwoo Kim [1]; J.S. Lim [2]; Zong Woo Geem [3]
[0] College of IT Convergence, Gachon University, Seongnam 13120, Korea
[1] College of IT Convergence, Gachon University, Seongnam 13120, Korea
[2] College of IT Convergence, Gachon University, Seongnam 13120, Korea
[3] College of IT Convergence, Gachon University, Seongnam 13120, Korea
This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.
https://www.mdpi.com/2227-7390/9/5/570
feature selection; colorectal cancer; gene expression; K-means clustering; modified harmony search
collection DOAJ
language English
format Article
sources DOAJ
author Jin Hee Bae
Minwoo Kim
J.S. Lim
Zong Woo Geem
spellingShingle Jin Hee Bae
Minwoo Kim
J.S. Lim
Zong Woo Geem
Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
Mathematics
feature selection
colorectal cancer
gene expression
K-means clustering
modified harmony search
author_facet Jin Hee Bae
Minwoo Kim
J.S. Lim
Zong Woo Geem
author_sort Jin Hee Bae
title Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
title_short Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
title_full Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
title_fullStr Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
title_full_unstemmed Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
title_sort feature selection for colon cancer detection using k-means clustering and modified harmony search algorithm
publisher MDPI AG
series Mathematics
issn 2227-7390
publishDate 2021-03-01
description This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.
topic feature selection
colorectal cancer
gene expression
K-means clustering
modified harmony search
url https://www.mdpi.com/2227-7390/9/5/570
work_keys_str_mv AT jinheebae featureselectionforcoloncancerdetectionusingkmeansclusteringandmodifiedharmonysearchalgorithm
AT minwookim featureselectionforcoloncancerdetectionusingkmeansclusteringandmodifiedharmonysearchalgorithm
AT jslim featureselectionforcoloncancerdetectionusingkmeansclusteringandmodifiedharmonysearchalgorithm
AT zongwoogeem featureselectionforcoloncancerdetectionusingkmeansclusteringandmodifiedharmonysearchalgorithm
_version_ 1724229195817549824
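
Illustrative sketch (not part of the catalog record): the four steps named in the description (Z-normalization, Fisher-score filtering, K-means clustering with one representative gene per cluster, and harmony search) could be outlined roughly as below. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names and parameters (`hms`, `hmcr`, `par`, `k`) are hypothetical, the representative gene is assumed to be the highest-scoring member of each cluster, and the search shown is a basic binary harmony search, since the record does not describe what the paper's modification changes.

```python
import numpy as np

def z_normalize(X):
    """Scale each gene (column) to zero mean and unit variance."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    return (X - X.mean(axis=0)) / std

def fisher_score(X, y):
    """Two-class Fisher score per gene: (mu1 - mu0)^2 / (var1 + var0)."""
    X0, X1 = X[y == 0], X[y == 1]
    num = (X1.mean(axis=0) - X0.mean(axis=0)) ** 2
    den = X1.var(axis=0) + X0.var(axis=0) + 1e-12  # avoid division by zero
    return num / den

def kmeans_representatives(X, scores, k, iters=20, seed=0):
    """Cluster genes (columns) with plain Lloyd's K-means and keep the
    highest-scoring gene of each non-empty cluster (an assumed rule)."""
    rng = np.random.default_rng(seed)
    G = X.T  # one row per gene
    centers = G[rng.choice(len(G), size=k, replace=False)]
    for _ in range(iters):
        dist = ((G[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = G[labels == c].mean(axis=0)
    reps = []
    for c in range(k):
        members = np.flatnonzero(labels == c)
        if members.size:
            reps.append(int(members[scores[members].argmax()]))
    return sorted(reps)

def harmony_search(fitness, n_features, hms=10, hmcr=0.9, par=0.3,
                   iters=200, seed=0):
    """Basic binary harmony search that maximizes fitness(mask).
    This is the textbook scheme, not the paper's modified variant."""
    rng = np.random.default_rng(seed)
    memory = rng.integers(0, 2, size=(hms, n_features))
    fit = np.array([fitness(h) for h in memory], dtype=float)
    for _ in range(iters):
        new = np.empty(n_features, dtype=int)
        for j in range(n_features):
            if rng.random() < hmcr:
                new[j] = memory[rng.integers(hms), j]  # memory consideration
                if rng.random() < par:
                    new[j] = 1 - new[j]  # pitch adjustment as a bit flip
            else:
                new[j] = rng.integers(0, 2)  # random selection
        f = fitness(new)
        worst = fit.argmin()
        if f > fit[worst]:  # replace the worst harmony if the new one is better
            memory[worst], fit[worst] = new, f
    best = fit.argmax()
    return memory[best].copy(), fit[best]
```

In a setting like the one the description outlines, `fitness` would plausibly be the cross-validated accuracy of a classifier trained on the genes selected by the mask; that choice is left to the caller here because the record names neither the classifier nor the objective.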