Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm
This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to c...
Main Authors: | Jin Hee Bae, Minwoo Kim, J.S. Lim, Zong Woo Geem |
---|---|
Format: | Article |
Language: | English |
Published: | MDPI AG, 2021-03-01 |
Series: | Mathematics |
Subjects: | feature selection; colorectal cancer; gene expression; K-means clustering; modified harmony search |
Online Access: | https://www.mdpi.com/2227-7390/9/5/570 |
id |
doaj-6f2bbcbb57b2478c96fdb98e2c99a3a4 |
---|---|
record_format |
Article |
spelling |
doaj-6f2bbcbb57b2478c96fdb98e2c99a3a4; 2021-03-08T00:02:30Z; eng; MDPI AG; Mathematics; ISSN 2227-7390; 2021-03-01; vol. 9, no. 5, art. 570; DOI 10.3390/math9050570; Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm; Jin Hee Bae, Minwoo Kim, J.S. Lim, Zong Woo Geem (all: College of IT Convergence, Gachon University, Seongnam 13120, Korea); This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases.; https://www.mdpi.com/2227-7390/9/5/570; feature selection; colorectal cancer; gene expression; K-means clustering; modified harmony search |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Jin Hee Bae, Minwoo Kim, J.S. Lim, Zong Woo Geem |
author_facet |
Jin Hee Bae, Minwoo Kim, J.S. Lim, Zong Woo Geem |
author_sort |
Jin Hee Bae |
title |
Feature Selection for Colon Cancer Detection Using K-Means Clustering and Modified Harmony Search Algorithm |
title_sort |
feature selection for colon cancer detection using k-means clustering and modified harmony search algorithm |
publisher |
MDPI AG |
series |
Mathematics |
issn |
2227-7390 |
publishDate |
2021-03-01 |
description |
This paper proposes a feature selection method that is effective in distinguishing colorectal cancer patients from normal individuals using K-means clustering and the modified harmony search algorithm. As the genetic cause of colorectal cancer originates from mutations in genes, it is important to classify the presence or absence of colorectal cancer through gene information. The proposed methodology consists of four steps. First, the original data are Z-normalized by data preprocessing. Candidate genes are then selected using the Fisher score. Next, one representative gene is selected from each cluster after candidate genes are clustered using K-means clustering. Finally, feature selection is carried out using the modified harmony search algorithm. The gene combination created by feature selection is then applied to the classification model and verified using 5-fold cross-validation. The proposed model obtained a classification accuracy of up to 94.36%. Furthermore, on comparing the proposed method with other methods, we prove that the proposed method performs well in classifying colorectal cancer. Moreover, we believe that the proposed model can be applied not only to colorectal cancer but also to other gene-related diseases. |
topic |
feature selection; colorectal cancer; gene expression; K-means clustering; modified harmony search |
url |
https://www.mdpi.com/2227-7390/9/5/570 |
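The description above names four steps: Z-normalization, Fisher-score filtering, K-means clustering, and harmony search. The sketch below illustrates three of them on toy data and is not the authors' implementation. It assumes the common between-class over within-class form of the Fisher score, and it uses a basic binary harmony search (memory consideration, pitch adjustment, random selection) rather than the paper's modified variant, whose details are not given in this record. The K-means step is omitted, and `toy_fitness` is a hypothetical stand-in for the cross-validated classification accuracy the paper uses. All function names, parameter values, and data are illustrative assumptions.

```python
import random
from statistics import mean, pstdev, pvariance

def z_normalize(X):
    """Column-wise Z-normalization: (x - mean) / std for each gene."""
    stats = [(mean(c), pstdev(c) or 1.0) for c in zip(*X)]  # 1.0 guards zero variance
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]

def fisher_scores(X, y):
    """Fisher score per gene (assumed form): between-class / within-class scatter."""
    scores = []
    for col in zip(*X):
        mu, between, within = mean(col), 0.0, 0.0
        for label in set(y):
            vals = [v for v, t in zip(col, y) if t == label]
            between += len(vals) * (mean(vals) - mu) ** 2
            within += len(vals) * pvariance(vals)
        if within > 0:
            scores.append(between / within)
        else:  # classes have no spread: perfectly separated or constant gene
            scores.append(float("inf") if between > 0 else 0.0)
    return scores

def harmony_search(fitness, n_genes, hms=10, hmcr=0.9, par=0.3, iters=200, seed=0):
    """Basic binary harmony search over gene masks (not the modified variant)."""
    rng = random.Random(seed)
    memory = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(hms)]
    fits = [fitness(h) for h in memory]
    for _ in range(iters):
        new = []
        for j in range(n_genes):
            if rng.random() < hmcr:
                bit = rng.choice(memory)[j]   # memory consideration
                if rng.random() < par:
                    bit = 1 - bit             # pitch adjustment: flip the bit
            else:
                bit = rng.randint(0, 1)       # random selection
            new.append(bit)
        fit = fitness(new)
        worst = min(range(hms), key=fits.__getitem__)
        if fit > fits[worst]:                 # replace the worst harmony
            memory[worst], fits[worst] = new, fit
    best = max(range(hms), key=fits.__getitem__)
    return memory[best], fits[best]

# Toy data: 4 samples x 3 genes; gene 0 separates the two classes clearly.
X = [[1.0, 5.0, 0.2], [1.2, 4.0, 0.1], [3.0, 5.5, 0.3], [3.1, 4.2, 0.2]]
y = [0, 0, 1, 1]

Z = z_normalize(X)
scores = fisher_scores(Z, y)

def toy_fitness(mask):
    """Hypothetical fitness: summed Fisher scores minus a per-gene penalty."""
    return sum(s for s, bit in zip(scores, mask) if bit) - 0.5 * sum(mask)

best_mask, best_fit = harmony_search(toy_fitness, n_genes=len(scores))
print("Fisher scores:", scores)
print("Selected mask:", best_mask)
```

In a fuller pipeline, the fitness function would train a classifier on the masked genes and return its 5-fold cross-validation accuracy, and the candidate genes would first be reduced to one representative per K-means cluster, as the description outlines.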