Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensionality, small sample size, and high noise level of gene expression data. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM) in t...
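Information gain, the "IG" in IG-SVM, scores a gene by how much knowing its (discretized) expression level reduces the entropy of the class label: IG(Y; G) = H(Y) - H(Y | G). The following is a minimal sketch of that computation, not the authors' implementation. The record does not state how expression values are discretized, so the "high"/"low" levels, the toy labels, and the names `gene_a` and `gene_b` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; G) = H(Y) - H(Y | G) for one already-discretized gene."""
    n = len(labels)
    # Group the class labels by the gene's discrete expression level.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy of each group, weighted by group size.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Hypothetical toy data: four samples, two discretized genes.
labels = ["tumor", "tumor", "normal", "normal"]
gene_a = ["high", "high", "low", "low"]   # separates the classes perfectly
gene_b = ["high", "low", "high", "low"]   # carries no class information

ig_a = information_gain(gene_a, labels)   # 1 bit: the label is fully determined
ig_b = information_gain(gene_b, labels)   # 0 bits: the label is as uncertain as before
```

A filter step would compute this score for every gene and keep only the highest-scoring ones.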
Main Authors: | Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang |
---|---|
Format: | Article |
Language: | English |
Published: | Elsevier, 2017-12-01 |
Series: | Genomics, Proteomics & Bioinformatics |
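Subjects: | Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension |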
Online Access: | http://www.sciencedirect.com/science/article/pii/S1672022917301675 |
id |
doaj-f93d6a9299fe47b4a8f99e6eba13114d |
---|---|
record_format |
Article |
spelling |
doaj-f93d6a9299fe47b4a8f99e6eba13114d; 2020-11-25T00:59:19Z; eng; Elsevier; Genomics, Proteomics & Bioinformatics; ISSN 1672-0229; 2017-12-01; vol. 15, no. 6, pp. 389-395; doi:10.1016/j.gpb.2017.08.002; Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification; Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang; http://www.sciencedirect.com/science/article/pii/S1672022917301675; Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang |
title |
Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification |
publisher |
Elsevier |
series |
Genomics, Proteomics & Bioinformatics |
issn |
1672-0229 |
publishDate |
2017-12-01 |
description |
It remains a great challenge to achieve sufficient cancer classification accuracy with the entire set of genes, due to the high dimensionality, small sample size, and high noise level of gene expression data. We therefore proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM), in this study. IG was first employed to filter out irrelevant and redundant genes. SVM was then used to remove further redundant genes and so eliminate noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM served as the input for the LIBSVM classifier. Evaluated on five cancer gene expression datasets, IG-SVM showed the highest classification accuracy among the related algorithms compared, while using only a few selected genes. As an example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B. |
topic |
Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension |
url |
http://www.sciencedirect.com/science/article/pii/S1672022917301675 |
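The description above outlines three steps: an IG filter, a further SVM-based reduction, and a final classifier trained on the surviving genes. The sketch below illustrates that flow on toy data under several loudly stated assumptions. The abstract does not say how the SVM stage decides which genes to drop, so ranking genes by the magnitude of a linear SVM's weights is a guess. Binarizing expression at the median before computing IG is also an assumption. A small pure-Python linear SVM trained by subgradient descent stands in for LIBSVM. The function names, the parameters `k_filter` and `k_final`, and the data are all hypothetical.

```python
import math
import statistics
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column, labels):
    """IG of one gene after binarizing its expression at the median (assumption)."""
    cut = statistics.median(column)
    groups = {}
    for x, y in zip(column, labels):
        groups.setdefault(x > cut, []).append(y)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def train_linear_svm(X, y, lam=0.01, eta=0.1, epochs=100):
    """Toy stand-in for LIBSVM: subgradient descent on the regularized hinge loss.

    X is a list of rows, y holds labels in {-1, +1}. Returns (weights, bias).
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            w = [wj * (1.0 - eta * lam) for wj in w]  # L2 shrinkage
            if margin < 1:                            # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
                b += eta * yi
    return w, b


def predict(model, row):
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, row)) + b >= 0 else -1


def ig_svm_select(X, y, k_filter, k_final):
    """Two-stage selection: IG filter, then SVM weight ranking (assumed criterion)."""
    n_genes = len(X[0])
    columns = [[row[j] for row in X] for j in range(n_genes)]
    # Stage 1: keep the k_filter genes with the highest information gain.
    by_ig = sorted(range(n_genes), key=lambda j: -information_gain(columns[j], y))
    kept = by_ig[:k_filter]
    # Stage 2: train a linear SVM on the kept genes and rank them by |weight|.
    w, _ = train_linear_svm([[row[j] for j in kept] for row in X], y)
    by_weight = sorted(range(len(kept)), key=lambda i: -abs(w[i]))
    return [kept[i] for i in by_weight[:k_final]]


# Hypothetical expression matrix: 6 samples x 4 genes.
X = [
    [2.0, 0.5, 1.0, 0.0],
    [2.0, 0.5, -1.0, 0.0],
    [2.0, 0.5, 1.0, 0.0],
    [-2.0, -0.5, -1.0, 0.0],
    [-2.0, -0.5, 1.0, 0.0],
    [-2.0, -0.5, -1.0, 0.0],
]
y = [1, 1, 1, -1, -1, -1]

selected = ig_svm_select(X, y, k_filter=2, k_final=1)

# Step 3: train the final classifier on the selected genes only.
X_sel = [[row[j] for j in selected] for row in X]
model = train_linear_svm(X_sel, y)
predictions = [predict(model, row) for row in X_sel]
```

The predictions here are made on the training samples purely to show the data flow; they say nothing about the evaluation protocol behind the accuracies reported in the record. Weight-magnitude ranking is also sensitive to feature scale, so real expression data would normally be standardized first.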