Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification

Achieving sufficient cancer classification accuracy with the entire set of genes remains a great challenge because gene expression data are high-dimensional and noisy and come from small samples. We thus proposed a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM) in t...


Bibliographic Details
Main Authors: Lingyun Gao, Mingquan Ye, Xiaojie Lu, Daobin Huang
Format: Article
Language: English
Published: Elsevier 2017-12-01
Series: Genomics, Proteomics & Bioinformatics
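Subjects: Gene selection; Cancer classification; Information gain; Support vector machine; Small sample size with high dimension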
Online Access: http://www.sciencedirect.com/science/article/pii/S1672022917301675
id doaj-f93d6a9299fe47b4a8f99e6eba13114d
record_format Article
spelling doaj-f93d6a9299fe47b4a8f99e6eba13114d; 2020-11-25T00:59:19Z; eng; Elsevier; Genomics, Proteomics & Bioinformatics; ISSN 1672-0229; 2017-12-01; vol. 15, no. 6, pp. 389-395; doi:10.1016/j.gpb.2017.08.002
collection DOAJ
language English
format Article
sources DOAJ
author Lingyun Gao
Mingquan Ye
Xiaojie Lu
Daobin Huang
title Hybrid Method Based on Information Gain and Support Vector Machine for Gene Selection in Cancer Classification
publisher Elsevier
series Genomics, Proteomics & Bioinformatics
issn 1672-0229
publishDate 2017-12-01
description Achieving sufficient cancer classification accuracy with the entire set of genes remains a great challenge because gene expression data are high-dimensional and noisy and come from small samples. In this study, we therefore propose a hybrid gene selection method, Information Gain-Support Vector Machine (IG-SVM). IG is first used to filter out irrelevant and redundant genes. SVM is then used to remove further redundant genes, eliminating noise in the datasets more effectively. Finally, the informative genes selected by IG-SVM serve as input to the LIBSVM classifier. Evaluated on five cancer gene expression datasets, IG-SVM achieved the highest classification accuracy among the related algorithms compared, using only a few selected genes. For example, IG-SVM achieved a classification accuracy of 90.32% for colon cancer, which is difficult to classify accurately, based on only three genes: CSRP1, MYL9, and GUCA2B.
topic Gene selection
Cancer classification
Information gain
Support vector machine
Small sample size with high dimension
url http://www.sciencedirect.com/science/article/pii/S1672022917301675
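Illustrative sketch (not part of the catalog record, and not the authors' implementation): the description above names information gain (IG) as the first filtering stage of IG-SVM. The Python below shows one common way to score genes by IG against class labels. It covers only that first stage; the SVM-based removal of redundant genes and the final LIBSVM classification are not shown. Binarizing each gene at its median is an assumption of this sketch, and the function names (`entropy`, `information_gain`, `rank_genes`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """IG = H(class) - H(class | gene).

    Assumption of this sketch: each gene is binarized at its median
    expression value before the conditional entropy is computed.
    """
    threshold = sorted(values)[len(values) // 2]
    high = [y for v, y in zip(values, labels) if v >= threshold]
    low = [y for v, y in zip(values, labels) if v < threshold]
    n = len(labels)
    conditional = sum(
        len(part) / n * entropy(part) for part in (high, low) if part
    )
    return entropy(labels) - conditional


def rank_genes(expression, labels, top_k):
    """Return the names of the top_k genes with the highest information gain."""
    scores = {
        gene: information_gain(vals, labels) for gene, vals in expression.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy expression values, invented for illustration only (not data from the paper).
toy_expression = {
    "gene_a": [0.1, 0.2, 0.3, 0.9, 1.0, 1.1],  # separates the two classes cleanly
    "gene_b": [0.5, 0.9, 0.4, 0.6, 0.8, 0.5],  # weakly informative
    "gene_c": [1.0, 0.2, 0.9, 0.1, 1.1, 0.3],  # weakly informative
}
toy_labels = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]

selected = rank_genes(toy_expression, toy_labels, top_k=1)
```

In a two-stage scheme such as the one described, the genes kept by a filter like this would then be passed to the SVM-based stage, so the filter only needs to be cheap and to discard clearly uninformative genes.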