A novel gene selection algorithm for cancer classification using microarray datasets

Bibliographic Details
Main Authors: Russul Alanni, Jingyu Hou, Hasseeb Azzawi, Yong Xiang
Format: Article
Language: English
Published: BMC 2019-01-01
Series: BMC Medical Genomics
Online Access: http://link.springer.com/article/10.1186/s12920-018-0447-6
Affiliation: School of Information Technology, Deakin University (all authors)
ISSN: 1755-8794
DOI: 10.1186/s12920-018-0447-6

Abstract

Background: Microarray datasets are an important medical diagnostic tool because they represent the state of a cell at the molecular level. Available microarray datasets for classifying cancer types generally have a small number of samples compared with the large number of genes involved. This is known as the curse of dimensionality, and it is a challenging problem. Gene selection is a promising approach to this problem and plays an important role in developing efficient cancer classifiers, because only a small number of genes are relevant to the classification problem. Gene selection addresses several problems in microarray datasets: it removes irrelevant and noisy genes and selects the most relevant genes to improve classification results.

Methods: An innovative Gene Selection Programming (GSP) method is proposed to select relevant genes for effective and efficient cancer classification. GSP is based on the Gene Expression Programming (GEP) method, with a newly defined population initialization algorithm, a new fitness function, and improved mutation and recombination operators. A Support Vector Machine (SVM) with a linear kernel serves as the classifier within GSP.

Results: Experimental results on ten microarray cancer datasets demonstrate that GSP is effective and efficient in eliminating irrelevant and redundant genes/features from microarray datasets. Comprehensive evaluations and comparisons with other methods show that GSP gives a better compromise across all three evaluation criteria: classification accuracy, number of selected genes, and computational cost. The gene sets selected by GSP showed superior cancer classification performance compared with those selected by up-to-date representative gene selection methods.

Conclusion: The gene subset selected by GSP can achieve higher classification accuracy with less processing time.

Keywords: Gene selection; Gene expression programming; Support vector machine; Microarray cancer dataset
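
The Methods summary names the ingredients of GSP (population initialization, a fitness function, mutation and recombination operators, and a linear-kernel SVM as the classifier) without defining them. The following is a hypothetical sketch of how such ingredients can fit together in a wrapper-style gene selector; it is not the authors' GSP. It replaces GEP's expression-tree chromosomes with plain binary gene masks (a simple genetic algorithm), uses a minimal Pegasos-style linear SVM written in pure Python instead of a library SVM, and assumes a fitness that weights hold-out accuracy at 0.9 against subset size at 0.1. All function names, parameter values, and the synthetic data are illustrative assumptions.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.1, epochs=30, seed=0):
    """Linear SVM via Pegasos-style stochastic subgradient descent on the
    L2-regularized hinge loss. Labels must be +1/-1. A constant 1.0 feature
    is appended so the (regularized) last weight acts as the bias."""
    rng = random.Random(seed)
    Xb = [list(row) + [1.0] for row in X]
    w = [0.0] * len(Xb[0])
    order = list(range(len(Xb)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * dot(w, Xb[i]) < 1.0  # margin at the current w
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularizer shrink step
            if violated:
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xb[i])]
    return w


def predict(w, x):
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


def fitness(mask, X_tr, y_tr, X_va, y_va, w_acc=0.9):
    """Hypothetical fitness: weighted sum of hold-out accuracy and the
    fraction of genes left out, so small accurate subsets score highest."""
    genes = [j for j, keep in enumerate(mask) if keep]
    if not genes:
        return 0.0

    def subset(X):
        return [[row[j] for j in genes] for row in X]

    w = train_linear_svm(subset(X_tr), y_tr)
    correct = sum(predict(w, x) == t for x, t in zip(subset(X_va), y_va))
    acc = correct / len(y_va)
    return w_acc * acc + (1.0 - w_acc) * (1.0 - len(genes) / len(mask))


def evolve(X_tr, y_tr, X_va, y_va, pop_size=20, generations=25, p_mut=0.1, seed=0):
    """Binary-mask genetic algorithm (not GEP): random initialization,
    tournament selection, one-point recombination, bit-flip mutation, elitism."""
    rng = random.Random(seed)
    d = len(X_tr[0])
    cache = {}

    def fit(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = fitness(mask, X_tr, y_tr, X_va, y_va)
        return cache[key]

    pop = [[rng.random() < 0.5 for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fit, reverse=True)
        nxt = [pop[0][:]]  # elitism: the current best mask survives unchanged
        while len(nxt) < pop_size:
            a = max(rng.sample(pop, 2), key=fit)  # binary tournament
            b = max(rng.sample(pop, 2), key=fit)
            cut = rng.randrange(1, d)
            child = a[:cut] + b[cut:]  # one-point recombination
            child = [(not g) if rng.random() < p_mut else g for g in child]
            nxt.append(child)
        pop = nxt
    best = max(pop, key=fit)
    return best, fit(best)


def make_toy_data(n, d, seed):
    """Synthetic stand-in for a microarray matrix: only gene 0 carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        X.append([2.0 * label + rng.gauss(0, 0.5)] + [rng.gauss(0, 1) for _ in range(d - 1)])
        y.append(label)
    return X, y


if __name__ == "__main__":
    X_tr, y_tr = make_toy_data(30, 8, seed=1)
    X_va, y_va = make_toy_data(30, 8, seed=2)
    best, score = evolve(X_tr, y_tr, X_va, y_va)
    print("selected genes:", [j for j, keep in enumerate(best) if keep], "fitness:", round(score, 3))
```

With a real expression matrix one would more likely call a library SVM, for example scikit-learn's `sklearn.svm.LinearSVC`, and score candidate subsets with cross-validation rather than a single hold-out split; the pure-Python solver here only keeps the example dependency-free.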