Filtered selection coupled with support vector machines generate a functionally relevant prediction model for colorectal cancer

Musa Nur Gabere,1 Mohamed Aly Hussein,1 Mohammad Azhar Aziz2 1Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; 2Colorectal Cancer Research Program, Department of Medical Genomics, King Abd...

Bibliographic Details
Main Authors: Gabere MN, Hussein MA, Aziz MA
Format: Article
Language: English
Published: Dove Medical Press 2016-06-01
Series: OncoTargets and Therapy
Online Access: https://www.dovepress.com/filtered-selection-coupled-with-support-vector-machines-generate-a-fun-peer-reviewed-article-OTT
id doaj-1765cb538a654e7fa4820dc40655d143
issn 1178-6930
description Musa Nur Gabere,1 Mohamed Aly Hussein,1 Mohammad Azhar Aziz2
1Department of Bioinformatics, King Abdullah International Medical Research Center/King Saud bin Abdulaziz University for Health Sciences, Riyadh, Saudi Arabia; 2Colorectal Cancer Research Program, Department of Medical Genomics, King Abdullah International Medical Research Center, Riyadh, Saudi Arabia
Purpose: There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier.
Methods: In this study, we built a model that uses a support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid).
Results: The best model, which used 30 genes and the RBF kernel, outperformed the other combinations; it had an accuracy of 84% for both tenfold and leave-one-out cross-validation in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved prediction accuracies of 95.27% and 91.99%, respectively, compared with the other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity Pathway Analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers.
Conclusion: This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples.
Keywords: colorectal cancer, support vector machines, exon microarray, minimum redundancy maximum relevance, predictive model, pathway analysis, biomarkers
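
The mRMR filtering step named in the abstract can be illustrated with a minimal Python sketch. The record does not state which mRMR variant, discretization, or software the authors used, so everything here is an assumption for illustration: the greedy "difference" form (relevance minus mean redundancy), mutual information estimated from counts on already-discretized values, and the hypothetical names `mutual_information`, `mrmr_select`, and the toy genes. The subsequent SVM training and cross-validation are not shown.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping a feature name to its discretized value per sample.
    labels:   class label per sample (e.g. 0 = normal, 1 = cancer).
    Assumption: the 'difference' form of mRMR; other forms exist.
    """
    relevance = {
        name: mutual_information(values, labels)
        for name, values in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining and len(selected) < k:
        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: 8 samples, expression already binarized (low = 0, high = 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_genes = {
    "gene_a": [0, 0, 0, 1, 1, 1, 1, 1],       # informative
    "gene_a_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # informative but fully redundant with gene_a
    "gene_b": [0, 1, 0, 0, 1, 1, 0, 1],       # weaker, but adds new information
    "gene_noise": [0, 1, 0, 1, 0, 1, 0, 1],   # unrelated to the labels
}

selected = mrmr_select(toy_genes, labels, 2)
print(selected)
```

In this toy case the redundant copy is skipped in favor of the weaker but complementary gene, which is the behavior that distinguishes mRMR from ranking genes by relevance alone. In a pipeline like the one described, the selected gene subset would then be passed to the classifier.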