Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection

Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to the activation of oncogenes or the inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for predictio...

Bibliographic Details
Main Authors: Xiuquan Du, Jiaxing Cheng
Format: Article
Language: English
Published: Hindawi Limited 2014-01-01
Series: BioMed Research International
Online Access: http://dx.doi.org/10.1155/2014/905951
id doaj-caf831b3d14d4d288c447d23c876cb40
record_format Article
spelling doaj-caf831b3d14d4d288c447d23c876cb40; 2020-11-25T00:19:48Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2014-01-01; 2014; 10.1155/2014/905951; 905951; Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection; Xiuquan Du [0]; Jiaxing Cheng [1]; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to the activation of oncogenes or the inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unknown which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. To obtain the best performance, the rotation forest algorithm was adopted for the experiments. On the training dataset, collected from the COSMIC and Swiss-Prot databases, we obtain high prediction performance, with 88.03% accuracy, 93.9% precision, and 81.35% recall, when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.; http://dx.doi.org/10.1155/2014/905951
collection DOAJ
language English
format Article
sources DOAJ
author Xiuquan Du
Jiaxing Cheng
title Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection
publisher Hindawi Limited
series BioMed Research International
issn 2314-6133
2314-6141
publishDate 2014-01-01
description Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to the activation of oncogenes or the inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unknown which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. To obtain the best performance, the rotation forest algorithm was adopted for the experiments. On the training dataset, collected from the COSMIC and Swiss-Prot databases, we obtain high prediction performance, with 88.03% accuracy, 93.9% precision, and 81.35% recall, when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.
url http://dx.doi.org/10.1155/2014/905951