Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection

Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular functions of the cancer genome that lead to activation of oncogenes or inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for predictio...

Bibliographic Details
Main Authors:	Xiuquan Du, Jiaxing Cheng
Format:	Article
Language:	English
Published:	Hindawi Limited 2014-01-01
Series:	BioMed Research International
Online Access:	http://dx.doi.org/10.1155/2014/905951

id	doaj-caf831b3d14d4d288c447d23c876cb40
record_format	Article
spelling	doaj-caf831b3d14d4d288c447d23c876cb40; 2020-11-25T00:19:48Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2014-01-01; 2014; 10.1155/2014/905951; 905951; Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection; Xiuquan Du 0; Jiaxing Cheng 1; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; http://dx.doi.org/10.1155/2014/905951
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Xiuquan Du Jiaxing Cheng
author_facet	Xiuquan Du Jiaxing Cheng
author_sort	Xiuquan Du
title	Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection
publisher	Hindawi Limited
series	BioMed Research International
issn	2314-6133 2314-6141
publishDate	2014-01-01
description	Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular functions of the cancer genome that lead to activation of oncogenes or inactivation of tumor suppressor genes. Many proposed approaches use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unclear which features are important for predicting driver mutations. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. To obtain the best performance, the rotation forest algorithm was adopted for the experiments. On the training dataset, which was collected from the COSMIC and Swiss-Prot databases, we obtain high prediction performance, with 88.03% accuracy, 93.9% precision, and 81.35% recall, when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.
url	http://dx.doi.org/10.1155/2014/905951

Similar Items