Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection
Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches have been proposed that use supervised machine learning techniques for predictio...
Main Authors: | Xiuquan Du, Jiaxing Cheng |
---|---|
Format: | Article |
Language: | English |
Published: | Hindawi Limited, 2014-01-01 |
Series: | BioMed Research International |
Online Access: | http://dx.doi.org/10.1155/2014/905951 |
id |
doaj-caf831b3d14d4d288c447d23c876cb40 |
---|---|
record_format |
Article |
spelling |
doaj-caf831b3d14d4d288c447d23c876cb40; 2020-11-25T00:19:48Z; eng; Hindawi Limited; BioMed Research International; 2314-6133; 2314-6141; 2014-01-01; 2014; 10.1155/2014/905951; 905951; Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection; Xiuquan Du [0]; Jiaxing Cheng [1]; [0] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; [1] Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education, Anhui University, Hefei, Anhui 230601, China; Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches have been proposed that use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unclear which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. In order to obtain the best performance, the rotation forest algorithm was adopted to perform the experiment. On the training dataset, which was collected from the COSMIC and Swiss-Prot databases, we obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method.; http://dx.doi.org/10.1155/2014/905951 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Xiuquan Du Jiaxing Cheng |
title |
Identification and Analysis of Driver Missense Mutations Using Rotation Forest with Feature Selection |
publisher |
Hindawi Limited |
series |
BioMed Research International |
issn |
2314-6133 2314-6141 |
publishDate |
2014-01-01 |
description |
Identifying cancer-associated mutations (driver mutations) is critical for understanding the cellular function of the cancer genome that leads to activation of oncogenes or inactivation of tumor suppressor genes. Many approaches have been proposed that use supervised machine learning techniques for prediction, with features obtained from various databases. However, it is often unclear which features are important for driver mutation prediction. In this study, we propose a novel feature selection method (called DX) applied to a set of 126 candidate features. In order to obtain the best performance, the rotation forest algorithm was adopted to perform the experiment. On the training dataset, which was collected from the COSMIC and Swiss-Prot databases, we obtain high prediction performance with 88.03% accuracy, 93.9% precision, and 81.35% recall when the 11 top-ranked features are used. Comparison with various other techniques on the TP53, EGFR, and Cosmic2plus datasets shows the generality of our method. |
url |
http://dx.doi.org/10.1155/2014/905951 |
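
The description in this record reports three figures for the classifier: accuracy, precision, and recall. As a reading aid, the following minimal sketch shows how these three metrics are usually derived from the counts of a binary confusion matrix (driver vs. non-driver). It is not the authors' evaluation code; the class name `ConfusionCounts` and the counts used are hypothetical values chosen only for illustration and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # drivers correctly predicted as drivers
    fp: int  # non-drivers wrongly predicted as drivers
    tn: int  # non-drivers correctly predicted as non-drivers
    fn: int  # drivers wrongly predicted as non-drivers


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all predictions that are correct."""
    total = c.tp + c.fp + c.tn + c.fn
    return (c.tp + c.tn) / total if total else 0.0


def precision(c: ConfusionCounts) -> float:
    """Fraction of predicted drivers that are true drivers."""
    predicted_positive = c.tp + c.fp
    return c.tp / predicted_positive if predicted_positive else 0.0


def recall(c: ConfusionCounts) -> float:
    """Fraction of true drivers that are predicted as drivers."""
    actual_positive = c.tp + c.fn
    return c.tp / actual_positive if actual_positive else 0.0


# Illustrative counts only (an assumption, not data from the article).
example = ConfusionCounts(tp=30, fp=10, tn=50, fn=10)

if __name__ == "__main__":
    print(f"accuracy:  {accuracy(example):.2%}")
    print(f"precision: {precision(example):.2%}")
    print(f"recall:    {recall(example):.2%}")
```

A precision noticeably higher than recall, as in the figures reported in the description, indicates that predicted drivers are rarely false positives while some true drivers are missed.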