Feature selection for splice site prediction: A new method using EDA-based feature ranking
Abstract. Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the clas...
Main Authors: | Rouzé Pierre, Aeyels Dirk, Degroeve Sven, Saeys Yvan, Van de Peer Yves |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2004-05-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/5/64 |
id |
doaj-403a53fcb00345e787dcb3372347f17c |
---|---|
record_format |
Article |
spelling |
doaj-403a53fcb00345e787dcb3372347f17c; 2020-11-25T00:25:33Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2004-05-01; 5; 1; 64; 10.1186/1471-2105-5-64; http://www.biomedcentral.com/1471-2105/5/64 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Rouzé Pierre Aeyels Dirk Degroeve Sven Saeys Yvan Van de Peer Yves |
title |
Feature selection for splice site prediction: A new method using EDA-based feature ranking |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2004-05-01 |
description |
Abstract. Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain equally good or even better solutions using a restricted subset of features, and faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results: In this paper we present a novel method for feature subset selection, based on estimation of distribution algorithms, a generalization of genetic algorithms. From the distribution estimated by the algorithm, a feature ranking is derived. This ranking is then used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion: We show that this technique is more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do), this method provides a dynamic view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features. |
url |
http://www.biomedcentral.com/1471-2105/5/64 |
_version_ |
1725348304046260224 |
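
The ranking idea summarized in the abstract (estimate a distribution over feature subsets, rank features from it, then discard features one at a time) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a univariate marginal model (UMDA-style), truncation selection, and a toy fitness function standing in for classifier performance. The names `umda_feature_ranking`, `nested_subsets`, `toy_fitness`, and `RELEVANT` are hypothetical and chosen only for this example.

```python
import random


def umda_feature_ranking(n_features, fitness, pop_size=200, generations=30, seed=0):
    """Rank features by their marginal probability under a UMDA-style EDA."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # start with no preference for any feature
    for _ in range(generations):
        # Sample a population of feature masks from the current distribution.
        population = [[rng.random() < p for p in probs] for _ in range(pop_size)]
        # Truncation selection: keep the better-scoring half.
        population.sort(key=fitness, reverse=True)
        selected = population[: pop_size // 2]
        # Re-estimate each marginal, clamped so no feature is fixed too early.
        probs = [
            min(0.95, max(0.05, sum(ind[i] for ind in selected) / len(selected)))
            for i in range(n_features)
        ]
    # A higher marginal probability means a higher rank.
    ranking = sorted(range(n_features), key=lambda i: probs[i], reverse=True)
    return ranking, probs


def nested_subsets(ranking):
    """Discard the lowest-ranked feature one at a time, largest subset first."""
    return [ranking[:k] for k in range(len(ranking), 0, -1)]


# Toy stand-in for classifier performance (an assumption for illustration):
# features 0, 3 and 5 are informative, any other selected feature is penalized.
RELEVANT = {0, 3, 5}


def toy_fitness(mask):
    return sum(1.0 if i in RELEVANT else -0.2 for i, on in enumerate(mask) if on)


ranking, probs = umda_feature_ranking(10, toy_fitness)
subsets = nested_subsets(ranking)
```

In a real wrapper setting, `toy_fitness` would be replaced by the cross-validated performance of a classifier trained on the selected features, and each subset in `subsets` would be evaluated to trace how performance changes as features are discarded.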