Feature selection for splice site prediction: A new method using EDA-based feature ranking

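Abstract

Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data.

Results: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing.

Conclusion: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do) this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.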
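The EDA-based feature ranking named in the title can be illustrated with a small sketch. This record does not specify the EDA variant, the classifier, or any parameter values used by the authors, so everything below is an assumption for illustration: the sketch uses a univariate EDA (UMDA), the names `umda_feature_ranking`, `discard_by_ranking`, and `toy_fitness` are hypothetical, and `toy_fitness` is a stand-in for a real classifier score. Features are ranked by their estimated inclusion probabilities, and nested subsets are then formed by dropping the lowest-ranked feature one step at a time.

```python
import random


def umda_feature_ranking(fitness, n_features, pop_size=60, n_select=20,
                         generations=15, seed=0):
    """Rank features by the marginals of a univariate EDA (UMDA)."""
    rng = random.Random(seed)
    probs = [0.5] * n_features  # P(feature i is included), initially uninformed
    for _ in range(generations):
        # Sample candidate feature subsets (bit masks) from the current distribution.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]
        # Keep the best-scoring subsets.
        selected = sorted(population, key=fitness, reverse=True)[:n_select]
        # Re-estimate each marginal; Laplace smoothing keeps it away from 0 and 1.
        probs = [(sum(ind[i] for ind in selected) + 1) / (n_select + 2)
                 for i in range(n_features)]
    # Features with a higher inclusion probability rank first.
    ranking = sorted(range(n_features), key=lambda i: probs[i], reverse=True)
    return ranking, probs


def discard_by_ranking(ranking):
    """Return nested subsets, dropping the lowest-ranked feature at each step."""
    return [ranking[:k] for k in range(len(ranking), 0, -1)]


def toy_fitness(mask):
    # Hypothetical stand-in for classifier performance: features 0-2 help,
    # every other selected feature costs a little.
    return sum(2.0 if i < 3 else -0.5 for i, bit in enumerate(mask) if bit)


ranking, probs = umda_feature_ranking(toy_fitness, n_features=8)
for subset in discard_by_ranking(ranking):
    print(len(subset), sorted(subset))
```

In practice the fitness function would train and evaluate a classifier on the masked feature set. Scoring each nested subset along the way is what gives the step-by-step view of the selection process, as opposed to a single best subset.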


Bibliographic Details
Main Authors: Rouzé Pierre, Aeyels Dirk, Degroeve Sven, Saeys Yvan, Van de Peer Yves
Format: Article
Language: English
Published: BMC 2004-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/64
id doaj-403a53fcb00345e787dcb3372347f17c
doi 10.1186/1471-2105-5-64
collection DOAJ
issn 1471-2105
description Abstract. Background: The identification of relevant biological features in large and complex datasets is an important step towards gaining insight into the processes underlying the data. Other advantages of feature selection include the ability of the classification system to attain good or even better solutions using a restricted subset of features, and a faster classification. Thus, robust methods for fast feature selection are of key importance in extracting knowledge from complex biological data. Results: In this paper we present a novel method for feature subset selection applied to splice site prediction, based on estimation of distribution algorithms, a more general framework of genetic algorithms. From the estimated distribution of the algorithm, a feature ranking is derived. Afterwards this ranking is used to iteratively discard features. We apply this technique to the problem of splice site prediction, and show how it can be used to gain insight into the underlying biological process of splicing. Conclusion: We show that this technique proves to be more robust than the traditional use of estimation of distribution algorithms for feature selection: instead of returning a single best subset of features (as they normally do) this method provides a dynamical view of the feature selection process, like the traditional sequential wrapper methods. However, the method is faster than the traditional techniques, and scales better to datasets described by a large number of features.