Identification of consensus RNA secondary structures using suffix arrays

Abstract. Background: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior stru...


Bibliographic Details
Main Authors:	Nguyen Truong, Anwar Mohammad, Turcotte Marcel
Format:	Article
Language:	English
Published:	BMC 2006-05-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/7/244

id	doaj-8ed43c42bd6243a88f997df6219bac9a
record_format	Article
spelling	doaj-8ed43c42bd6243a88f997df6219bac9a 2020-11-24T21:47:47Z eng BMC BMC Bioinformatics 1471-2105 2006-05-01 7 1 244 10.1186/1471-2105-7-244 Identification of consensus RNA secondary structures using suffix arrays Nguyen Truong Anwar Mohammad Turcotte Marcel http://www.biomedcentral.com/1471-2105/7/244
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Nguyen Truong Anwar Mohammad Turcotte Marcel
spellingShingle	Nguyen Truong Anwar Mohammad Turcotte Marcel Identification of consensus RNA secondary structures using suffix arrays BMC Bioinformatics
author_facet	Nguyen Truong Anwar Mohammad Turcotte Marcel
author_sort	Nguyen Truong
title	Identification of consensus RNA secondary structures using suffix arrays
title_sort	identification of consensus rna secondary structures using suffix arrays
publisher	BMC
series	BMC Bioinformatics
issn	1471-2105
publishDate	2006-05-01
description	Abstract. Background: The identification of a consensus RNA motif often consists in finding a conserved secondary structure with minimum free energy in an ensemble of aligned sequences. However, an alignment is often difficult to obtain without prior structural information, hence the need for tools to automate this process. Results: We present an algorithm called Seed to identify all the conserved RNA secondary structure motifs in a set of unaligned sequences. The search space is defined as the set of all the secondary structure motifs inducible from a seed sequence. A general-to-specific search finds all the motifs that are conserved. Suffix arrays are used to efficiently enumerate all the biological palindromes and to match RNA secondary structure expressions. We assessed the ability of this approach to uncover known structures using four datasets. The enumeration of the motifs relies only on the secondary structure definition and conservation, which allows scoring schemes to be evaluated independently. Twelve simple objective functions based on free energy were evaluated for their potential to discriminate native folds from the rest. Conclusion: Our evaluation shows that 1) support and exclusion constraints are sufficient to make an exhaustive search of the secondary structure space feasible; 2) the search space induced from a seed sequence contains known motifs; and 3) simple objective functions, consisting of a combination of the free energy of matching sequences, can generally identify motifs with high positive predictive value and sensitivity to known motifs.
url	http://www.biomedcentral.com/1471-2105/7/244
work_keys_str_mv	AT nguyentruong identificationofconsensusrnasecondarystructuresusingsuffixarrays AT anwarmohammad identificationofconsensusrnasecondarystructuresusingsuffixarrays AT turcottemarcel identificationofconsensusrnasecondarystructuresusingsuffixarrays
_version_	1725895560982953984


Similar Items