uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts

Abstract. Background: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of...


Bibliographic Details
Main Authors: Gillespie Joel, Anderson James, Jiang Minghui, Mayne Martin
Format: Article
Language: English
Published: BMC 2008-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/192
id doaj-96704ac442bc4c23a491560888ea23f0
record_format Article
spelling doaj-96704ac442bc4c23a491560888ea23f0 2020-11-24T22:10:08Z eng BMC BMC Bioinformatics 1471-2105 2008-04-01 9 1 192 10.1186/1471-2105-9-192 uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts Gillespie Joel; Anderson James; Jiang Minghui; Mayne Martin http://www.biomedcentral.com/1471-2105/9/192
collection DOAJ
language English
format Article
sources DOAJ
author Gillespie Joel
Anderson James
Jiang Minghui
Mayne Martin
author_sort Gillespie Joel
title uShuffle: A useful tool for shuffling biological sequences while preserving the k-let counts
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-04-01
description Abstract. Background: Randomly shuffled sequences are routinely used in sequence analysis to evaluate the statistical significance of a biological sequence. In many cases, biologists need sophisticated shuffling tools that preserve not only the counts of distinct letters but also higher-order statistics such as doublet counts, triplet counts, and, in general, k-let counts. Results: We present a sequence analysis tool (named uShuffle) for generating uniform random permutations of biological sequences (such as DNA, RNA, and protein sequences) that preserve the exact k-let counts. The uShuffle tool implements the latest variant of the Euler algorithm and uses Wilson's algorithm in the crucial step of arborescence generation. It is carefully engineered and extremely efficient. The uShuffle tool achieves maximum flexibility by allowing arbitrary alphabet size and let size. It can be used as a command-line program, a web application, or a utility library. Source code in C, Java, and C#, and integration instructions for Perl and Python are provided. Conclusion: The uShuffle tool surpasses existing implementations of the Euler algorithm in both performance and flexibility. It is a useful tool for the bioinformatics community.
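To make the term "k-let counts" in the abstract concrete, the following is a minimal Python sketch of how one might count overlapping k-lets and check whether a candidate permutation preserves them. The function names `klet_counts` and `preserves_klets` and the example sequences are assumptions made for illustration; they are not part of uShuffle, and the sketch does not implement the Euler algorithm or Wilson's algorithm.

```python
from collections import Counter


def klet_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k (each k-let) in seq."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def preserves_klets(original: str, shuffled: str, k: int) -> bool:
    """Return True if both sequences have identical k-let counts."""
    return klet_counts(original, k) == klet_counts(shuffled, k)


if __name__ == "__main__":
    original = "ACATAG"      # illustrative sequence, not taken from the article
    doublet_safe = "ATACAG"  # same doublets (AC, CA, AT, TA, AG), different order
    letters_only = "AAACTG"  # same letters, but different doublets

    print(preserves_klets(original, doublet_safe, 2))  # True
    print(preserves_klets(original, letters_only, 1))  # True
    print(preserves_klets(original, letters_only, 2))  # False
```

The last two lines illustrate the distinction the abstract draws: a plain permutation keeps single-letter counts but in general changes doublet counts, which is why a dedicated k-let-preserving shuffler is needed.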
url http://www.biomedcentral.com/1471-2105/9/192