Statistical Algorithms for Long DNA Sequences: Oligonucleotide Distributions and Homogeneity Maps

Bibliographic Details
Main Authors: P. Katsaloulis, T. Theoharis, A. Provata
Format: Article
Language: English
Published: Hindawi Limited 2005-01-01
Series: Scientific Programming
Online Access:http://dx.doi.org/10.1155/2005/807304
Description
Summary: The statistical properties of oligonucleotide appearances within long DNA sequences often reveal useful characteristics of the corresponding DNA regions. Two algorithms for the statistical analysis of oligonucleotide appearances within long DNA sequences in genome banks are presented. The first algorithm determines statistical indices for oligonucleotides of arbitrary length within DNA sequences of arbitrary length. The critical exponent μ of the distance distribution between consecutive occurrences of the same oligonucleotide is calculated, and its value is shown to characterize the functionality of the oligonucleotide. The second algorithm searches for regions of variable homogeneity, based on the density of oligonucleotides. The two algorithms have been applied to representative eukaryotes (the animal Mus musculus and the plant Arabidopsis thaliana), yielding interesting results that are confirmed by biological observations. All programs are open source and publicly available on our web site.
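The two analyses described in the summary can be illustrated with a small sketch. This is not the authors' published code: it assumes that the tail of the distance distribution follows a power law P(d) ∝ d^(-μ) and estimates μ by an ordinary least-squares fit on the log-log histogram, and it stands in for the homogeneity search with a simple sliding-window density plus a z-score threshold. All function names (occurrence_distances, estimate_mu, window_density, inhomogeneous_windows) and parameters (d_min, window, step, z) are hypothetical.

```python
import math
import statistics
from collections import Counter


def occurrence_distances(seq, oligo):
    """Distances between start positions of consecutive occurrences of oligo.

    Overlapping occurrences are counted (e.g. "AA" occurs at 0, 1, 2 in "AAAA").
    """
    positions = []
    pos = seq.find(oligo)
    while pos != -1:
        positions.append(pos)
        pos = seq.find(oligo, pos + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def estimate_mu(distances, d_min=1):
    """Estimate mu, ASSUMING P(d) ~ d**(-mu) for d >= d_min.

    Fits a straight line to log P(d) versus log d by ordinary least squares
    over the empirical distance histogram; mu is minus the fitted slope.
    """
    counts = Counter(d for d in distances if d >= d_min)
    if len(counts) < 2:
        raise ValueError("need at least two distinct distances to fit a slope")
    total = sum(counts.values())
    pts = [(math.log(d), math.log(c / total)) for d, c in counts.items()]
    mean_x = statistics.mean(x for x, _ in pts)
    mean_y = statistics.mean(y for _, y in pts)
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx


def window_density(seq, oligo, window, step):
    """Occurrences of oligo per base in sliding windows (hypothetical density measure)."""
    k = len(oligo)
    densities = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        hits = sum(1 for i in range(window - k + 1) if chunk[i:i + k] == oligo)
        densities.append(hits / window)
    return densities


def inhomogeneous_windows(densities, z=2.0):
    """Indices of windows whose density lies more than z standard deviations from the mean."""
    mean = statistics.mean(densities)
    sd = statistics.pstdev(densities)
    return [i for i, d in enumerate(densities) if abs(d - mean) > z * sd]
```

For example, occurrence_distances("ACGTACGTTACG", "ACG") finds "ACG" at positions 0, 4 and 9 and returns [4, 5]. The window scan is deliberately naive; for chromosome-scale sequences a single pass that counts every k-mer once would be the more practical design.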
ISSN: 1058-9244, 1875-919X