Computing distribution of scale independent motifs in biological sequences

Abstract

The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the invest...


Bibliographic Details
Main Authors: Vinga Susana, Almeida Jonas S
Format: Article
Language: English
Published: BMC 2006-10-01
Series: Algorithms for Molecular Biology
Online Access: http://www.almob.org/content/1/1/18
id doaj-cdfccd1b26644d72827b21639211eb61
record_format Article
spelling doaj-cdfccd1b26644d72827b21639211eb61
2020-11-25T01:58:20Z
eng
BMC
Algorithms for Molecular Biology
1748-7188
2006-10-01
1
1
18
10.1186/1748-7188-1-18
Computing distribution of scale independent motifs in biological sequences
Vinga Susana
Almeida Jonas S
http://www.almob.org/content/1/1/18
collection DOAJ
author Vinga Susana
Almeida Jonas S
title Computing distribution of scale independent motifs in biological sequences
issn 1748-7188
description Abstract: The use of Chaos Game Representation (CGR) or its generalization, Universal Sequence Maps (USM), to describe the distribution of biological sequences has been found objectionable because of the fractal structure of that coordinate system. Consequently, the investigation of the distribution of symbolic motifs at multiple scales is hampered by an inexact association between distance and sequence dissimilarity. A solution to this problem could unleash the use of iterative maps as phase-state representations of sequences, in which their statistical properties can be conveniently investigated. In this study, a family of kernel density functions is described that accommodates the fractal nature of iterative function representations of symbolic sequences and, consequently, enables the exact investigation of sequence motifs of arbitrary length in that scale-independent representation. Furthermore, the proposed kernel density includes both Markovian succession and currently used alignment-free sequence dissimilarity metrics as special solutions. The fractal kernel described is therefore a generalization that provides a common framework for a diverse suite of sequence analysis techniques.