SARP: A Novel Algorithm to Assess Compositional Biases in Protein Sequences

The composition of a defined set of subunits (nucleotides, amino acids) is one of the key features of biological sequences. Compositional biases are local shifts in amino acid or nucleotide frequencies that can occur as an adaptation of an organism to an extreme ecological niche, or as the signature...


Bibliographic Details
Main Authors: Kirill S. Antonets, Anton A. Nizhnikov
Format: Article
Language: English
Published: SAGE Publishing 2013-01-01
Series: Evolutionary Bioinformatics
Online Access: https://doi.org/10.4137/EBO.S12299
id doaj-71eb3aabcc154f958c10cf46c12124a6
record_format Article
spelling doaj-71eb3aabcc154f958c10cf46c12124a6 | 2020-11-25T03:10:45Z | eng | SAGE Publishing | Evolutionary Bioinformatics | 1176-9343 | 2013-01-01 | 9 | 10.4137/EBO.S12299 | SARP: A Novel Algorithm to Assess Compositional Biases in Protein Sequences | Kirill S. Antonets [0]; Anton A. Nizhnikov [1] | Department of Genetics and Biotechnology, St. Petersburg State University, St. Petersburg, Russia; St. Petersburg Branch of N.I. Vavilov institute of General Genetics, Russian Academy of Sciences, St. Petersburg, Russia | The composition of a defined set of subunits (nucleotides, amino acids) is one of the key features of biological sequences. Compositional biases are local shifts in amino acid or nucleotide frequencies that can occur as an adaptation of an organism to an extreme ecological niche, or as the signature of a specific function or localization of the corresponding protein. The calculation of probability is a method for annotating compositional bias and providing accurate detection of biased subsequences. Here, we present a Sequence Analysis based on the Ranking of Probabilities (SARP), a novel algorithm for the annotation of compositional biases based on ranking subsequences by their probabilities. SARP provides the same accuracy as the previously published Lower Probability Subsequences (LPS) algorithm but performs at an approximately 230-fold faster rate. It can be recommended for use when working with large datasets to reduce the time and resources required. | https://doi.org/10.4137/EBO.S12299
collection DOAJ
language English
format Article
sources DOAJ
author Kirill S. Antonets
Anton A. Nizhnikov
title SARP: A Novel Algorithm to Assess Compositional Biases in Protein Sequences
publisher SAGE Publishing
series Evolutionary Bioinformatics
issn 1176-9343
publishDate 2013-01-01
description The composition of a defined set of subunits (nucleotides, amino acids) is one of the key features of biological sequences. Compositional biases are local shifts in amino acid or nucleotide frequencies that can occur as an adaptation of an organism to an extreme ecological niche, or as the signature of a specific function or localization of the corresponding protein. The calculation of probability is a method for annotating compositional bias and providing accurate detection of biased subsequences. Here, we present a Sequence Analysis based on the Ranking of Probabilities (SARP), a novel algorithm for the annotation of compositional biases based on ranking subsequences by their probabilities. SARP provides the same accuracy as the previously published Lower Probability Subsequences (LPS) algorithm but performs at an approximately 230-fold faster rate. It can be recommended for use when working with large datasets to reduce the time and resources required.
url https://doi.org/10.4137/EBO.S12299
_version_ 1724657497741983744
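
The description above says that compositional bias can be annotated by calculating subsequence probabilities and ranking subsequences by them. The record does not give the details of SARP or LPS, so the following is only a minimal sketch of that general idea, not the authors' algorithm. It assumes a fixed window length, a single residue of interest, and a binomial model with a user-supplied background frequency; the function names `binomial_tail` and `rank_windows` and all parameter values are hypothetical.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Probability of seeing at least k hits in n trials with hit rate p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def rank_windows(seq: str, residue: str, background: float, window: int):
    """Score every fixed-length window and return them lowest probability first.

    Each result is a tuple (probability, start index, subsequence). A low
    probability means the window is enriched in `residue` relative to the
    assumed background frequency (illustrative model, not SARP itself).
    """
    results = []
    for start in range(len(seq) - window + 1):
        sub = seq[start:start + window]
        count = sub.count(residue)
        results.append((binomial_tail(window, count, background), start, sub))
    return sorted(results)


if __name__ == "__main__":
    # Hypothetical sequence with a glutamine-rich stretch.
    example = "MKTAYIAKQRQQQQQQQNQQLSGHE"
    for prob, start, sub in rank_windows(example, "Q", 0.05, 8)[:3]:
        print(f"{start:2d} {sub} {prob:.3e}")
```

Sorting the windows by probability puts the most strongly biased subsequences first. A practical tool would also need to handle variable window lengths, several residue types, and overlapping hits, none of which this sketch attempts.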