Pathological rate matrices: from primates to pathogens

Abstract

Background: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate ma...


Bibliographic Details
Main Authors: Knight Rob, Easteal Simon, Yap Von, Schranz Harold W, Huttley Gavin A
Format: Article
Language: English
Published: BMC 2008-12-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/9/550
id doaj-067e2450b0474d1c9cf00ae5857f08a3
record_format Article
spelling doaj-067e2450b0474d1c9cf00ae5857f08a3 2020-11-25T01:03:10Z eng BMC BMC Bioinformatics 1471-2105 2008-12-01 9 1 550 10.1186/1471-2105-9-550
collection DOAJ
language English
format Article
sources DOAJ
author Knight Rob
Easteal Simon
Yap Von
Schranz Harold W
Huttley Gavin A
title Pathological rate matrices: from primates to pathogens
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2008-12-01
description Abstract

Background: Continuous-time Markov models allow flexible, parametrically succinct descriptions of sequence divergence. Non-reversible forms of these models are more biologically realistic but are challenging to develop. The instantaneous rate matrices defined for these models are typically transformed into substitution probability matrices using a matrix exponentiation algorithm that employs eigendecomposition, but this algorithm has characteristic vulnerabilities that lead to significant errors when a rate matrix possesses certain 'pathological' properties. Here we tested whether pathological rate matrices exist in nature, and consider the suitability of different algorithms to their computation.

Results: We used concatenated protein coding gene alignments from microbial genomes, primate genomes and independent intron alignments from primate genomes. The Taylor series expansion and eigendecomposition matrix exponentiation algorithms were compared to the less widely employed, but more robust, Padé with scaling and squaring algorithm for nucleotide, dinucleotide, codon and trinucleotide rate matrices. Pathological dinucleotide and trinucleotide matrices were evident in the microbial data set, affecting the eigendecomposition and Taylor algorithms respectively. Even using a conservative estimate of matrix error (occurrence of an invalid probability), both Taylor and eigendecomposition algorithms exhibited substantial error rates: ~100% of all exonic trinucleotide matrices were pathological to the Taylor algorithm while ~10% of codon positions 1 and 2 dinucleotide matrices and intronic trinucleotide matrices, and ~30% of codon matrices were pathological to eigendecomposition. The majority of Taylor algorithm errors derived from occurrence of multiple unobserved states. A small number of negative probabilities were detected from the Padé algorithm on trinucleotide matrices that were attributable to machine precision. Although the Padé algorithm does not facilitate caching of intermediate results, it was up to 3× faster than eigendecomposition on the same matrices.

Conclusion: Development of robust software for computing non-reversible dinucleotide, codon and higher evolutionary models requires implementation of the Padé with scaling and squaring algorithm.
url http://www.biomedcentral.com/1471-2105/9/550
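
The abstract's error criterion (occurrence of an invalid probability in the matrix obtained by exponentiating a rate matrix) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' code: the function names are hypothetical, the Jukes-Cantor-style rate matrix is a textbook example rather than data from the paper, and the exponential is computed with a truncated Taylor series wrapped in scaling and squaring, which is a simple stand-in and not the Padé approximant the article recommends.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def expm_scaled_taylor(q, t=1.0, terms=18):
    """Approximate exp(q * t) with a truncated Taylor series plus
    scaling and squaring (illustrative stand-in, not a Pade method)."""
    n = len(q)
    a = [[q[i][j] * t for j in range(n)] for i in range(n)]
    # Scale the matrix down so that its infinity norm is at most 0.5.
    norm = max(sum(abs(x) for x in row) for row in a)
    squarings = max(0, math.ceil(math.log2(norm / 0.5))) if norm > 0 else 0
    factor = 2.0 ** squarings
    a = [[x / factor for x in row] for row in a]
    # Taylor series of the scaled matrix: I + A + A^2/2! + ...
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    result = [row[:] for row in identity]
    term = [row[:] for row in identity]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(n)]
                  for i in range(n)]
    # Undo the scaling by repeated squaring.
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def is_valid_probability_matrix(p, tol=1e-9):
    """Return True if every entry lies in [0, 1] and every row sums to 1,
    both within the tolerance tol."""
    for row in p:
        if any(x < -tol or x > 1.0 + tol for x in row):
            return False
        if abs(sum(row) - 1.0) > tol:
            return False
    return True


# Hypothetical example: a Jukes-Cantor-style nucleotide rate matrix
# (equal off-diagonal rates, each row summing to zero).
mu = 1.0 / 3.0
Q_JC = [[-1.0 if i == j else mu for j in range(4)] for i in range(4)]
P_JC = expm_scaled_taylor(Q_JC, t=0.5)
print(is_valid_probability_matrix(P_JC))
```

For this well-conditioned matrix the result can be checked against the closed-form Jukes-Cantor probabilities. In practice, a library routine is the more likely choice for Padé with scaling and squaring; SciPy's `scipy.linalg.expm`, for example, is documented as using that approach.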