PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination

<p>Abstract</p> <p>Background</p> <p>We present a novel method to encode ambiguously aligned regions in fixed multiple sequence alignments by 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord). The method works via ordination of sequence identity or cos...

Bibliographic Details
Main Authors: Stamatakis Alexandros, Hodkinson Brendan P, Lücking Robert, Cartwright Reed A
Format: Article
Language: English
Published: BMC 2011-01-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/12/10
id doaj-ee1fa272969f430ca13fd9b0d01c9bc6
record_format Article
spelling doaj-ee1fa272969f430ca13fd9b0d01c9bc6 2020-11-25T00:17:07Z eng BMC BMC Bioinformatics 1471-2105 2011-01-01 12 1 10 10.1186/1471-2105-12-10 PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination Stamatakis Alexandros Hodkinson Brendan P Lücking Robert Cartwright Reed A http://www.biomedcentral.com/1471-2105/12/10
collection DOAJ
language English
format Article
sources DOAJ
author Stamatakis Alexandros
Hodkinson Brendan P
Lücking Robert
Cartwright Reed A
title PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2011-01-01
description <p>Abstract</p>
<p>Background</p> <p>We present a novel method to encode ambiguously aligned regions in fixed multiple sequence alignments by 'Pairwise Identity and Cost Scores Ordination' (PICS-Ord). The method ordinates matrices of sequence identity or cost scores by means of Principal Coordinates Analysis (PCoA). After ambiguous regions are identified, it computes pairwise distances as sequence identities or cost scores, ordinates the resulting distance matrix by PCoA, and encodes the principal coordinates as ordered integers. Three biological and 100 simulated datasets were used to assess the performance of the new method.</p>
<p>Results</p> <p>Including ambiguous regions coded with PICS-Ord increased topological accuracy, resolution, and bootstrap support in both biological and simulated datasets compared with excluding such regions from the analysis a priori. In terms of accuracy, PICS-Ord performs as well as or better than previously available methods of ambiguous region coding (e.g., INAASE). Its advantages are a practically unlimited alignment size, increased analytical speed, and the possibility of analyzing PICS-Ord scores together with DNA data in a partitioned maximum likelihood model.</p>
<p>Conclusions</p> <p>Advantages of PICS-Ord over step matrix-based ambiguous region coding with INAASE include a practically unlimited number of OTUs, seamless integration of PICS-Ord codes into phylogenetic datasets, and faster phylogenetic analysis. In contrast to word- and frequency-based methods, PICS-Ord retains the advantage of deriving distances from pairwise sequence alignment, and it is flexible with respect to how distance scores are calculated. In addition to distance and maximum parsimony methods, PICS-Ord codes can be analyzed in a Bayesian or maximum likelihood framework. RAxML (version 7.2.6 or higher, which was developed for this study) allows ordered or unordered characters with up to 32 states. A GTR, MK, or ORDERED model can be applied to analyze the PICS-Ord codes partition, with GTR performing slightly better than MK and ORDERED.</p>
<p>Availability</p> <p>An implementation of the PICS-Ord algorithm is available from <url>http://scit.us/projects/ngila/wiki/PICS-Ord</url>. It requires both the statistical software R <url>http://www.r-project.org</url> and the alignment software Ngila <url>http://scit.us/projects/ngila</url>.</p>
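The ordination and coding steps named in the abstract (PCoA of a pairwise distance matrix, then encoding the principal coordinates as ordered integers) can be illustrated with a minimal Python sketch. This is not the published implementation, which relies on R and Ngila. The function names `pcoa` and `encode_ordered` are hypothetical, the distance matrix is assumed to be given already, and the linear min-max binning into `n_states` ordered states is an assumption made for illustration. The default of 32 states only mirrors the RAxML limit mentioned above.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical PCoA: return coordinates for up to n_axes positive axes."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances to obtain a Gram matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    # eigh returns ascending eigenvalues; sort them in descending order.
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with clearly positive eigenvalues.
    keep = eigvals > 1e-10
    eigvals = eigvals[keep][:n_axes]
    eigvecs = eigvecs[:, keep][:, :n_axes]
    return eigvecs * np.sqrt(eigvals)


def encode_ordered(coords, n_states=32):
    """Bin each axis linearly into integers 0 .. n_states - 1 (assumed scheme)."""
    coords = np.asarray(coords, dtype=float)
    lo = coords.min(axis=0)
    span = coords.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero on constant axes
    scaled = (coords - lo) / span
    return np.rint(scaled * (n_states - 1)).astype(int)


if __name__ == "__main__":
    # Toy distances between four sequences that lie on a single gradient.
    positions = np.array([0.0, 1.0, 2.0, 4.0])
    toy_dist = np.abs(positions[:, None] - positions[None, :])
    axes = pcoa(toy_dist, n_axes=2)
    print(encode_ordered(axes, n_states=5))
```

The sign of each principal axis is arbitrary, so the integer codes along an axis may come out reversed. For ordered characters this does not change the distances between states.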
url http://www.biomedcentral.com/1471-2105/12/10