Comprehensive comparison of graph based multiple protein sequence alignment strategies

Bibliographic Details
Main Authors: Plyusnin Ilya, Holm Liisa
Format: Article
Language: English
Published: BMC 2012-04-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/13/64
Description
Summary: <p>Abstract</p> <p>Background</p> <p>Multiple protein sequence alignment (MPSA) is the starting point for a multitude of applications in molecular biology. Here, we present a novel MPSA program based on the SeqAn sequence alignment library. Our implementation has a strict modular structure, which makes it possible to swap different components of the alignment process and, thus, to investigate their contribution to alignment quality and computation time. We systematically varied information sources, guiding trees, score transformations and iterative refinement options, and evaluated the resulting alignments on BAliBASE and SABmark.</p> <p>Results</p> <p>Our results indicate the optimal alignment strategy among the choices compared. First, we show that pairwise global and local alignments contain sufficient information to construct a high-quality multiple alignment. Second, single linkage clustering is almost invariably the best algorithm for building a guiding tree for progressive alignment. Third, triplet library extension, with introduction of new edges, is the most efficient consistency transformation of those compared. Alternatively, one can apply tree-dependent partitioning as a post-processing step, which was shown to be comparable with the best consistency transformation in both time and accuracy. Finally, propagating information beyond four transitive links introduces more noise than signal.</p> <p>Conclusions</p> <p>This is the first time multiple protein alignment strategies have been comprehensively and clearly compared using a single implementation platform. In particular, we showed which of the existing consistency transformations and iterative refinement techniques are the most valid. 
Our implementation is freely available at <url>http://ekhidna.biocenter.helsinki.fi/MMSA</url> and as a supplementary file attached to this article (see Additional file <supplr sid="S1">1</supplr>).</p> <suppl id="S1"> <title> <p>Additional file 1</p> </title> <text> <p><b>Source code, Makefile, installation instructions and test alignments</b>.</p> </text> <file name="1471-2105-13-64-S1.GZ"> </file> </suppl>
ISSN: 1471-2105
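
The summary names triplet library extension, with introduction of new edges, as the most efficient consistency transformation of those compared. As a rough illustration of the general idea only, the sketch below applies one round of a T-Coffee-style triplet extension to a small library of weighted residue pairs. It is not taken from the article's SeqAn-based implementation: the function names, the `(sequence_id, position)` residue encoding, the `add_new_edges` flag and the min-weight scoring rule are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import combinations


def edge(a, b):
    """Return a canonical (ordered) key for an undirected residue pair."""
    return (a, b) if a <= b else (b, a)


def triplet_extend(library, add_new_edges=True):
    """One round of triplet extension over a primary library.

    library maps canonical residue pairs to weights, where a residue is a
    hypothetical (sequence_id, position) tuple. For every residue z that is
    linked to both x and y, the pair (x, y) gains min(w(x, z), w(z, y)).
    If add_new_edges is False, pairs absent from the input are not created.
    """
    # Build an adjacency view of the library graph.
    neighbours = defaultdict(dict)
    for (a, b), w in library.items():
        neighbours[a][b] = w
        neighbours[b][a] = w

    extended = dict(library)
    for z, adj in neighbours.items():
        for x, y in combinations(sorted(adj), 2):
            if x[0] == y[0]:
                continue  # residues of the same sequence cannot be aligned
            key = edge(x, y)
            if key not in library and not add_new_edges:
                continue
            # Support for x-y through z is limited by the weaker link.
            extended[key] = extended.get(key, 0) + min(adj[x], adj[y])
    return extended
```

With `add_new_edges=True`, two residues that were never aligned directly can still receive a weight if both are linked to a common third residue, which is one way to read "introduction of new edges"; with `add_new_edges=False`, the transformation only reweights pairs already in the library.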