Gapped alignment of protein sequence motifs through Monte Carlo optimization of a hidden Markov model

Abstract. Background: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to select...

Bibliographic Details
Main Authors:	Liu Jun S, Neuwald Andrew F
Format:	Article
Language:	English
Published:	BMC 2004-10-01
Series:	BMC Bioinformatics
Online Access:	http://www.biomedcentral.com/1471-2105/5/157

id	doaj-5d85b558e2a54df3bed6f1a920d6d031
doi	10.1186/1471-2105-5-157
collection	DOAJ
issn	1471-2105
description	Abstract
Background: Certain protein families are highly conserved across distantly related organisms and belong to large and functionally diverse superfamilies. The patterns of conservation present in these protein sequences presumably are due to selective constraints that maintain important but unknown structural mechanisms, with some constraints specific to each family and others shared by a larger subset of the superfamily or by the entire superfamily. To exploit these patterns as a source of functional information, we recently devised a statistically based approach called contrast hierarchical alignment and interaction network (CHAIN) analysis, which infers the strengths of various categories of selective constraints from co-conserved patterns in a multiple alignment. The power of this approach depends strongly on the quality of the multiple alignment, which motivated the development of theoretical concepts and strategies to improve the alignment of conserved motifs within large sets of distantly related sequences.
Results: Here we describe a hidden Markov model (HMM), an algebraic system, and Markov chain Monte Carlo (MCMC) sampling strategies for the alignment of multiple sequence motifs. The MCMC sampling strategies are useful both for alignment optimization and for adjusting position-specific background amino acid frequencies for alignment uncertainties. Associated statistical formulations provide an objective measure of alignment quality as well as automatic gap penalty optimization. Improved alignments obtained in this way are compared with PSI-BLAST-based alignments within the context of CHAIN analysis of three protein families: Giα subunits, prolyl oligopeptidases, and transitional endoplasmic reticulum (p97) AAA+ ATPases.
Conclusion: While not entirely replacing PSI-BLAST-based alignments, which likewise may be optimized for CHAIN analysis using this approach, these motif-based methods often align very distantly related sequences more accurately and thus can provide a better measure of selective constraints. In some instances, these new approaches also provide a better understanding of family-specific constraints, as we illustrate for p97 ATPases. Programs implementing these procedures and supplementary information are available from the authors.

Similar Items