Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation.

BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency...


Bibliographic Details
Main Authors: Russell J Dickson, Lindi M Wahl, Andrew D Fernandes, Gregory B Gloor
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2010-01-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2893159?pdf=render
id doaj-95f97a3c895c47f5b895f7a3de6ea6d6
record_format Article
spelling doaj-95f97a3c895c47f5b895f7a3de6ea6d6 2020-11-25T01:12:46Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2010-01-01 5 6 e11082 10.1371/journal.pone.0011082 http://europepmc.org/articles/PMC2893159?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Russell J Dickson
Lindi M Wahl
Andrew D Fernandes
Gregory B Gloor
title Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2010-01-01
description BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results, demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses, as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences improves contact prediction by covariation analysis. Finally, we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions, because they emphasize pairwise covariation over group covariation.
url http://europepmc.org/articles/PMC2893159?pdf=render