Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation.
BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency...
Main Authors: | Russell J Dickson, Lindi M Wahl, Andrew D Fernandes, Gregory B Gloor |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS) 2010-01-01 |
Series: | PLoS ONE |
Online Access: | http://europepmc.org/articles/PMC2893159?pdf=render |
id |
doaj-95f97a3c895c47f5b895f7a3de6ea6d6 |
---|---|
record_format |
Article |
spelling |
doaj-95f97a3c895c47f5b895f7a3de6ea6d6 2020-11-25T01:12:46Z eng Public Library of Science (PLoS) PLoS ONE 1932-6203 2010-01-01 5 6 e11082 10.1371/journal.pone.0011082 Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. Russell J Dickson Lindi M Wahl Andrew D Fernandes Gregory B Gloor BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature.
CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation. http://europepmc.org/articles/PMC2893159?pdf=render |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Russell J Dickson Lindi M Wahl Andrew D Fernandes Gregory B Gloor |
spellingShingle |
Russell J Dickson Lindi M Wahl Andrew D Fernandes Gregory B Gloor Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. PLoS ONE |
author_facet |
Russell J Dickson Lindi M Wahl Andrew D Fernandes Gregory B Gloor |
author_sort |
Russell J Dickson |
title |
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
title_short |
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
title_full |
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
title_fullStr |
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
title_full_unstemmed |
Identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
title_sort |
identifying and seeing beyond multiple sequence alignment errors using intra-molecular protein covariation. |
publisher |
Public Library of Science (PLoS) |
series |
PLoS ONE |
issn |
1932-6203 |
publishDate |
2010-01-01 |
description |
BACKGROUND: There is currently no way to verify the quality of a multiple sequence alignment that is independent of the assumptions used to build it. Sequence alignments are typically evaluated by a number of established criteria: sequence conservation, the number of aligned residues, the frequency of gaps, and the probable correct gap placement. Covariation analysis is used to find putatively important residue pairs in a sequence alignment. Different alignments of the same protein family give different results demonstrating that covariation depends on the quality of the sequence alignment. We thus hypothesized that current criteria are insufficient to build alignments for use with covariation analyses. METHODOLOGY/PRINCIPAL FINDINGS: We show that current criteria are insufficient to build alignments for use with covariation analyses as systematic sequence alignment errors are present even in hand-curated structure-based alignment datasets like those from the Conserved Domain Database. We show that current non-parametric covariation statistics are sensitive to sequence misalignments and that this sensitivity can be used to identify systematic alignment errors. We demonstrate that removing alignment errors due to 1) improper structure alignment, 2) the presence of paralogous sequences, and 3) partial or otherwise erroneous sequences, improves contact prediction by covariation analysis. Finally we describe two non-parametric covariation statistics that are less sensitive to sequence alignment errors than those described previously in the literature. CONCLUSIONS/SIGNIFICANCE: Protein alignments with errors lead to false positive and false negative conclusions (incorrect assignment of covariation and conservation, respectively). Covariation analysis can provide a verification step, independent of traditional criteria, to identify systematic misalignments in protein alignments. 
Two non-parametric statistics are shown to be somewhat insensitive to misalignment errors, providing increased confidence in contact prediction when analyzing alignments with erroneous regions because they emphasize pairwise covariation over group covariation. |
url |
http://europepmc.org/articles/PMC2893159?pdf=render |
work_keys_str_mv |
AT russelljdickson identifyingandseeingbeyondmultiplesequencealignmenterrorsusingintramolecularproteincovariation AT lindimwahl identifyingandseeingbeyondmultiplesequencealignmenterrorsusingintramolecularproteincovariation AT andrewdfernandes identifyingandseeingbeyondmultiplesequencealignmenterrorsusingintramolecularproteincovariation AT gregorybgloor identifyingandseeingbeyondmultiplesequencealignmenterrorsusingintramolecularproteincovariation |
_version_ |
1725165096493121536 |