Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.

BACKGROUND: HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for...

Bibliographic Details
Main Authors: Fred Kippert, Dietlind L Gerloff
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2009-09-01
Series: PLoS ONE
Online Access: http://europepmc.org/articles/PMC2744927?pdf=render
id doaj-7b9f5cf497944336b1b0901bf054e6ba
record_format Article
spelling doaj-7b9f5cf497944336b1b0901bf054e6ba; 2020-11-24T21:40:46Z; eng; Public Library of Science (PLoS); PLoS ONE; 1932-6203; 2009-09-01; 4; 9; e7148; 10.1371/journal.pone.0007148; Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.; Fred Kippert; Dietlind L Gerloff; http://europepmc.org/articles/PMC2744927?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Fred Kippert
Dietlind L Gerloff
title Highly sensitive detection of individual HEAT and ARM repeats with HHpred and COACH.
publisher Public Library of Science (PLoS)
series PLoS ONE
issn 1932-6203
publishDate 2009-09-01
description BACKGROUND: HEAT and ARM repeats occur in a large number of eukaryotic proteins. As these repeats are often highly diverged, the prediction of HEAT or ARM domains can be challenging. Except for the most clear-cut cases, identification at the individual repeat level is indispensable, in particular for determining domain boundaries. However, methods using single sequence queries do not have the sensitivity required to deal with more divergent repeats and, when applied to proteins with known structures, in some cases failed to detect a single repeat. METHODOLOGY AND PRINCIPAL FINDINGS: Testing algorithms which use multiple sequence alignments as queries, we found two of them, HHpred and COACH, to detect HEAT and ARM repeats with greatly enhanced sensitivity. Calibration against experimentally determined structures suggests the use of three score classes with increasing confidence in the prediction, and prediction thresholds for each method. When we applied a new protocol using both HHpred and COACH to these structures, it detected 82% of HEAT repeats and 90% of ARM repeats, with the minimum for a given protein of 57% for HEAT repeats and 60% for ARM repeats. Application to bona fide HEAT and ARM proteins or domains indicated that similar numbers can be expected for the full complement of HEAT/ARM proteins. A systematic screen of the Protein Data Bank for false positive hits revealed their number to be low, in particular for ARM repeats. Double false positive hits for a given protein were rare for HEAT and not at all observed for ARM repeats. In combination with fold prediction and consistency checking (multiple sequence alignments, secondary structure prediction, and position analysis), repeat prediction with the new HHpred/COACH protocol dramatically improves prediction in the twilight zone of fold prediction methods, as well as the delineation of HEAT/ARM domain boundaries. SIGNIFICANCE: A protocol is presented for the identification of individual HEAT or ARM repeats which is straightforward to implement. It provides high sensitivity at a low false positive rate and will therefore greatly enhance the accuracy of predictions of HEAT and ARM domains.
url http://europepmc.org/articles/PMC2744927?pdf=render