Enhanced protein domain discovery using taxonomy

Bibliographic Details
Main Authors: Coin, Lachlan; Bateman, Alex; Durbin, Richard
Format: Article
Language: English
Published: BMC, 2004-05-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/5/56
ISSN: 1471-2105
DOI: 10.1186/1471-2105-5-56

Abstract

Background

It is well known that different species have different protein domain repertoires, and indeed that some protein domains are kingdom specific. This information has not yet been incorporated into statistical methods for finding domains in sequences of amino acids.

Results

We show that by incorporating our understanding of the taxonomic distribution of specific protein domains, we can enhance domain recognition in protein sequences. Using this technique, we identify 4447 new instances of Pfam domains in the SP-TREMBL database. This is equivalent to the coverage increase given by the last 8.3% of Pfam families, and to a 0.7% increase in the number of domain predictions. We use PSI-BLAST to cross-validate our new predictions. We also benchmark our approach using a SCOP test set of proteins of known structure, and demonstrate improvements relative to standard hidden Markov model techniques.

Conclusions

Explicitly including knowledge about the taxonomic distribution of protein domains can enhance protein domain recognition. Our method can also incorporate other context-specific domain distributions, such as domain co-occurrence and protein localisation.
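The central idea of the abstract is that a domain family's taxonomic distribution is itself evidence about whether a candidate hit is real. The abstract does not give the scoring rule, so the following is only a minimal sketch under stated assumptions, not the authors' method. It treats an HMM bit score as log-odds evidence and adds, in the spirit of Bayes' rule in log space, a log2 ratio of the family's smoothed frequency in the query's taxon to its pooled frequency across all taxa. The family names, counts, bit scores, the 20-bit threshold, and the helpers `taxon_log_prior_odds` and `rescore` are all hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Hit:
    family: str       # hypothetical family identifier (Pfam-like, made up)
    bit_score: float  # HMM log-odds score in bits (assumed precomputed)


def taxon_log_prior_odds(family, taxon, counts, pseudocount=1.0):
    """Hypothetical helper: log2( P(family | taxon) / P(family) ).

    counts[taxon][family] = number of annotated proteins in `taxon` that carry
    `family` (made-up data). Add-`pseudocount` smoothing keeps unseen
    (family, taxon) pairs finite instead of taking log2(0).
    """
    families = {f for per_taxon in counts.values() for f in per_taxon} | {family}
    n_fam = len(families)
    in_taxon = counts.get(taxon, {})
    # Smoothed frequency of the family within the query's taxon.
    p_taxon = (in_taxon.get(family, 0) + pseudocount) / (
        sum(in_taxon.values()) + pseudocount * n_fam
    )
    # Smoothed frequency of the family pooled over all taxa.
    family_total = sum(per_taxon.get(family, 0) for per_taxon in counts.values())
    grand_total = sum(sum(per_taxon.values()) for per_taxon in counts.values())
    p_all = (family_total + pseudocount) / (grand_total + pseudocount * n_fam)
    return math.log2(p_taxon / p_all)


def rescore(hits, taxon, counts, threshold):
    """Keep hits whose bit score plus the taxonomic log2 ratio reaches `threshold`."""
    kept = []
    for hit in hits:
        adjusted = hit.bit_score + taxon_log_prior_odds(hit.family, taxon, counts)
        if adjusted >= threshold:
            kept.append((hit.family, round(adjusted, 2)))
    return kept


# Made-up example: FAM_A is mostly bacterial, FAM_B mostly eukaryotic.
example_counts = {
    "Bacteria": {"FAM_A": 90, "FAM_B": 10},
    "Eukaryota": {"FAM_A": 5, "FAM_B": 95},
}
example_hits = [Hit("FAM_A", 21.0), Hit("FAM_B", 19.5)]

# Taxonomy-agnostic cutoff versus taxonomy-aware rescoring for a eukaryotic query.
print([h.family for h in example_hits if h.bit_score >= 20.0])  # ['FAM_A']
print(rescore(example_hits, "Eukaryota", example_counts, threshold=20.0))  # [('FAM_B', 20.34)]
```

With these invented numbers, a plain 20-bit cutoff keeps FAM_A and misses FAM_B, while the taxonomy-aware adjustment reverses both calls. This is a toy illustration of how a taxonomic prior can promote sub-threshold hits of families common in the query's taxon and demote hits of families rarely seen there.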