Large-scale comparative genomic ranking of taxonomically restricted genes (TRGs) in bacterial and archaeal genomes.

Bibliographic Details
Main Authors: Gareth A Wilson, Edward J Feil, Andrew K Lilley, Dawn Field
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2007-01-01
Series: PLoS ONE
Online Access:http://europepmc.org/articles/PMC1824705?pdf=render
ISSN: 1932-6203
Citation: PLoS ONE 2(3): e324
DOI: 10.1371/journal.pone.0000324

Description

BACKGROUND: Lineage-specific, or taxonomically restricted genes (TRGs), especially those that are species- and strain-specific, are of special interest because they are expected to play a role in defining exclusive ecological adaptations to particular niches. Despite this, they are relatively poorly studied and little understood, in large part because many are still orphans or only have homologues in very closely related isolates. This lack of homology confounds attempts to establish the likelihood that a hypothetical gene is expressed and, if so, to determine the putative function of the protein.

METHODOLOGY/PRINCIPAL FINDINGS: We have developed "QIPP" ("Quality Index for Predicted Proteins"), an index that scores the "quality" of a protein based on non-homology-based criteria. QIPP can be used to assign a value between zero and one to any protein based on comparing its features to other proteins in a given genome. We have used QIPP to rank the predicted proteins in the proteomes of Bacteria and Archaea. This ranking reveals that there is a large amount of variation in QIPP scores, and identifies many high-scoring orphans as potentially "authentic" (expressed) orphans. There are significant differences in the distributions of QIPP scores between orphan and non-orphan genes for many genomes, and a trend for less well-conserved genes to have lower QIPP scores.

CONCLUSIONS: The implication of this work is that QIPP scores can be used to further annotate predicted proteins with information that is independent of homology. Such information can be used to prioritize candidates for further analysis. Data generated for this study can be found in the OrphanMine at http://www.genomics.ceh.ac.uk/orphan_mine.
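The abstract states that QIPP assigns each protein a value between zero and one by comparing its features with those of the other proteins in the same genome, but the record gives neither the features nor the formula. The following is therefore only an illustrative sketch of one way a genome-relative index in [0, 1] could be built, assuming each feature is converted to a within-genome percentile rank and the ranks are averaged per protein. It is not the published QIPP method; the functions `quality_index` and `percentile_ranks` and the feature names `length_aa` and `codon_bias` are hypothetical.

```python
from bisect import bisect_left, bisect_right
from statistics import fmean


def percentile_ranks(values):
    """Map each value to its normalized mid-rank within the same genome, in [0, 1]."""
    n = len(values)
    if n == 1:
        return [0.5]  # a lone protein has nothing to be compared against
    ordered = sorted(values)
    ranks = []
    for v in values:
        below = bisect_left(ordered, v)          # proteins with a strictly smaller value
        ties = bisect_right(ordered, v) - below  # proteins sharing this value (incl. itself)
        ranks.append((below + (ties - 1) / 2) / (n - 1))
    return ranks


def quality_index(features, higher_is_better):
    """Hypothetical genome-relative score: mean of per-feature percentile ranks.

    features: feature name -> list of values, one per predicted protein (same order).
    higher_is_better: feature name -> True if larger values should raise the score.
    """
    if not features:
        raise ValueError("need at least one feature")
    n = len(next(iter(features.values())))
    per_feature = []
    for name, values in features.items():
        if len(values) != n:
            raise ValueError(f"feature {name!r} has {len(values)} values, expected {n}")
        ranks = percentile_ranks(values)
        if not higher_is_better.get(name, True):
            ranks = [1.0 - r for r in ranks]  # invert so that 1 is always "better"
        per_feature.append(ranks)
    # Average across features for each protein; the result stays in [0, 1].
    return [fmean(protein_ranks) for protein_ranks in zip(*per_feature)]


if __name__ == "__main__":
    # Hypothetical feature values for four predicted proteins from one genome.
    genome = {
        "length_aa": [300, 45, 120, 410],
        "codon_bias": [0.62, 0.31, 0.55, 0.70],
    }
    scores = quality_index(genome, {"length_aa": True, "codon_bias": True})
    print([round(s, 3) for s in scores])  # [0.667, 0.0, 0.333, 1.0]
```

Because every rank in this sketch is computed only against the other proteins of the same genome, the resulting scores rank proteins within a genome; they are not calibrated for direct comparison between genomes of different sizes or composition.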