The COG database: an updated version includes eukaryotes

Abstract. Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a...

Bibliographic Details
Main Authors: Sverdlov Alexander V, Smirnov Sergei, Rao B Sridhar, Nikolskaya Anastasia N, Mekhedov Sergei L, Mazumder Raja, Krylov Dmitri M, Koonin Eugene V, Kiryutin Boris, Jacobs Aviva R, Jackson John D, Fedorova Natalie D, Tatusov Roman L, Vasudevan Sona, Wolf Yuri I, Yin Jodie J, Natale Darren A
Format: Article
Language: English
Published: BMC 2003-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/4/41
id doaj-01c4c2dfb29244aebfb3a629f4ca02f0
record_format Article
spelling doaj-01c4c2dfb29244aebfb3a629f4ca02f0 2020-11-25T00:45:01Z eng BMC BMC Bioinformatics 1471-2105 2003-09-01 4 1 41 10.1186/1471-2105-4-41 The COG database: an updated version includes eukaryotes Sverdlov Alexander V, Smirnov Sergei, Rao B Sridhar, Nikolskaya Anastasia N, Mekhedov Sergei L, Mazumder Raja, Krylov Dmitri M, Koonin Eugene V, Kiryutin Boris, Jacobs Aviva R, Jackson John D, Fedorova Natalie D, Tatusov Roman L, Vasudevan Sona, Wolf Yuri I, Yin Jodie J, Natale Darren A http://www.biomedcentral.com/1471-2105/4/41
collection DOAJ
language English
format Article
sources DOAJ
author Sverdlov Alexander V
Smirnov Sergei
Rao B Sridhar
Nikolskaya Anastasia N
Mekhedov Sergei L
Mazumder Raja
Krylov Dmitri M
Koonin Eugene V
Kiryutin Boris
Jacobs Aviva R
Jackson John D
Fedorova Natalie D
Tatusov Roman L
Vasudevan Sona
Wolf Yuri I
Yin Jodie J
Natale Darren A
title The COG database: an updated version includes eukaryotes
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2003-09-01
description Abstract
Background: The availability of multiple, essentially complete genome sequences of prokaryotes and eukaryotes spurred both the demand and the opportunity for the construction of an evolutionary classification of genes from these genomes. Such a classification system, based on orthologous relationships between genes, appears to be a natural framework for comparative genomics and should facilitate both functional annotation of genomes and large-scale evolutionary studies.
Results: We describe here a major update of the previously developed system for delineation of Clusters of Orthologous Groups of proteins (COGs) from the sequenced genomes of prokaryotes and unicellular eukaryotes, and the construction of clusters of predicted orthologs for 7 eukaryotic genomes, which we named KOGs after eukaryotic orthologous groups. The COG collection currently consists of 138,458 proteins, which form 4873 COGs and comprise 75% of the 185,505 (predicted) proteins encoded in 66 genomes of unicellular organisms. The KOGs include proteins from 7 eukaryotic genomes: three animals (the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and Homo sapiens), one plant (Arabidopsis thaliana), two fungi (Saccharomyces cerevisiae and Schizosaccharomyces pombe), and the intracellular microsporidian parasite Encephalitozoon cuniculi. The current KOG set consists of 4852 clusters of orthologs, which include 59,838 proteins, or ~54% of the 110,655 analyzed eukaryotic gene products. Compared to the coverage of the prokaryotic genomes with COGs, a considerably smaller fraction of eukaryotic genes could be included in the KOGs; the addition of new eukaryotic genomes is expected to substantially increase the coverage of eukaryotic genomes with KOGs. Examination of the phyletic patterns of KOGs reveals a conserved core that is represented in all analyzed species and makes up ~20% of the KOG set. This conserved portion of the KOG set is much greater than the ubiquitous portion of the COG set (~1% of the COGs). In part, this difference is probably due to the small number of included eukaryotic genomes, but it could also reflect the relative compactness of eukaryotes as a clade and the greater evolutionary stability of eukaryotic genomes.
Conclusion: The updated collection of orthologous protein sets for prokaryotes and eukaryotes is expected to be a useful platform for functional annotation of newly sequenced genomes, including those of complex eukaryotes, and for genome-wide evolutionary studies.
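The coverage figures and the notion of a phyletic pattern in the abstract can be illustrated with a minimal Python sketch. The protein counts are the ones quoted in the abstract; the function names, the three-letter species codes, and the toy membership set are hypothetical and are not taken from the COG/KOG distribution files.

```python
def coverage(clustered: int, total: int) -> float:
    """Percentage of proteins that were assigned to an orthologous cluster."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * clustered / total


# Hypothetical codes for the 7 eukaryotic genomes named in the abstract.
SPECIES = ("cel", "dme", "hsa", "ath", "sce", "spo", "ecu")


def phyletic_pattern(present: set, species=SPECIES) -> str:
    """Presence/absence string: '1' if the species has a member in the cluster."""
    return "".join("1" if s in present else "0" for s in species)


def is_universal(present: set, species=SPECIES) -> bool:
    """True if the cluster is represented in every analyzed species."""
    return set(species) <= present


# Counts quoted in the abstract.
cog_pct = coverage(138_458, 185_505)
kog_pct = coverage(59_838, 110_655)
print(f"COG coverage: {cog_pct:.1f}%")
print(f"KOG coverage: {kog_pct:.1f}%")

# Toy cluster (invented for illustration): members only in human and yeast.
toy = {"hsa", "sce"}
print(phyletic_pattern(toy), is_universal(toy))
```

With the abstract's counts, the two ratios come out close to the quoted 75% and ~54%. A cluster belongs to the conserved core described above when `is_universal` holds for it.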
url http://www.biomedcentral.com/1471-2105/4/41