Informative Regions In Viral Genomes

Bibliographic Details
Main Authors: Jaime Leonardo Moreno-Gallego, Alejandro Reyes
Format: Article
Language: English
Published: MDPI AG 2021-06-01
Series: Viruses
Subjects: eukaryotic viruses; phages; orthologous groups; random forest; ViPhOGs
Online Access: https://www.mdpi.com/1999-4915/13/6/1164
id doaj-9099600229fa4c6ba80a31c4a244dc34
record_format Article
spelling doaj-9099600229fa4c6ba80a31c4a244dc34 2021-07-01T00:29:14Z eng
MDPI AG
Viruses (ISSN 1999-4915), 2021-06-01, volume 13, 1164
DOI 10.3390/v13061164
Informative Regions In Viral Genomes
Jaime Leonardo Moreno-Gallego [0]
Alejandro Reyes [1]
[0] Department of Microbiome Science, Max Planck Institute for Developmental Biology, 72076 Tübingen, Germany
[1] Max Planck Tandem Group in Computational Biology, Department of Biological Sciences, Universidad de los Andes, Bogotá 111711, Colombia
https://www.mdpi.com/1999-4915/13/6/1164
eukaryotic viruses
phages
orthologous groups
random forest
ViPhOGs
collection DOAJ
language English
format Article
sources DOAJ
author Jaime Leonardo Moreno-Gallego
Alejandro Reyes
title Informative Regions In Viral Genomes
publisher MDPI AG
series Viruses
issn 1999-4915
publishDate 2021-06-01
description Viruses, far from being just parasites affecting their hosts’ fitness, are major players in any microbial ecosystem. Despite their abundance, viruses, in particular bacteriophages, remain largely unknown: only about 20% of the sequences obtained from viral community DNA surveys could be annotated by comparison with public databases. To shed some light on this genetic dark matter, we extended the search for orthologous groups as potential markers of viral taxonomy beyond bacteriophages to include eukaryotic viruses, establishing a set of 31,150 ViPhOGs (Eukaryotic Viruses and Phages Orthologous Groups). To do this, we examined the non-redundant viral diversity stored in public databases, predicted proteins in genomes lacking such information, and used all annotated and predicted proteins to identify potential protein domains. Domains and unannotated regions were then clustered into orthologous groups using cogSoft. Finally, we used a random forest implementation to classify genomes by taxonomy and found that the presence or absence of ViPhOGs is significantly associated with taxonomy. Furthermore, we established a set of 1457 ViPhOGs that, given their importance for the classification, could be considered markers or signatures of the taxonomic groups defined by the ICTV at the order, family, and genus levels.
topic eukaryotic viruses
phages
orthologous groups
random forest
ViPhOGs
url https://www.mdpi.com/1999-4915/13/6/1164
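The description states that ViPhOG presence or absence was used as input to a random forest, and that groups were ranked by their importance for classification. As a hedged illustration only (not the authors' code or data), the Python sketch below encodes genomes as 0/1 presence/absence vectors over orthologous groups and scores each group by the Gini impurity decrease of a single split on it. That is the per-split quantity that impurity-based random-forest importance accumulates over all splits and trees; the record does not say which importance measure the authors used. The function names (presence_absence_matrix, gini, impurity_decrease, rank_groups), genome IDs, group IDs (OG1 to OG4) and taxon labels are all hypothetical. A real analysis would train a full forest (for example scikit-learn's RandomForestClassifier) instead of scoring single splits.

```python
"""Toy sketch: presence/absence encoding of orthologous groups and a
single-split Gini impurity decrease per group (hypothetical data)."""
from collections import Counter


def presence_absence_matrix(genome_groups):
    """Encode each genome as a 0/1 vector over every orthologous group observed.

    genome_groups maps a genome ID to the set of orthologous-group IDs found in it.
    Returns (sorted genome IDs, sorted group IDs, one 0/1 row per genome).
    """
    genomes = sorted(genome_groups)
    groups = sorted(set().union(*genome_groups.values()))
    rows = [[1 if grp in genome_groups[gen] else 0 for grp in groups] for gen in genomes]
    return genomes, groups, rows


def gini(labels):
    """Gini impurity of a list of class labels; 0.0 for an empty or pure list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(column, labels):
    """Gini decrease from splitting all genomes on one presence/absence column."""
    present = [lab for x, lab in zip(column, labels) if x == 1]
    absent = [lab for x, lab in zip(column, labels) if x == 0]
    n = len(labels)
    weighted = len(present) / n * gini(present) + len(absent) / n * gini(absent)
    return gini(labels) - weighted


def rank_groups(genome_groups, taxon_of):
    """Rank groups by single-split impurity decrease (ties broken by group ID)."""
    genomes, groups, rows = presence_absence_matrix(genome_groups)
    labels = [taxon_of[gen] for gen in genomes]
    scores = {
        grp: impurity_decrease([row[j] for row in rows], labels)
        for j, grp in enumerate(groups)
    }
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Hypothetical genomes, orthologous-group IDs and taxon labels.
    genome_groups = {
        "genome_A": {"OG1", "OG2"},
        "genome_B": {"OG1", "OG3"},
        "genome_C": {"OG2", "OG4"},
        "genome_D": {"OG4"},
    }
    taxon_of = {"genome_A": "taxon_X", "genome_B": "taxon_X",
                "genome_C": "taxon_Y", "genome_D": "taxon_Y"}
    for group, score in rank_groups(genome_groups, taxon_of):
        print(f"{group}\t{score:.3f}")
```

On this toy input, OG1 and OG4, each present in the genomes of exactly one taxon, both score 0.5 (a perfect split), while OG2, present in one genome of each taxon, scores 0.0. Groups that split taxa cleanly are the kind of candidates the description calls markers or signatures.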