Balrog: A universal protein model for prokaryotic gene prediction.

Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome.

Bibliographic Details
Main Authors: Markus J Sommer, Steven L Salzberg
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2021-02-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1008727
id doaj-9d327ac236084e7f9f9698f363ae0d80
record_format Article
spelling doaj-9d327ac236084e7f9f9698f363ae0d80; 2021-07-09T04:31:56Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2021-02-01; 17; 2; e1008727; 10.1371/journal.pcbi.1008727; Balrog: A universal protein model for prokaryotic gene prediction.; Markus J Sommer; Steven L Salzberg; https://doi.org/10.1371/journal.pcbi.1008727
collection DOAJ
sources DOAJ
issn 1553-734X
1553-7358
description Low-cost, high-throughput sequencing has led to an enormous increase in the number of sequenced microbial genomes, with well over 100,000 genomes in public archives today. Automatic genome annotation tools are integral to understanding these organisms, yet older gene finding methods must be retrained on each new genome. We have developed a universal model of prokaryotic genes by fitting a temporal convolutional network to amino-acid sequences from a large, diverse set of microbial genomes. We incorporated the new model into a gene finding system, Balrog (Bacterial Annotation by Learned Representation Of Genes), which does not require genome-specific training and which matches or outperforms other state-of-the-art gene finding tools. Balrog is freely available under the MIT license at https://github.com/salzberg-lab/Balrog.