Common and phylogenetically widespread coding for peptides by bacterial small RNAs

Abstract Background While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (O...

Bibliographic Details
Main Authors: Robin C. Friedman, Stefan Kalkhof, Olivia Doppelt-Azeroual, Stephan A. Mueller, Martina Chovancová, Martin von Bergen, Benno Schwikowski
Format: Article
Language: English
Published: BMC 2017-07-01
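Series: BMC Genomics
Subjects: sRNAs; Type I toxin/antitoxin; Short ORFs; Machine learning; Ribosome profiling; Mass spectrometry
Online Access: http://link.springer.com/article/10.1186/s12864-017-3932-y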
id doaj-85e1df008b3d4322be46ef83c2a04677
record_format Article
spelling doaj-85e1df008b3d4322be46ef83c2a04677; 2020-11-24T21:57:40Z; eng; BMC; BMC Genomics; 1471-2164; 2017-07-01; 18; 1; 1; 21; 10.1186/s12864-017-3932-y
Common and phylogenetically widespread coding for peptides by bacterial small RNAs
Robin C. Friedman (0): Systems Biology Laboratory, Department of Genomes and Genetics
Stefan Kalkhof (1): Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ
Olivia Doppelt-Azeroual (2): Bioinformatics and Biostatistics Hub
Stephan A. Mueller (3): Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ
Martina Chovancová (4): Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ
Martin von Bergen (5): Department of Molecular Systems Biology, Helmholtz Centre for Environmental Research - UFZ
Benno Schwikowski (6): Systems Biology Laboratory, Department of Genomes and Genetics
collection DOAJ
language English
format Article
sources DOAJ
author Robin C. Friedman
Stefan Kalkhof
Olivia Doppelt-Azeroual
Stephan A. Mueller
Martina Chovancová
Martin von Bergen
Benno Schwikowski
title Common and phylogenetically widespread coding for peptides by bacterial small RNAs
publisher BMC
series BMC Genomics
issn 1471-2164
publishDate 2017-07-01
description Abstract
Background: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding for small proteins, whether or not they also have a regulatory role at the RNA level.
Methods: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling.
Results: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap to annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains and many are predicted novel components of type I toxin/antitoxin systems.
Conclusions: We predict over two dozen new protein-coding genes per bacterial species, but crucially also quantified the uncertainty in this estimate. Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily-accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.
topic sRNAs
Type I toxin/antitoxin
Short ORFs
Machine learning
Ribosome profiling
Mass spectrometry
url http://link.springer.com/article/10.1186/s12864-017-3932-y