miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.

MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to...

Bibliographic Details
Main Authors: Jimmy Bell, Maureen Larson, Michelle Kutzler, Massimo Bionaz, Christiane V Löhr, David Hendrix
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2019-10-01
Series: PLoS Computational Biology
Online Access: https://doi.org/10.1371/journal.pcbi.1007309
id doaj-e29e68a26dd54360b6871106e03b66e0
record_format Article
spelling doaj-e29e68a26dd54360b6871106e03b66e0; 2021-04-21T15:38:17Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2019-10-01; 15; 10; e1007309; 10.1371/journal.pcbi.1007309; miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.; Jimmy Bell; Maureen Larson; Michelle Kutzler; Massimo Bionaz; Christiane V Löhr; David Hendrix; https://doi.org/10.1371/journal.pcbi.1007309
collection DOAJ
language English
format Article
sources DOAJ
author Jimmy Bell
Maureen Larson
Michelle Kutzler
Massimo Bionaz
Christiane V Löhr
David Hendrix
title miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2019-10-01
description MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to identifying valid miRs that have small numbers of reads, to properly locating hairpin precursors and to balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans, and can balance sensitivity and specificity for an overall greater prediction accuracy, recalling an average of 10% more annotated microRNAs, and correctly predicts substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow and also from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support a more expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome.
url https://doi.org/10.1371/journal.pcbi.1007309