miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.
MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to...
Main Authors: | Jimmy Bell, Maureen Larson, Michelle Kutzler, Massimo Bionaz, Christiane V Löhr, David Hendrix |
---|---|
Format: | Article |
Language: | English |
Published: | Public Library of Science (PLoS), 2019-10-01 |
Series: | PLoS Computational Biology |
Online Access: | https://doi.org/10.1371/journal.pcbi.1007309 |
id | doaj-e29e68a26dd54360b6871106e03b66e0 |
---|---|
record_format | Article |
spelling | doaj-e29e68a26dd54360b6871106e03b66e0; 2021-04-21T15:38:17Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2019-10-01; 15; 10; e1007309; 10.1371/journal.pcbi.1007309; miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs.; Jimmy Bell; Maureen Larson; Michelle Kutzler; Massimo Bionaz; Christiane V Löhr; David Hendrix; MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to identifying valid miRs that have small numbers of reads, to properly locating hairpin precursors and to balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans, and can balance sensitivity and specificity for an overall greater prediction accuracy, recalling an average of 10% more annotated microRNAs, and correctly predicts substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow and also from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support a more expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome.; https://doi.org/10.1371/journal.pcbi.1007309 |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Jimmy Bell Maureen Larson Michelle Kutzler Massimo Bionaz Christiane V Löhr David Hendrix |
spellingShingle | Jimmy Bell Maureen Larson Michelle Kutzler Massimo Bionaz Christiane V Löhr David Hendrix miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. PLoS Computational Biology |
author_facet | Jimmy Bell Maureen Larson Michelle Kutzler Massimo Bionaz Christiane V Löhr David Hendrix |
author_sort | Jimmy Bell |
title | miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. |
title_short | miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. |
title_full | miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. |
title_fullStr | miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. |
title_full_unstemmed | miRWoods: Enhanced precursor detection and stacked random forests for the sensitive detection of microRNAs. |
title_sort | mirwoods: enhanced precursor detection and stacked random forests for the sensitive detection of micrornas. |
publisher | Public Library of Science (PLoS) |
series | PLoS Computational Biology |
issn | 1553-734X 1553-7358 |
publishDate | 2019-10-01 |
description | MicroRNAs are conserved, endogenous small RNAs with critical post-transcriptional regulatory functions throughout eukaryota, including prominent roles in development and disease. Despite much effort, microRNA annotations still contain errors and are incomplete due especially to challenges related to identifying valid miRs that have small numbers of reads, to properly locating hairpin precursors and to balancing precision and recall. Here, we present miRWoods, which solves these challenges using a duplex-focused precursor detection method and stacked random forests with specialized layers to detect mature and precursor microRNAs, and has been tuned to optimize the harmonic mean of precision and recall. We trained and tuned our discovery pipeline on data sets from the well-annotated human genome, and evaluated its performance on data from mouse. Compared to existing approaches, miRWoods better identifies precursor spans, and can balance sensitivity and specificity for an overall greater prediction accuracy, recalling an average of 10% more annotated microRNAs, and correctly predicts substantially more microRNAs with only one read. We apply this method to the under-annotated genomes of Felis catus (domestic cat) and Bos taurus (cow). We identified hundreds of novel microRNAs in small RNA sequencing data sets from muscle and skin from cat, from 10 tissues from cow and also from human and mouse cells. Our novel predictions include a microRNA in an intron of tyrosine kinase 2 (TYK2) that is present in both cat and cow, as well as a family of mirtrons with two instances in the human genome. Our predictions support a more expanded miR-2284 family in the bovine genome, a larger mir-548 family in the human genome, and a larger let-7 family in the feline genome. |
url | https://doi.org/10.1371/journal.pcbi.1007309 |
work_keys_str_mv | AT jimmybell mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas AT maureenlarson mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas AT michellekutzler mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas AT massimobionaz mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas AT christianevlohr mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas AT davidhendrix mirwoodsenhancedprecursordetectionandstackedrandomforestsforthesensitivedetectionofmicrornas |
_version_ | 1714667207470874624 |
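
The description above says miRWoods was tuned to optimize the harmonic mean of precision and recall, commonly called the F1 score. The following is a minimal sketch of that metric only, not the authors' code: the function name `precision_recall_f1` and the example counts are hypothetical and chosen purely for illustration.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from raw prediction counts.

    tp: predicted microRNAs that match an annotation (true positives)
    fp: predicted microRNAs with no matching annotation (false positives)
    fn: annotated microRNAs that were not predicted (false negatives)
    """
    # Guard against empty denominators so the function never divides by zero.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2.0 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts, for illustration only (not results from the paper).
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a predictor cannot score well on this metric by maximizing recall at the expense of precision, or the reverse.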