MicroRNA discovery by similarity search to a database of RNA-seq profiles



Bibliographic Details
Main Authors: Sachin Pundhir, Jan Gorodkin
Format: Article
Language: English
Published: Frontiers Media S.A. 2013-07-01
Series: Frontiers in Genetics
Online Access: http://journal.frontiersin.org/Journal/10.3389/fgene.2013.00133/full
Description
Summary: In silico searches for microRNAs (miRNAs) have been driven by methods that compile structural features of the miRNA precursor hairpin and, to some degree, combine these with the analysis of RNA-seq profiles, in which miRNAs typically leave the Drosha/Dicer fingerprint of one or two ∼22 nt blocks of reads corresponding to the mature and star miRNA. Complementing these previous methods, we present a study in which we systematically exploit these read-profile patterns. We created two datasets comprising 2,540 and 4,795 read profiles, obtained after preprocessing short RNA-seq data from miRBase and ENCODE, respectively. Of the 4,795 ENCODE read profiles, 1,361 are annotated as noncoding RNAs (ncRNAs), of which 285 are further annotated as miRNAs. Using deepBlockAlign (dba), we align the ENCODE ncRNA read profiles against the miRBase read profiles (cleaned for self-matches) and are able to separate ENCODE miRNAs from the other ncRNAs with a Matthews Correlation Coefficient (MCC) of 0.8 and an area under the curve (AUC) of 0.93. Based on the dba score cut-off of 0.7, at which we observed the maximum MCC of 0.8, we predict 523 novel miRNA candidates. An additional RNA secondary structure analysis reveals that 42 of the candidates overlap with predicted conserved secondary structure. Further analysis reveals that the 523 miRNA candidates are located in genomic regions with MAF block (UCSC) fragmentation and poor sequence conservation, which might partly explain why they have been overlooked in previous efforts. We further analyzed known human and mouse miRNA read profiles and found two distinct classes: the first containing two blocks of reads and the second containing more than two. The latter class also holds read profiles with a less well-defined arrangement of reads than the former. On comparing miRNA read profiles from plants and animals, we observed kingdom-specific read profiles that are distinct in terms of both length and distribution ...
ISSN: 1664-8021
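The summary describes picking a dba score cut-off (0.7) because it gave the maximum MCC, and reports an AUC for separating miRNAs from other ncRNAs. The sketch below illustrates that kind of threshold selection under a stated assumption: each aligned ENCODE ncRNA profile is reduced to a single best-hit score plus a binary miRNA/non-miRNA label. The function names (`mcc`, `confusion`, `best_cutoff`, `roc_auc`) and the toy scores are hypothetical illustrations, not part of deepBlockAlign or the authors' pipeline, and the toy data do not reproduce the reported 0.7 / 0.8 / 0.93 figures.

```python
import math
from typing import List, Optional, Sequence, Tuple


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews Correlation Coefficient; 0.0 when any row/column total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def confusion(scores: Sequence[float], labels: Sequence[bool],
              cutoff: float) -> Tuple[int, int, int, int]:
    """Count TP, FP, TN, FN when every score >= cutoff is called a miRNA."""
    tp = fp = tn = fn = 0
    for score, is_mirna in zip(scores, labels):
        called = score >= cutoff
        if called and is_mirna:
            tp += 1
        elif called:
            fp += 1
        elif is_mirna:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def best_cutoff(scores: Sequence[float],
                labels: Sequence[bool]) -> Tuple[Optional[float], float]:
    """Try each observed score as a cut-off; keep the first one with the highest MCC."""
    best_c, best_m = None, float("-inf")
    for c in sorted(set(scores)):
        m = mcc(*confusion(scores, labels, c))
        if m > best_m:
            best_c, best_m = c, m
    return best_c, best_m


def roc_auc(scores: Sequence[float], labels: Sequence[bool]) -> float:
    """AUC as P(random positive outscores random negative), ties counted as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy, made-up best-hit scores (NOT data from the study); True = annotated miRNA.
scores: List[float] = [0.95, 0.90, 0.80, 0.72, 0.85, 0.55, 0.30, 0.10]
labels: List[bool] = [True, True, True, True, False, False, False, False]

cutoff, best = best_cutoff(scores, labels)
# prints: cut-off=0.72, MCC=0.775, AUC=0.875
print(f"cut-off={cutoff}, MCC={best:.3f}, AUC={roc_auc(scores, labels):.3f}")
```

Sweeping only the observed scores is sufficient because the confusion matrix, and therefore the MCC, can change only at those values. For real data, scikit-learn's `matthews_corrcoef` and `roc_auc_score` compute the same two metrics.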