The Application of Biological Properties in the Development of Novel MicroRNA Prediction and Classification Methods


Bibliographic Details
Main Authors: Ren-Hao Pan, 潘人豪
Other Authors: Lin-Yu Tseng
Format: Others
Language: en_US
Published: 2014
Online Access: http://ndltd.ncl.edu.tw/handle/00226323286376302337
Description
Summary: Doctoral dissertation === National Chung Hsing University === Department of Computer Science and Engineering === 102 === MicroRNAs (miRNAs) are a group of small noncoding RNA (ncRNA) molecules that play an important role in gene regulation. In this dissertation, the interaction between the Ribonuclease III proteins (in particular Drosha and Dicer) and the primary and precursor miRNAs was analyzed. Statistics on structural features were collected and used to design criteria for miRNA prediction, and a genetic algorithm was devised to locate the positions of mature miRNAs. The method was applied to miRNA cluster sequences longer than 1 knt and correctly located the positions of all mature miRNAs. For large-scale miRNA datasets, this research provides a prediction application built on a multi-layer MapReduce framework, with four prediction workflows for four kinds of dataset: miRNA-like sequences, miRNA cluster sequences, unknown miRNA sequences, and next-generation sequencing (NGS) sequences. The workflows are assembled from four core procedures, which include genome location finding, biological criteria filtering, and a genetic-algorithm-based pre-miRNA classifier. Each procedure runs as a MapReduce job and uses JSON to pass its output to the next MapReduce procedure. The results show that the prediction method not only has high sensitivity and accuracy but can also process more than one million sequences in acceptable time by relying on a cloud computing system.
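
The chaining of MapReduce procedures through JSON described in the summary might look roughly like the following in-memory Python sketch. It is an illustration only, not the dissertation's implementation: the helper `run_stage`, the two stages, the record fields (`id`, `seq`), and the thresholds `MIN_LEN` and `MIN_GC` are all hypothetical placeholders, and the length and GC-content checks merely stand in for the biological criteria, which the summary does not specify.

```python
import json
from collections import defaultdict

MIN_LEN = 8   # placeholder threshold, not taken from the dissertation
MIN_GC = 0.5  # placeholder threshold, not taken from the dissertation


def run_stage(records, mapper, reducer):
    """Run one map/shuffle/reduce pass in memory and return JSON lines."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return [
        json.dumps({"key": key, "value": reducer(key, values)})
        for key, values in sorted(groups.items())
    ]


def length_mapper(record):
    # Stage 1 (hypothetical filter): keep sequences that are long enough.
    if len(record["seq"]) >= MIN_LEN:
        yield record["id"], record["seq"]


def first_reducer(key, values):
    # One sequence per id is expected; keep the first.
    return values[0]


def gc_mapper(record):
    # Stage 2 (hypothetical classifier): label each sequence by GC content.
    seq = record["value"]
    gc = (seq.count("G") + seq.count("C")) / len(seq)
    label = "candidate" if gc >= MIN_GC else "rejected"
    yield label, record["key"]


def sorted_reducer(key, values):
    return sorted(values)


def run_pipeline(sequences):
    # JSON lines produced by stage 1 are parsed and fed to stage 2.
    stage1 = run_stage(sequences, length_mapper, first_reducer)
    stage2 = run_stage([json.loads(line) for line in stage1],
                       gc_mapper, sorted_reducer)
    return {row["key"]: row["value"] for row in map(json.loads, stage2)}
```

Serializing each stage's output as JSON lines keeps the stages decoupled: any procedure that reads and writes `{"key": ..., "value": ...}` records could be inserted between or after these two without changing the others.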