Summary: | Master's === Chang Gung University === Department of Computer Science and Information Engineering === 104 === Protein-binding RNAs play a key role in a number of biological processes, such as protein synthesis, mRNA processing, mRNA assembly, ribosome function, and the eukaryotic spliceosome. Many methods have previously been developed for predicting RNA-protein interaction sites in proteins. However, only a few methods predict RNA-protein interaction sites in RNA. We therefore developed a method for identifying protein-binding RNA sites using RNA sequence-based and structure-based features, and improved on the predictive accuracy of previous methods. Our prediction model considers numerous features, including sequence-based features, such as RNA sequence length, mononucleotide composition, trinucleotide composition, mono-amino acid composition, and di-amino acid composition; structure-based features, such as those obtained from RNAfold and GLAM2; and interaction propensity. Various classifiers, including LIBSVM, Random Forest, IBk, Bayesian Network, and Naïve Bayes, were evaluated for model training and testing with different window sizes. In cross-validation, our SVM model performed best, achieving 91.4% sensitivity, 92.6% specificity, 92% accuracy, and a Matthews correlation coefficient (MCC) of 0.84. In independent testing, our SVM model achieved 77.5% sensitivity, 91.3% specificity, 84.4% accuracy, and an MCC of 0.694.
|
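
The four evaluation measures reported in the summary (sensitivity, specificity, accuracy, and MCC) are all derived from a binary confusion matrix. The following is a minimal sketch of those standard definitions, not the thesis's own evaluation code. The function name `binary_metrics` and the counts are hypothetical: the summary does not state class sizes, so the example assumes 1,000 binding and 1,000 non-binding sites purely for illustration.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute standard binary-classification measures from confusion-matrix counts.

    tp/fn: binding sites predicted correctly / missed.
    tn/fp: non-binding sites predicted correctly / wrongly flagged as binding.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    # Matthews correlation coefficient; defined as 0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts (assumed balanced classes, 1,000 sites each), chosen to
# match the sensitivity and specificity percentages quoted in the summary.
cv = binary_metrics(tp=914, fn=86, tn=926, fp=74)
independent = binary_metrics(tp=775, fn=225, tn=913, fp=87)

for name, result in (("cross-validation", cv), ("independent test", independent)):
    print(name, {key: round(value, 3) for key, value in result.items()})
```

Under the balanced-class assumption, these counts give accuracy and MCC values close to the reported figures, which suggests (but does not confirm) that the evaluation sets contained similar numbers of positive and negative sites.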