Summary: | Master's === Yuan Ze University === Department of Computer Science and Engineering === 100 === RNA-protein interactions are among the most important intracellular biological processes, and they are essential for understanding the mechanisms of various life activities within the cell. In particular, RNA-binding proteins (RBPs) play an important role in these interactions. Identifying RNA-binding residues (RBRs) in proteins can provide valuable insights for biologists. When structures of RNA-protein complexes are unavailable, it is highly desirable to predict RBRs from protein sequences alone. In this thesis, we present WildRBR, an integrated predictor with a voting system, to tackle this problem. WildRBR combines co-conserved motifs discovered by WildSpan with the four best predictors known to us (BindN, PPRint, PRBR and PiRaNhA) and identifies RBRs in protein sequences using several combinations and voting methods. We compare PRBR, PiRaNhA, PPRint, WildSpan and BindN with WildRBR on the 170 dataset (170 RNA-binding protein complexes in total). Their Matthews correlation coefficient (MCC)/F-score/specificity/sensitivity values are 0.341/0.433/0.903/0.424, 0.587/0.776/0.821/0.894, 0.480/0.686/0.794/0.795, 0.190/0.362/0.827/0.378, 0.157/0.368/0.778/0.403 and 0.615/0.739/0.995/0.973, respectively. The predictive performance of WildRBR is better than that of the other predictors, with the highest sensitivity and accuracy. WildRBR is not only well suited to predicting RBRs in proteins whose complex structures are unknown, but can also be applied to predicting RNA-binding regions of proteins and identifying RNA-binding proteins. Finally, a stand-alone release of WildRBR would be highly desirable for large-scale proteomics.
|
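The summary does not spell out the voting rule or how the four reported measures are computed, so the following is only a minimal sketch under stated assumptions. It assumes each predictor emits one 0/1 label per residue, that a residue is called RNA-binding when at least `min_votes` predictors agree, and that the measures follow their standard confusion-matrix definitions. The function names `majority_vote` and `binary_metrics` are hypothetical and are not taken from WildRBR.

```python
import math


def majority_vote(predictions, min_votes):
    """Combine per-residue 0/1 labels from several predictors.

    predictions: list of equal-length 0/1 lists, one list per predictor.
    min_votes:   assumed threshold; a residue is labelled binding (1)
                 when at least this many predictors label it 1.
    """
    return [1 if sum(column) >= min_votes else 0 for column in zip(*predictions)]


def binary_metrics(y_true, y_pred):
    """Standard sensitivity, specificity, F-score and MCC for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f_score = (
        2 * precision * sensitivity / (precision + sensitivity)
        if precision + sensitivity
        else 0.0
    )
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f_score": f_score,
        "mcc": mcc,
    }


# Toy example: three hypothetical predictors scoring a 4-residue sequence.
toy_predictions = [
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 1, 1, 0],
]
toy_truth = [1, 0, 1, 0]
consensus = majority_vote(toy_predictions, min_votes=2)
scores = binary_metrics(toy_truth, consensus)
```

Raising `min_votes` would be expected to trade sensitivity for specificity; which threshold or weighting the thesis actually uses is not stated in the summary.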