Summary: | Alternative splicing (AS) is a regulated process during gene expression by which a single gene may code for multiple proteins. This mechanism is controlled by a complex called the spliceosome, through which certain exons of a gene may be included in or excluded from the final mRNA produced from that gene. At least three key signals in introns are involved in AS: the 5’ splice site (5’ss), the donor site, where GU nucleotides are most frequently present; the 3’ splice site (3’ss), the acceptor site, where AG nucleotides are most frequently present; and the branch site. The branch point is generally located 20 to 50 nucleotides upstream of the 3’ss. In this paper, we identify the branch point location using a computational model based on deep learning. We propose a hybrid model that combines a dilated convolutional neural network with a recurrent neural network. We also study the effect of integrating additional inputs with the raw RNA sequence, such as conservation, binding energy, and di-nucleotide features. The proposed model was evaluated on two publicly available datasets and outperformed the current state-of-the-art methods. Specifically, it achieved an area under the ROC curve (ROC-AUC) of 97.29% and an area under the precision-recall curve (prAUC) of 67.08% on the first dataset, and a ROC-AUC of 96.86% and a prAUC of 69.62% on the second dataset. In addition, we used the proposed model to study pathogenic variants, and its predictions agreed with those reported biologically. To support the study of RNA branch point selection, an easy-to-use Web server is freely accessible at: <uri>https://home.jbnu.ac.kr/NSCL/rnabps.htm</uri>.
|
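The positional rule stated in the summary (a branch point typically 20 to 50 nucleotides upstream of the 3’ss) can be illustrated with a small sketch. This is not the proposed deep learning model, only a way to enumerate the candidate positions such a model might score and to one-hot encode a raw RNA sequence. The function names, the distance convention (the last intron nucleotide is at distance 1), and the restriction to adenine candidates are assumptions made for illustration and are not taken from the paper.

```python
# Illustrative sketch only: names and conventions below are hypothetical.

ALPHABET = "ACGU"


def candidate_window(intron, min_dist=20, max_dist=50):
    """Return (start, end), a 0-based half-open index range covering
    positions whose distance to the 3' end of the intron lies in
    [min_dist, max_dist]. Distance is len(intron) - index, so the last
    nucleotide has distance 1 (an assumed convention)."""
    n = len(intron)
    start = max(0, n - max_dist)
    end = max(0, n - min_dist + 1)
    return start, end


def adenine_candidates(intron, min_dist=20, max_dist=50):
    """List indices of 'A' inside the candidate window. Restricting to
    adenines is a simplifying assumption, not the paper's method."""
    start, end = candidate_window(intron, min_dist, max_dist)
    return [i for i in range(start, end) if intron[i] == "A"]


def one_hot(seq):
    """Encode an RNA string as a list of 4-element 0/1 rows in the
    order A, C, G, U. Unknown symbols become an all-zero row."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


# Toy intron: GU donor, AG acceptor, one adenine 30 nt from the 3' end.
toy_intron = "GU" + "C" * 60 + "A" + "C" * 27 + "AG"
candidates = adenine_candidates(toy_intron)
encoded = one_hot(toy_intron)
```

In this toy intron, only the adenine inside the 20 to 50 nucleotide window is returned; the adenine of the terminal AG is excluded because it lies too close to the 3’ss.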