Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning

Bibliographic Details
Main Authors: Guishan Zhang, Yiyun Deng, Qingyu Liu, Bingxu Ye, Zhiming Dai, Yaowen Chen, Xianhua Dai
Format: Article
Language: English
Published: Frontiers Media S.A., 2020-07-01
Series: Frontiers in Genetics
Volume: 11
ISSN: 1664-8021
DOI: 10.3389/fgene.2020.00655
Subjects: circular RNA; long non-coding RNA; microRNA; RNA binding protein; transcriptional regulation; machine learning
Online Access: https://www.frontiersin.org/article/10.3389/fgene.2020.00655/full
Collection: DOAJ
Affiliations:
Guishan Zhang: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
Yiyun Deng: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
Qingyu Liu: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
Bingxu Ye: Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou, China
Zhiming Dai: School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China; Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou, China
Yaowen Chen: Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou, China
Xianhua Dai: School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China; Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai, China
Description: Circular RNA (circRNA) is a long non-coding RNA (lncRNA) whose ends are covalently joined into a closed loop by back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate circRNA identification and prediction of its regulatory information are critical for understanding its biogenesis. Although several machine learning-based computational tools have been proposed for circRNA identification, their prediction accuracy still needs improvement. Here, we first present circLGB, a machine learning-based framework that discriminates circRNA from other lncRNAs. circLGB integrates commonly used sequence-derived features with three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site (IRES). circLGB classifies circRNAs using a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework that systematically predicts regulatory information for circRNAs, including their interactions with microRNAs and RNA binding proteins, and their transcriptional regulation. circMRT models sequence-based features, graph features, genomic context, and regulatory information features. Experiments on public datasets and on our own constructed datasets show that the proposed algorithms outperform available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source code is available at https://github.com/Peppags/circLGB-circMRT.
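The description names the kinds of features circLGB combines but not how they are computed. As a rough illustration only, the Python sketch below builds a per-transcript feature vector from normalized k-mer frequencies (one widely used sequence-derived feature; whether circLGB uses k-mers, and with which k, is an assumption here), transcript length, and an A-to-I density defined here, hypothetically, as editing sites per kilobase. The function names `kmer_frequencies`, `a_to_i_density`, and `feature_vector` and the density definition are illustrative assumptions, not the authors' code, which is published at https://github.com/Peppags/circLGB-circMRT. Vectors like these would then be fed to a classifier such as LightGBM.

```python
from itertools import product


def kmer_frequencies(seq: str, k: int = 2) -> dict[str, float]:
    """Normalized k-mer counts over the A/C/G/U alphabet.

    Windows containing any other character (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")  # treat DNA-style input as RNA
    kmers = ["".join(p) for p in product("ACGU", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases
            counts[window] += 1
            total += 1
    # Divide by the number of valid windows so frequencies sum to 1.
    return {km: (c / total if total else 0.0) for km, c in counts.items()}


def a_to_i_density(num_editing_sites: int, length_nt: int) -> float:
    """Hypothetical A-to-I density: editing sites per kilobase of transcript."""
    if length_nt <= 0:
        raise ValueError("length_nt must be positive")
    return num_editing_sites * 1000 / length_nt


def feature_vector(seq: str, num_editing_sites: int, k: int = 2) -> list[float]:
    """Concatenate k-mer frequencies (sorted by k-mer), length, and density."""
    freqs = kmer_frequencies(seq, k)
    return [freqs[km] for km in sorted(freqs)] + [
        float(len(seq)),
        a_to_i_density(num_editing_sites, len(seq)),
    ]
```

For example, `feature_vector("AUGGCUAACG", num_editing_sites=2)` returns 18 values: 16 dinucleotide frequencies, the length 10.0, and a density of 200.0 sites per kilobase. Sorting the k-mer keys keeps the column order fixed across transcripts, which a downstream classifier requires.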