Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning
Circular RNA (circRNA) is a long non-coding RNA (lncRNA) that forms a covalently closed loop through back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate circRNA identification and prediction of its regula...
Main Authors: | Guishan Zhang, Yiyun Deng, Qingyu Liu, Bingxu Ye, Zhiming Dai, Yaowen Chen, Xianhua Dai |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2020-07-01 |
Series: | Frontiers in Genetics |
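Subjects: | circular RNA, long non-coding RNA, microRNA, RNA binding protein, transcriptional regulation, machine learning |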
Online Access: | https://www.frontiersin.org/article/10.3389/fgene.2020.00655/full |
id |
doaj-39ac3d4e4f2543a8b105e0a550752235 |
---|---|
record_format |
Article |
spelling |
doaj-39ac3d4e4f2543a8b105e0a550752235; 2020-11-25T03:29:25Z; eng; Frontiers Media S.A.; Frontiers in Genetics; 1664-8021; 2020-07-01; 11; 10.3389/fgene.2020.00655; 503789
Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning
Guishan Zhang [0]; Yiyun Deng [1]; Qingyu Liu [2]; Bingxu Ye [3]; Zhiming Dai [4]; Zhiming Dai [5]; Yaowen Chen [6]; Xianhua Dai [7]; Xianhua Dai [8]
[0] School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
[1] School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
[2] School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
[3] Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou, China
[4] School of Data and Computer Science, Sun Yat-sen University, Guangzhou, China
[5] Guangdong Province Key Laboratory of Big Data Analysis and Processing, Sun Yat-sen University, Guangzhou, China
[6] Key Laboratory of Digital Signal and Image Processing of Guangdong Provincial, College of Engineering, Shantou University, Shantou, China
[7] School of Electronics and Information Technology, Sun Yat-sen University, Guangzhou, China
[8] Southern Marine Science and Engineering Guangdong Laboratory, Zhuhai, China
Circular RNA (circRNA) is a long non-coding RNA (lncRNA) that forms a covalently closed loop through back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate circRNA identification and prediction of its regulatory information are critical for understanding its biogenesis. Although several computational tools based on machine learning have been proposed for circRNA identification, the prediction accuracy remains to be improved. Here, we first present circLGB, a machine learning-based framework to discriminate circRNA from other lncRNAs. circLGB integrates commonly used sequence-derived features and three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site. circLGB categorizes circRNAs by utilizing a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework to systematically predict the regulatory information for circRNA, including its interactions with microRNA and RNA binding proteins and its transcriptional regulation. Feature sets including sequence-based features, graph features, genome context, and regulatory information features were modeled in circMRT. Experiments on public and our constructed datasets show that the proposed algorithms outperform the available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source code is available at https://github.com/Peppags/circLGB-circMRT.
https://www.frontiersin.org/article/10.3389/fgene.2020.00655/full
circular RNA; long non-coding RNA; microRNA; RNA binding protein; transcriptional regulation; machine learning |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Guishan Zhang, Yiyun Deng, Qingyu Liu, Bingxu Ye, Zhiming Dai, Yaowen Chen, Xianhua Dai |
author_sort |
Guishan Zhang |
title |
Identifying Circular RNA and Predicting Its Regulatory Interactions by Machine Learning |
publisher |
Frontiers Media S.A. |
series |
Frontiers in Genetics |
issn |
1664-8021 |
publishDate |
2020-07-01 |
description |
Circular RNA (circRNA) is a long non-coding RNA (lncRNA) that forms a covalently closed loop through back-splicing. Emerging evidence indicates that circRNA can influence cellular physiology through various molecular mechanisms. Thus, accurate circRNA identification and prediction of its regulatory information are critical for understanding its biogenesis. Although several computational tools based on machine learning have been proposed for circRNA identification, the prediction accuracy remains to be improved. Here, we first present circLGB, a machine learning-based framework to discriminate circRNA from other lncRNAs. circLGB integrates commonly used sequence-derived features and three new features: adenosine-to-inosine (A-to-I) deamination, A-to-I density, and the internal ribosome entry site. circLGB categorizes circRNAs by utilizing a LightGBM classifier with feature selection. Second, we introduce circMRT, an ensemble machine learning framework to systematically predict the regulatory information for circRNA, including its interactions with microRNA and RNA binding proteins and its transcriptional regulation. Feature sets including sequence-based features, graph features, genome context, and regulatory information features were modeled in circMRT. Experiments on public and our constructed datasets show that the proposed algorithms outperform the available state-of-the-art methods. circLGB is available at http://www.circlgb.com. Source code is available at https://github.com/Peppags/circLGB-circMRT. |
topic |
circular RNA; long non-coding RNA; microRNA; RNA binding protein; transcriptional regulation; machine learning |
url |
https://www.frontiersin.org/article/10.3389/fgene.2020.00655/full |
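The description above says that circLGB integrates "commonly used sequence-derived features" before classification, but this record does not list them. As a rough illustration only, the sketch below computes two features that are often used for RNA sequences, k-mer frequencies and GC content. The function names `kmer_frequencies` and `gc_content`, the choice of k, and the handling of ambiguous bases are all assumptions made for this example; they are not taken from the circLGB source code.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGU"  # RNA alphabet; DNA input is converted by mapping T to U


def kmer_frequencies(sequence, k=3):
    """Return the relative frequency of every possible RNA k-mer.

    Hypothetical helper for illustration; not part of circLGB.
    """
    seq = sequence.upper().replace("T", "U")
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    # Ignore windows that contain ambiguous bases such as N.
    valid = [w for w in windows if set(w) <= set(ALPHABET)]
    counts = Counter(valid)
    total = len(valid)
    # Emit all 4**k k-mers so every sequence yields a vector of equal length.
    return {
        "".join(kmer): (counts["".join(kmer)] / total if total else 0.0)
        for kmer in product(ALPHABET, repeat=k)
    }


def gc_content(sequence):
    """Return the fraction of G and C bases (0.0 for an empty sequence)."""
    seq = sequence.upper()
    return sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    example = "ACGUACGU"  # toy sequence, not real data
    features = kmer_frequencies(example, k=2)
    features["gc_content"] = gc_content(example)
    print(len(features), "features computed")
```

Fixed-length vectors of this kind could then be passed to a gradient-boosting classifier such as the LightGBM model named in the description; the three additional features (A-to-I deamination, A-to-I density, internal ribosome entry site) need external annotations and are not sketched here.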