IIMLP: integrated information-entropy-based method for LncRNA prediction

Abstract. Background: The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as a growing body of evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biolog...

Bibliographic Details
Main Authors: Junyi Li, Huinian Li, Xiao Ye, Li Zhang, Qingzhe Xu, Yuan Ping, Xiaozhu Jing, Wei Jiang, Qing Liao, Bo Liu, Yadong Wang
Format: Article
Language: English
Published: BMC 2021-05-01
Series: BMC Bioinformatics
Subjects: Long non-coding RNA; Information entropy; Generalized topological entropy; Machine learning
Online Access: https://doi.org/10.1186/s12859-020-03884-w
id doaj-edb4331fd50e4bb3a04827e65f243c33
record_format Article
spelling doaj-edb4331fd50e4bb3a04827e65f243c33
2021-05-16T11:36:17Z
eng
BMC
BMC Bioinformatics
1471-2105
2021-05-01
22
S3
1
12
10.1186/s12859-020-03884-w
IIMLP: integrated information-entropy-based method for LncRNA prediction
Junyi Li (0), Huinian Li (1), Xiao Ye (2), Li Zhang (3), Qingzhe Xu (4), Yuan Ping (5), Xiaozhu Jing (6), Wei Jiang (7), Qing Liao (8), Bo Liu (9), Yadong Wang (10)
Affiliation of authors 0-8 and 10: School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen)
Affiliation of author 9: Center for Bioinformatics, School of Computer Science and Technology, Harbin Institute of Technology
Abstract. Background: The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as a growing body of evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources. Results: We developed an lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate the generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as the open reading frame, we apply support vector machine, XGBoost, and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions: We developed an accurate and efficient method that uses novel information-entropy features to analyze and classify lncRNAs. The method can also be extended to research on other functional elements in DNA sequences.
https://doi.org/10.1186/s12859-020-03884-w
Long non-coding RNA
Information entropy
Generalized topological entropy
Machine learning
collection DOAJ
language English
format Article
sources DOAJ
author Junyi Li
Huinian Li
Xiao Ye
Li Zhang
Qingzhe Xu
Yuan Ping
Xiaozhu Jing
Wei Jiang
Qing Liao
Bo Liu
Yadong Wang
title IIMLP: integrated information-entropy-based method for LncRNA prediction
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2021-05-01
description Abstract. Background: The prediction of long non-coding RNA (lncRNA) has attracted great attention from researchers, as a growing body of evidence indicates that various complex human diseases are closely related to lncRNAs. In the era of biomedical big data, in addition to the prediction of lncRNAs by biological experimental methods, many computational methods based on machine learning have been proposed to make better use of lncRNA sequence resources. Results: We developed an lncRNA prediction method that integrates information-entropy-based features with machine learning algorithms. We calculate the generalized topological entropy and generate 6 novel features for lncRNA sequences. Using these 6 features together with other features such as the open reading frame, we apply support vector machine, XGBoost, and random forest algorithms to distinguish human lncRNAs. We compare our method with one that uses more k-mer features, and the results show that our method achieves a higher area under the curve, up to 99.7905%. Conclusions: We developed an accurate and efficient method that uses novel information-entropy features to analyze and classify lncRNAs. The method can also be extended to research on other functional elements in DNA sequences.
topic Long non-coding RNA
Information entropy
Generalized topological entropy
Machine learning
url https://doi.org/10.1186/s12859-020-03884-w
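
The abstract in this record describes turning each sequence into a small set of entropy-based features plus features such as the open reading frame before a classifier is applied. The sketch below illustrates that general idea only. It is not the authors' implementation: it substitutes plain Shannon entropy over k-mer frequencies for the generalized topological entropy named in the abstract, and the function names (kmer_entropy, longest_orf_length, feature_vector), the choice of k = 1, 2, 3, and the forward-strand ATG-to-stop ORF rule are all assumptions made for illustration.

```python
import math
from collections import Counter

# Standard DNA stop codons; RNA input is converted to DNA letters below.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def kmer_entropy(seq: str, k: int) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Stand-in for the paper's generalized topological entropy (assumption).
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    total = len(kmers)
    counts = Counter(kmers)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def longest_orf_length(seq: str) -> int:
    """Length in nucleotides of the longest ATG...stop ORF on the forward strand."""
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)  # include the stop codon
                start = None
    return best


def feature_vector(seq: str, ks=(1, 2, 3)) -> list:
    """Concatenate k-mer entropies and the longest ORF length into one vector."""
    seq = seq.upper().replace("U", "T")
    return [kmer_entropy(seq, k) for k in ks] + [float(longest_orf_length(seq))]
```

A vector produced by feature_vector could then be passed, one row per sequence, to any standard classifier such as a support vector machine or random forest; the record does not specify how the original features were scaled or combined.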