Identification of cyclin protein using gradient boost decision tree algorithm

Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, which results in poor predicti...

Bibliographic Details
Main Authors: Hasan Zulfiqar, Shi-Shi Yuan, Qin-Lai Huang, Zi-Jie Sun, Fu-Ying Dao, Xiao-Long Yu, Hao Lin
Format: Article
Language: English
Published: Elsevier 2021-01-01
Series: Computational and Structural Biotechnology Journal
Subjects: Cyclin protein, Classification, Feature extraction, Feature selection, Random forest
Online Access: http://www.sciencedirect.com/science/article/pii/S2001037021003032
id doaj-b173886f437e4cc9bb04b6339b0fc1ce
record_format Article
spelling doaj-b173886f437e4cc9bb04b6339b0fc1ce
2021-07-31T04:38:46Z
eng
Elsevier
Computational and Structural Biotechnology Journal
2001-0370
2021-01-01
19
4123
4131
Identification of cyclin protein using gradient boost decision tree algorithm
Hasan Zulfiqar: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Shi-Shi Yuan: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Qin-Lai Huang: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Zi-Jie Sun: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Fu-Ying Dao: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
Xiao-Long Yu: School of Materials Science and Engineering, Hainan University, Haikou 570228, China; corresponding author
Hao Lin: School of Life Science and Technology and Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China; corresponding author
http://www.sciencedirect.com/science/article/pii/S2001037021003032
Cyclin protein; Classification; Feature extraction; Feature selection; Random forest
collection DOAJ
language English
format Article
sources DOAJ
author Hasan Zulfiqar
Shi-Shi Yuan
Qin-Lai Huang
Zi-Jie Sun
Fu-Ying Dao
Xiao-Long Yu
Hao Lin
title Identification of cyclin protein using gradient boost decision tree algorithm
publisher Elsevier
series Computational and Structural Biotechnology Journal
issn 2001-0370
publishDate 2021-01-01
description Cyclin proteins regulate the cell cycle by forming complexes with cyclin-dependent kinases. Correct recognition of cyclin proteins could provide key clues for studying their functions. However, their sequences share low similarity, so methods based on sequence similarity predict them poorly. A machine learning model for identifying cyclin proteins is therefore needed. This study aimed to develop a computational model to discriminate cyclin proteins from non-cyclin proteins. In our model, protein sequences were encoded by seven kinds of features: amino acid composition, composition of k-spaced amino acid pairs, tripeptide composition, pseudo amino acid composition, Geary autocorrelation, normalized Moreau-Broto autocorrelation, and composition/transition/distribution. These features were then optimized using analysis of variance (ANOVA) and minimum redundancy maximum relevance (mRMR) with the incremental feature selection (IFS) technique. A gradient boost decision tree (GBDT) classifier was trained on the optimal features. Five-fold cross-validation showed that the model identifies cyclins with an accuracy of 93.06% and an AUC of 0.971, both higher than the values reported by two recent studies on the same data.
topic Cyclin protein
Classification
Feature extraction
Feature selection
Random forest
url http://www.sciencedirect.com/science/article/pii/S2001037021003032
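
The description in this record names amino acid composition, composition of k-spaced amino acid pairs, and ANOVA-based feature ranking as parts of the pipeline. The following is a minimal Python sketch of those three steps, written only to illustrate the ideas. It is not the authors' implementation: the function names (`aac`, `cksaap`, `anova_f`, `rank_features`) are hypothetical, residues outside the 20 standard amino acids are simply skipped (an assumption), and the remaining descriptors, mRMR, the GBDT classifier, and cross-validation are not shown.

```python
from itertools import product
from statistics import mean

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(seq):
    """Amino acid composition: fraction of each of the 20 residues."""
    residues = [r for r in seq.upper() if r in AMINO_ACIDS]
    n = len(residues)
    return [residues.count(a) / n if n else 0.0 for a in AMINO_ACIDS]


def cksaap(seq, k=1):
    """Composition of amino acid pairs separated by k residues (400 values)."""
    seq = seq.upper()
    pairs = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - k - 1):
        pair = seq[i] + seq[i + k + 1]
        if pair in counts:  # assumption: skip pairs with non-standard residues
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def anova_f(groups):
    """One-way ANOVA F statistic for one feature; groups is a list of value lists."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups) / (len(groups) - 1)
    within = sum((v - mean(g)) ** 2 for g in groups for v in g) / (len(values) - len(groups))
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Return feature indices sorted by decreasing ANOVA F score."""
    labels = sorted(set(y))
    scores = []
    for j in range(len(X[0])):
        groups = [[row[j] for row, lab in zip(X, y) if lab == c] for c in labels]
        scores.append(anova_f(groups))
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
```

In an incremental feature selection loop, features would be added one at a time in the ranked order, and the subset giving the best cross-validated score would be kept; that loop depends on the chosen classifier and is left out here.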