Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian

Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of these methods can predict effici...

Bibliographic Details
Main Authors: Rulan Wang, Zhuo Wang, Hongfei Wang, Yuxuan Pang, Tzong-Yi Lee
Format: Article
Language: English
Published: Nature Publishing Group 2020-11-01
Series: Scientific Reports
Online Access: https://doi.org/10.1038/s41598-020-77173-0
id doaj-a853eda2aeaf4504982c8c920cfc2d05
record_format Article
spelling doaj-a853eda2aeaf4504982c8c920cfc2d05
2020-12-08T13:01:37Z
eng
Nature Publishing Group
Scientific Reports
2045-2322
2020-11-01
volume 10, issue 1, pages 1-12
10.1038/s41598-020-77173-0
Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian
Rulan Wang (0), Zhuo Wang (1), Hongfei Wang (2), Yuxuan Pang (3), Tzong-Yi Lee (4)
(0) School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen)
(1) Warshel Institute for Computational Biology, The Chinese University of Hong Kong (Shenzhen)
(2) Department of Orthopaedics and Traumatology, The University of Hong Kong
(3) School of Science and Engineering, The Chinese University of Hong Kong (Shenzhen)
(4) School of Life and Health Sciences, The Chinese University of Hong Kong (Shenzhen)
Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of them predict well on only one kind of protein, either histone or non-histone. This work therefore aims for more balanced performance across species, covering plant (non-histone) and mammalian (histone) proteins. Two classifiers, a support vector machine (SVM) and a random forest (RF), were employed. In cross-validation, the RF classifier based on the EGAAC feature achieved the best predictive performance: it is competitive with existing methods and more robust on imbalanced datasets. An independent test was also carried out to compare this study with existing methods that use the same features or the same classifier. The SVM and RF classifiers achieved their best performances of 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80, and of 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54, on a relatively small mammalian dataset and a large-scale plant dataset, respectively. A cross-species independent test was also carried out, which demonstrated the species diversity between plants and mammals.
https://doi.org/10.1038/s41598-020-77173-0
collection DOAJ
language English
format Article
sources DOAJ
author Rulan Wang
Zhuo Wang
Hongfei Wang
Yuxuan Pang
Tzong-Yi Lee
title Characterization and identification of lysine crotonylation sites based on machine learning method on both plant and mammalian
publisher Nature Publishing Group
series Scientific Reports
issn 2045-2322
publishDate 2020-11-01
description Abstract Lysine crotonylation (Kcr) is a type of protein post-translational modification (PTM) that plays important roles in a variety of cellular regulatory processes. Several methods have been proposed for identifying crotonylation sites. However, most of them predict well on only one kind of protein, either histone or non-histone. This work therefore aims for more balanced performance across species, covering plant (non-histone) and mammalian (histone) proteins. Two classifiers, a support vector machine (SVM) and a random forest (RF), were employed. In cross-validation, the RF classifier based on the EGAAC feature achieved the best predictive performance: it is competitive with existing methods and more robust on imbalanced datasets. An independent test was also carried out to compare this study with existing methods that use the same features or the same classifier. The SVM and RF classifiers achieved their best performances of 92% sensitivity, 88% specificity, 90% accuracy, and an MCC of 0.80, and of 77% sensitivity, 83% specificity, 70% accuracy, and an MCC of 0.54, on a relatively small mammalian dataset and a large-scale plant dataset, respectively. A cross-species independent test was also carried out, which demonstrated the species diversity between plants and mammals.
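The sensitivity, specificity, accuracy, and MCC (Matthews correlation coefficient) values quoted in the description are standard confusion-matrix metrics. The following is a minimal sketch of how such values are computed. The function name `binary_metrics` and the counts passed to it are hypothetical: they are not taken from the paper and were chosen only because, on a balanced set of 100 positive and 100 negative sites, they give percentages of the same size as the first set reported.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Compute sensitivity, specificity, accuracy, and MCC from
    confusion-matrix counts (assumes both classes are present)."""
    sensitivity = tp / (tp + fn)              # true positive rate
    specificity = tn / (tn + fp)              # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when a marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "mcc": mcc,
    }


# Hypothetical counts, for illustration only (not data from the study).
metrics = binary_metrics(tp=92, fn=8, tn=88, fp=12)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Unlike accuracy, MCC uses all four cells of the confusion matrix, which is why it is often reported alongside accuracy when the positive and negative classes are imbalanced.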
url https://doi.org/10.1038/s41598-020-77173-0