Automatic single- and multi-label enzymatic function prediction by machine learning

The number of protein structures in the PDB database has increased more than 15-fold since 1999. The creation of computational models predicting enzymatic function is of major importance since such models provide the means to better understand the behavior of newly discovered enzymes when cata...


Bibliographic Details
Main Authors: Shervine Amidi, Afshine Amidi, Dimitrios Vlachakis, Nikos Paragios, Evangelia I. Zacharaki
Format: Article
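Language: English
Published: PeerJ Inc. 2017-03-01
Series: PeerJ
Subjects: Enzyme classification; Single-label; Multi-label; Structural information; Amino acid sequence; Smith-Waterman algorithm
Online Access: https://peerj.com/articles/3095.pdf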
id doaj-0d6e56a4e22e466aabf80736e8622585
record_format Article
spelling doaj-0d6e56a4e22e466aabf80736e8622585
2020-11-25T00:00:38Z
eng
PeerJ Inc.
PeerJ
2167-8359
2017-03-01
5
e3095
10.7717/peerj.3095
Automatic single- and multi-label enzymatic function prediction by machine learning
Shervine Amidi, Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France
Afshine Amidi, Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France
Dimitrios Vlachakis, MDAKM Group, Department of Computer Engineering and Informatics, University of Patras, Patras, Greece
Nikos Paragios, Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France
Evangelia I. Zacharaki, Department of Applied Mathematics, Center for Visual Computing, Ecole Centrale de Paris (CentraleSupélec), Châtenay-Malabry, France
https://peerj.com/articles/3095.pdf
Enzyme classification
Single-label
Multi-label
Structural information
Amino acid sequence
Smith-Waterman algorithm
collection DOAJ
language English
format Article
sources DOAJ
author Shervine Amidi
Afshine Amidi
Dimitrios Vlachakis
Nikos Paragios
Evangelia I. Zacharaki
title Automatic single- and multi-label enzymatic function prediction by machine learning
publisher PeerJ Inc.
series PeerJ
issn 2167-8359
publishDate 2017-03-01
description The number of protein structures in the PDB database has increased more than 15-fold since 1999. The creation of computational models predicting enzymatic function is of major importance since such models provide the means to better understand the behavior of newly discovered enzymes when catalyzing chemical reactions. Until now, single-label classification has been widely used for predicting enzymatic function, which limits its application to enzymes performing a single reaction and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes may perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the enzyme commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. The multi-label model also predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
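The two multi-label figures quoted in the description rest on different ways of counting a correct prediction. The sketch below illustrates that difference on invented toy labels over the six main EC classes. It is not the authors' code: the function names and the data are hypothetical, and treating the "all possible enzymatic reactions" figure as an exact-match ratio is an assumption made here for illustration only.

```python
# Minimal sketch (hypothetical names and toy data, not taken from the paper).
# Each row is one enzyme; each column is one of the six main EC classes.

def hamming_accuracy(y_true, y_pred):
    """Return 1 - Hamming loss: the share of (enzyme, class) slots predicted correctly."""
    total = 0
    correct = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            total += 1
            correct += int(t == p)
    return correct / total


def exact_match_ratio(y_true, y_pred):
    """Return the share of enzymes whose complete label set is predicted exactly."""
    matches = sum(int(list(t) == list(p)) for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Toy example: the second enzyme has two functions, and one of them is missed.
y_true = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]
y_pred = [
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 1],
]

print(hamming_accuracy(y_true, y_pred))
print(exact_match_ratio(y_true, y_pred))
```

In this toy case a single missed label costs one slot out of eighteen under the Hamming-based score but one enzyme out of three under exact matching, which is why a Hamming-based percentage is typically the higher of the two.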
topic Enzyme classification
Single-label
Multi-label
Structural information
Amino acid sequence
Smith-Waterman algorithm
url https://peerj.com/articles/3095.pdf