Novel Descriptors and Digital Signal Processing-Based Method for Protein Sequence Activity Relationship Study

Work aimed at unravelling the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain....


Bibliographic Details
Main Authors: Nicolas T. Fontaine, Xavier F. Cadet, Iyanar Vetrivel
Format: Article
Language: English
Published: MDPI AG, 2019-11-01
Series: International Journal of Molecular Sciences
Online Access: https://www.mdpi.com/1422-0067/20/22/5640
Record ID: doaj-57c3b0562b634b8b8846688fea15136c
DOI: 10.3390/ijms20225640
Journal: International Journal of Molecular Sciences, vol. 20, no. 22, article 5640
Affiliation (all authors): PEACCEL, Protein Engineering ACCELerator, 6 Square Albin Cachot, box 42, 75013 Paris, France
Collection: DOAJ
Keywords: innov’sar; artificial intelligence; machine learning; protein spectrum; rational screening; digital signal processing; extended sequence; directed evolution
ISSN: 1422-0067
Abstract: Work aimed at unravelling the correlation between protein sequence and function in the absence of structural information can be highly rewarding. We present a new way of considering descriptors from the amino acid index database for modeling and predicting the fitness value of a polypeptide chain. The approach comprises the following steps: (i) calculating Q elementary numerical sequences (Ele_SEQ) from the encoding of the amino acid residues; (ii) determining an extended numerical sequence (Ext_SEQ) by concatenating the Q elementary numerical sequences, where at least one elementary numerical sequence is a protein spectrum obtained by applying the fast Fourier transform (FFT); and (iii) predicting a fitness value for polypeptide variants (training and/or validation set). These new descriptors were tested on four sets of proteins of different lengths (GLP-2, TNF alpha, cytochrome P450, and epoxide hydrolase) and activities (cAMP activation, binding affinity, thermostability, and enantioselectivity). We show that using multiple physicochemical descriptors together with the FFT, which takes into account the interactions between amino acid residues within the protein sequence, could lead to a very significant improvement in the quality of models and predictions. The choice of descriptor, or of the combination of descriptors and/or FFT, depends on the protein/fitness pair. This approach can give potential users added value from existing mutant libraries in which screening efforts have so far failed to find improved polypeptide mutants for useful applications.
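The three steps in the abstract can be illustrated with a minimal, standard-library-only sketch. It rests on several assumptions that this record does not confirm: a single amino acid index scale (Kyte-Doolittle hydropathy, one of the scales in the amino acid index (AAindex) database) stands in for the descriptors the authors screen; Q = 2, so Ext_SEQ is the encoded sequence followed by its FFT magnitude spectrum; sequences are zero-padded to the next power of two before the FFT; and ridge regression is a stand-in for the step (iii) model, which the record does not name. All function names are hypothetical, and the variants and fitness values in the demo are invented.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values (one AAindex scale), used here only as an example encoding.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence, index):
    """Step (i): map each residue to its index value (one elementary numerical sequence)."""
    return [index[residue] for residue in sequence]


def fft(values):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even = fft(values[0::2])
    odd = fft(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def protein_spectrum(numeric_seq):
    """Magnitude spectrum of the encoding, zero-padded to a power of two (assumed scheme)."""
    n = 1
    while n < len(numeric_seq):
        n *= 2
    padded = numeric_seq + [0.0] * (n - len(numeric_seq))
    # For real input |X[k]| == |X[n-k]|, so bins 0..n/2 carry all the information.
    return [abs(c) for c in fft(padded)[: n // 2 + 1]]


def extended_sequence(sequence, index):
    """Step (ii): concatenate Q = 2 elementary sequences, the encoding and its spectrum."""
    encoded = encode(sequence, index)
    return encoded + protein_spectrum(encoded)


def solve(a, b):
    """Gaussian elimination with partial pivoting for a small dense system a x = b."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(features, targets, lam=1.0):
    """Step (iii) stand-in: ridge regression on centred data (not necessarily the authors' model)."""
    p = len(features[0])
    x_mean = [sum(row[j] for row in features) / len(features) for j in range(p)]
    y_mean = sum(targets) / len(targets)
    xc = [[row[j] - x_mean[j] for j in range(p)] for row in features]
    yc = [y - y_mean for y in targets]
    # Normal equations (Xc^T Xc + lam * I) w = Xc^T yc.
    gram = [[sum(r[i] * r[j] for r in xc) + (lam if i == j else 0.0) for j in range(p)]
            for i in range(p)]
    rhs = [sum(r[i] * y for r, y in zip(xc, yc)) for i in range(p)]
    weights = solve(gram, rhs)
    intercept = y_mean - sum(w * mu for w, mu in zip(weights, x_mean))
    return weights, intercept


def predict(weights, intercept, feature_row):
    """Predicted fitness for one Ext_SEQ feature vector."""
    return intercept + sum(w * x for w, x in zip(weights, feature_row))


if __name__ == "__main__":
    # Toy, made-up variants of a hypothetical 8-residue peptide with invented fitness values.
    variants = ["ACDEFGHI", "ACDEFGHL", "ACKEFGHI", "ACDEWGHI", "VCDEFGHI"]
    fitness = [1.0, 1.4, 0.6, 0.9, 1.7]
    X = [extended_sequence(v, KYTE_DOOLITTLE) for v in variants]
    w, b = fit_ridge(X, fitness, lam=1.0)
    print(round(predict(w, b, extended_sequence("VCDEFGHL", KYTE_DOOLITTLE)), 3))
```

Because all variants of one protein share the same length, every Ext_SEQ has the same number of features, which is what lets a single regression model be fitted across the library. Using several index scales would simply mean concatenating more encodings and spectra (a larger Q).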