Application of Fourier transform and proteochemometrics principles to protein engineering

Abstract. Background: Connecting the dots between protein sequence and function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called...


Bibliographic Details
Main Authors: Frédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton
Format: Article
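Language: English
Published: BMC 2018-10-01
Series: BMC Bioinformatics
Subjects: Directed evolution; Protein sequence activity relationship; Protein spectrum; Rational screening; Statistical modelling
Online Access: http://link.springer.com/article/10.1186/s12859-018-2407-8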
id doaj-18efb2f9e4194a9f9d1cccdebc9843b8
record_format Article
spelling doaj-18efb2f9e4194a9f9d1cccdebc9843b8; 2020-11-25T01:28:59Z; eng; BMC; BMC Bioinformatics; ISSN 1471-2105; 2018-10-01; vol. 19, no. 1, pp. 1-11; doi:10.1186/s12859-018-2407-8; affiliation of all seven authors: Peaccel SAS, Protein Engineering ACCELerator
collection DOAJ
language English
format Article
sources DOAJ
author Frédéric Cadet
Nicolas Fontaine
Iyanar Vetrivel
Matthieu Ng Fuk Chong
Olivier Savriama
Xavier Cadet
Philippe Charton
title Application of Fourier transform and proteochemometrics principles to protein engineering
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2018-10-01
description Abstract. Background: Connecting the dots between protein sequence and function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations. Results: We show that the digitized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor to model the sequence-activity relationship of proteins. The iSAR methodology identifies high-fitness mutants from mutant libraries by correlating mutation-induced variations in the spectra with biological activity/fitness. It takes into account the impact of mutations on the whole spectrum and does not focus on local fitness alone. The utility of the method is illustrated on four datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The datasets were chosen to illustrate the method's ability to perform when limited training data is available and when the test set contains novel mutations that do not appear in the training set. Conclusion: The combination of Fast Fourier Transform and Partial Least Squares regression is efficient in capturing the effects of mutations on protein function. iSAR is a fast algorithm that can be implemented with limited computational resources and can make effective predictions even when the training set is limited in size.
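The encoding step named in the description (amino acid properties turned into a numerical signal, then into a Fourier spectrum) can be illustrated with a minimal sketch. This is not the authors' iSAR implementation. The abstract does not say which physicochemical properties are used, so the Kyte-Doolittle hydropathy scale serves here purely as a stand-in, the two sequences are hypothetical, and the function names are invented for illustration. A naive discrete Fourier transform replaces the FFT for brevity, and the partial least squares regression step is omitted.

```python
import cmath

# Kyte-Doolittle hydropathy values. ASSUMPTION: a stand-in property only;
# the record does not state which amino acid properties iSAR relies on.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(seq):
    """Map each residue to its property value (hypothetical helper)."""
    return [KD[aa] for aa in seq.upper()]


def dft_magnitudes(signal):
    """Naive O(n^2) DFT; magnitudes of coefficients 0 .. n//2."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        mags.append(abs(coeff))
    return mags


def spectrum(seq):
    """Fourier magnitude spectrum of the encoded sequence."""
    return dft_magnitudes(encode(seq))


# Hypothetical wild type and single-point mutant (last residue R -> L).
wild_type = "MKTAYIAKQR"
mutant = "MKTAYIAKQL"

# Per-frequency change in the spectrum caused by the mutation.
diff = [m - w for m, w in zip(spectrum(mutant), spectrum(wild_type))]
print([round(d, 3) for d in diff])
```

A single substitution changes one sample of the signal, yet every Fourier coefficient depends on all samples, which is one way to read the statement that the method considers the impact of mutations on the whole spectrum. In a full pipeline, spectra like these would be the descriptor matrix passed to a regression model.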
topic Directed evolution
Protein sequence activity relationship
Protein spectrum
Rational screening
Statistical modelling
url http://link.springer.com/article/10.1186/s12859-018-2407-8