Application of Fourier transform and proteochemometrics principles to protein engineering
Abstract. Background: Connecting the dots between a protein sequence and its function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called...
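Main Authors: | Frédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton |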
---|---|
Format: | Article |
Language: | English |
Published: | BMC, 2018-10-01 |
Series: | BMC Bioinformatics |
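Subjects: | Directed evolution; Protein sequence activity relationship; Protein spectrum; Rational screening; Statistical modelling |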
Online Access: | http://link.springer.com/article/10.1186/s12859-018-2407-8 |
id |
doaj-18efb2f9e4194a9f9d1cccdebc9843b8 |
---|---|
record_format |
Article |
spelling |
doaj-18efb2f9e4194a9f9d1cccdebc9843b8; 2020-11-25T01:28:59Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2018-10-01; 19(1), 1-11; 10.1186/s12859-018-2407-8; Application of Fourier transform and proteochemometrics principles to protein engineering; Frédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton (all: Peaccel SAS, Protein Engineering ACCELerator) |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Frédéric Cadet, Nicolas Fontaine, Iyanar Vetrivel, Matthieu Ng Fuk Chong, Olivier Savriama, Xavier Cadet, Philippe Charton |
author_sort |
Frédéric Cadet |
title |
Application of Fourier transform and proteochemometrics principles to protein engineering |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2018-10-01 |
description |
Abstract. Background: Connecting the dots between a protein sequence and its function is of fundamental interest to protein engineers. In silico methods are useful in this quest, especially when structural information is not available. In this study we propose a mutant library screening tool called iSAR (innovative Sequence Activity Relationship) that relies on the physicochemical properties of the amino acids, digital signal processing and partial least squares regression to uncover these sequence-function correlations. Results: We show that a digitized representation of the protein sequence in the form of a Fourier spectrum can be used as an efficient descriptor to model the sequence-activity relationship of proteins. The iSAR methodology identifies high-fitness mutants in mutant libraries. It correlates the variations that mutations cause in the spectra with biological activity or fitness, taking into account the impact of mutations on the whole spectrum rather than focusing on local fitness alone. The utility of the method is illustrated on four datasets: cytochrome P450 for thermostability, TNF-alpha for binding affinity, GLP-2 for potency and enterotoxins for thermostability. The datasets were chosen to illustrate the ability of the method to perform when limited training data are available and when the test set contains novel mutations that do not appear in the training set. Conclusion: The combination of Fast Fourier Transform and partial least squares regression is efficient in capturing the effects of mutations on the function of the protein. iSAR is a fast algorithm that can be implemented with limited computational resources and can make effective predictions even when the training set is limited in size. |
topic |
Directed evolution; Protein sequence activity relationship; Protein spectrum; Rational screening; Statistical modelling |
url |
http://link.springer.com/article/10.1186/s12859-018-2407-8 |
_version_ |
1725099232214384640 |
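
The abstract in this record describes encoding a protein sequence with a physicochemical property of its amino acids and using the Fourier spectrum of that numerical signal as a descriptor. The following is a minimal sketch of those two steps only, not the authors' implementation. The Kyte-Doolittle hydropathy scale, the toy sequences and the function names are assumptions chosen for illustration; the record does not say which property indices iSAR uses. The regression step (partial least squares on the spectra of a mutant library) is not shown.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values. Assumption: used here only as an example
# of a physicochemical property; the record does not name the indices iSAR uses.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode(sequence):
    """Turn a protein sequence into a numerical signal, one value per residue."""
    return [HYDROPATHY[residue] for residue in sequence]


def magnitude_spectrum(signal):
    """Magnitudes of the discrete Fourier transform for frequencies 0..N/2.

    A direct O(N^2) transform keeps the sketch dependency-free; an FFT
    routine would give the same values faster.
    """
    n = len(signal)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(signal)
        )
        spectrum.append(abs(coeff))
    return spectrum


# Hypothetical toy sequences: a "wild type" and a single I6L point mutant.
wild_type = "MKTAYIAKQR"
mutant = "MKTAYLAKQR"

wt_spectrum = magnitude_spectrum(encode(wild_type))
mut_spectrum = magnitude_spectrum(encode(mutant))

# Per-frequency change in magnitude caused by the single substitution.
spectrum_shift = [abs(a - b) for a, b in zip(wt_spectrum, mut_spectrum)]
print([round(d, 3) for d in spectrum_shift])
```

Because a change at one sequence position contributes to every Fourier coefficient, a single substitution alters the descriptor across frequencies, which is consistent with the abstract's remark that the method considers the impact of mutations on the whole spectrum. In a full pipeline, each mutant's spectrum would form one row of a predictor matrix that a partial least squares model relates to the measured activity.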