Scoring function to predict solubility mutagenesis

Abstract. Background: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small...

Bibliographic Details
Main Authors:	Deutsch Christopher, Tian Ye, Krishnamoorthy Bala
Format:	Article
Language:	English
Published:	BMC 2010-10-01
Series:	Algorithms for Molecular Biology
Online Access:	http://www.almob.org/content/5/1/33

id	doaj-35480cadfd2240928d266ac0210e5e91
record_format	Article
spelling	doaj-35480cadfd2240928d266ac0210e5e91 | 2020-11-25T00:23:16Z | eng | BMC | Algorithms for Molecular Biology | 1748-7188 | 2010-10-01 | 5 | 1 | 33 | 10.1186/1748-7188-5-33 | Scoring function to predict solubility mutagenesis | Deutsch Christopher; Tian Ye; Krishnamoorthy Bala | Abstract. Background: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. Results: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. Availability: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html. | http://www.almob.org/content/5/1/33
collection	DOAJ
language	English
format	Article
sources	DOAJ
author	Deutsch Christopher Tian Ye Krishnamoorthy Bala
spellingShingle	Deutsch Christopher Tian Ye Krishnamoorthy Bala Scoring function to predict solubility mutagenesis Algorithms for Molecular Biology
author_facet	Deutsch Christopher Tian Ye Krishnamoorthy Bala
author_sort	Deutsch Christopher
title	Scoring function to predict solubility mutagenesis
title_short	Scoring function to predict solubility mutagenesis
title_full	Scoring function to predict solubility mutagenesis
title_fullStr	Scoring function to predict solubility mutagenesis
title_full_unstemmed	Scoring function to predict solubility mutagenesis
title_sort	scoring function to predict solubility mutagenesis
publisher	BMC
series	Algorithms for Molecular Biology
issn	1748-7188
publishDate	2010-10-01
description	Abstract. Background: Mutagenesis is commonly used to engineer proteins with desirable properties not present in the wild type (WT) protein, such as increased or decreased stability, reactivity, or solubility. Experimentalists often have to choose a small subset of mutations from a large number of candidates to obtain the desired change, and computational techniques are invaluable to make the choices. While several such methods have been proposed to predict stability and reactivity mutagenesis, solubility has not received much attention. Results: We use concepts from computational geometry to define a three body scoring function that predicts the change in protein solubility due to mutations. The scoring function captures both sequence and structure information. By exploring the literature, we have assembled a substantial database of 137 single- and multiple-point solubility mutations. Our database is the largest such collection with structural information known so far. We optimize the scoring function using linear programming (LP) methods to derive its weights based on training. Starting with default values of 1, we find weights in the range [0,2] so that predictions of increase or decrease in solubility are optimized. We compare the LP method to the standard machine learning techniques of support vector machines (SVM) and the Lasso. Using statistics for leave-one-out (LOO), 10-fold, and 3-fold cross validations (CV) for training and prediction, we demonstrate that the LP method performs the best overall. For the LOOCV, the LP method has an overall accuracy of 81%. Availability: Executables of programs, tables of weights, and datasets of mutants are available from the following web page: http://www.wsu.edu/~kbala/OptSolMut.html.
url	http://www.almob.org/content/5/1/33
work_keys_str_mv	AT deutschchristopher scoringfunctiontopredictsolubilitymutagenesis AT tianye scoringfunctiontopredictsolubilitymutagenesis AT krishnamoorthybala scoringfunctiontopredictsolubilitymutagenesis
_version_	1725357954118451200

Similar Items