Grading amino acid properties increased accuracies of single point mutation on protein stability prediction

<p>Abstract</p> <p>Background</p> <p>Point mutations introduced into a protein can sometimes affect its stability. Current sequence-based encoding schemes for machine-learning prediction of protein stability include sparse encoding and a...


Bibliographic Details
Main Authors: Liu Jianguo, Kang Xianjiang
Format: Article
Language: English
Published: BMC 2012-03-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/13/44
id doaj-b447fbf97e264ecabc3b6f424a2646f9
record_format Article
spelling doaj-b447fbf97e264ecabc3b6f424a2646f9 2020-11-25T00:24:55Z eng BMC BMC Bioinformatics 1471-2105 2012-03-01 13 1 44 10.1186/1471-2105-13-44 Grading amino acid properties increased accuracies of single point mutation on protein stability prediction Liu Jianguo Kang Xianjiang http://www.biomedcentral.com/1471-2105/13/44
collection DOAJ
language English
format Article
sources DOAJ
author Liu Jianguo
Kang Xianjiang
spellingShingle Liu Jianguo
Kang Xianjiang
Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
BMC Bioinformatics
author_facet Liu Jianguo
Kang Xianjiang
author_sort Liu Jianguo
title Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
title_short Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
title_full Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
title_fullStr Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
title_full_unstemmed Grading amino acid properties increased accuracies of single point mutation on protein stability prediction
title_sort grading amino acid properties increased accuracies of single point mutation on protein stability prediction
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2012-03-01
description <p>Abstract</p> <p>Background</p> <p>Point mutations introduced into a protein can sometimes affect its stability. Current sequence-based encoding schemes for machine-learning prediction of protein stability include sparse encoding and amino acid property encoding. Property encoding schemes employ physicochemical information about the mutated protein environment; however, they also add complexity when many properties are combined in one scheme. This complexity introduces noise that reduces the accuracy of machine learning algorithms. To overcome this problem, we describe a new encoding scheme that grades the twenty amino acids into groups according to their values for each property.</p> <p>Results</p> <p>We used three predefined values, 0.1, 0.5, and 0.9, to represent the 'weak', 'middle', and 'strong' groups for each amino acid property, and introduced two thresholds per property to assign each of the twenty amino acids to one of the three groups according to its property value. Each amino acid therefore takes one of only three predefined values for each property, rather than one of twenty different values. This reduces the complexity and noise in the encoding schemes. The graded amino acid property encoding schemes improved average accuracy by more than 7% in 20-fold cross-validation. On the independent test sets, starting from sequence information with three-state prediction definitions, the overall accuracy of our method is more than 72%.</p> <p>Conclusions</p> <p>Grading the numeric values of amino acid properties can reduce the noise and complexity of the input information. It accords with biochemical concepts of amino acid properties and simplifies the input data at the same time. The idea of graded property encoding schemes may be applied to protein-related predictions that use machine learning approaches.</p>
url http://www.biomedcentral.com/1471-2105/13/44
work_keys_str_mv AT liujianguo gradingaminoacidpropertiesincreasedaccuraciesofsinglepointmutationonproteinstabilityprediction
AT kangxianjiang gradingaminoacidpropertiesincreasedaccuraciesofsinglepointmutationonproteinstabilityprediction
_version_ 1725350886952140800
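
The grading scheme summarized in the abstract (two thresholds per property, three predefined values 0.1, 0.5, and 0.9) can be illustrated with a minimal Python sketch. Only the three values and the two-threshold idea come from the abstract. The choice of the Kyte-Doolittle hydropathy index as the example property, the threshold values `LOW` and `HIGH`, the treatment of values that fall exactly on a threshold, and all function names are assumptions made for illustration; the article's actual properties and thresholds are not given in this record.

```python
# Example property: Kyte-Doolittle hydropathy index (an assumed choice;
# the record does not say which properties the authors graded).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# The three predefined values named in the abstract.
WEAK, MIDDLE, STRONG = 0.1, 0.5, 0.9


def grade(value, low, high):
    """Map one raw property value to 'weak', 'middle', or 'strong'.

    Values below `low` are weak, values above `high` are strong, and
    everything in between (thresholds included) is middle. Which side
    the thresholds belong to is an assumption of this sketch.
    """
    if low > high:
        raise ValueError("low threshold must not exceed high threshold")
    if value < low:
        return WEAK
    if value <= high:
        return MIDDLE
    return STRONG


def grade_property(table, low, high):
    """Grade every amino acid in a property table with two thresholds."""
    return {aa: grade(v, low, high) for aa, v in table.items()}


def encode_window(sequence, graded):
    """Encode a sequence window as a list of graded property values."""
    return [graded[aa] for aa in sequence]


# Hypothetical thresholds, chosen only to make the example concrete.
LOW, HIGH = -2.0, 1.0

graded = grade_property(HYDROPATHY, LOW, HIGH)
print(encode_window("ARNDC", graded))  # [0.9, 0.1, 0.1, 0.1, 0.9]
```

With this encoding, each residue in a sequence window contributes one of only three numbers per property instead of one of twenty distinct raw values, which is the simplification the abstract credits for the reduced noise.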