Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy
Abstract. Background: Understanding how amino acid substitutions affect protein functions is critical for the study of proteins and their implications in diseases. Although methods have been developed for predicting potential effects of amino acid substit...
Main Authors: | Chen Ting, Sun Fengzhu, Yang Hua, Jiang Rui |
---|---|
Format: | Article |
Language: | English |
Published: | BMC 2006-09-01 |
Series: | BMC Bioinformatics |
Online Access: | http://www.biomedcentral.com/1471-2105/7/417 |
id |
doaj-ab50c8571ebc41458dbd0a03220c8f4b |
---|---|
record_format |
Article |
spelling |
doaj-ab50c8571ebc41458dbd0a03220c8f4b 2020-11-25T00:17:33Z eng BMC BMC Bioinformatics 1471-2105 2006-09-01 7 1 417 10.1186/1471-2105-7-417 Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy Chen Ting Sun Fengzhu Yang Hua Jiang Rui Abstract. Background: Understanding how amino acid substitutions affect protein functions is critical for the study of proteins and their implications in diseases. Although methods have been developed for predicting the potential effects of amino acid substitutions using sequence, three-dimensional structural, and evolutionary properties of proteins, their applications are limited by the complexity of the features and the availability of protein structural information. Another limitation is that the prediction results are hard to interpret in terms of physicochemical principles and biological knowledge. Results: To overcome these limitations, we proposed a novel feature set using physicochemical properties of amino acids, evolutionary profiles of proteins, and protein sequence information. We applied the support vector machine and the random forest with the feature set to experimental amino acid substitutions occurring in the E. coli lac repressor and the bacteriophage T4 lysozyme, as well as to annotated amino acid substitutions occurring in a wide range of human proteins. The results showed that the proposed feature set was superior to the existing ones. To explore physicochemical principles behind amino acid substitutions, we designed a simulated annealing bump hunting strategy to automatically extract interpretable rules for amino acid substitutions. We applied the strategy to annotated human amino acid substitutions and successfully extracted several rules that were either consistent with current biological knowledge or provided new insights into amino acid substitutions. When applied to unclassified data, these rules covered a large portion of samples, and most of the covered samples showed good agreement with predictions made by either the support vector machine or the random forest. Conclusion: The prediction methods using the proposed feature set can achieve larger AUC (the area under the ROC curve), smaller BER (the balanced error rate), and larger MCC (the Matthews correlation coefficient) than those using the published feature sets, suggesting that our feature set is superior to the existing ones. The rules extracted by the simulated annealing bump hunting strategy have comparable coverage and accuracy to, but much better interpretability than, those extracted by the patient rule induction method (PRIM), indicating that the strategy is more effective in inducing interpretable rules. http://www.biomedcentral.com/1471-2105/7/417 |
collection |
DOAJ |
language |
English |
format |
Article |
sources |
DOAJ |
author |
Chen Ting, Sun Fengzhu, Yang Hua, Jiang Rui |
spellingShingle |
Chen Ting Sun Fengzhu Yang Hua Jiang Rui Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy BMC Bioinformatics |
author_facet |
Chen Ting, Sun Fengzhu, Yang Hua, Jiang Rui |
author_sort |
Chen Ting |
title |
Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
title_short |
Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
title_full |
Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
title_fullStr |
Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
title_full_unstemmed |
Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
title_sort |
searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy |
publisher |
BMC |
series |
BMC Bioinformatics |
issn |
1471-2105 |
publishDate |
2006-09-01 |
description |
Abstract. Background: Understanding how amino acid substitutions affect protein functions is critical for the study of proteins and their implications in diseases. Although methods have been developed for predicting the potential effects of amino acid substitutions using sequence, three-dimensional structural, and evolutionary properties of proteins, their applications are limited by the complexity of the features and the availability of protein structural information. Another limitation is that the prediction results are hard to interpret in terms of physicochemical principles and biological knowledge. Results: To overcome these limitations, we proposed a novel feature set using physicochemical properties of amino acids, evolutionary profiles of proteins, and protein sequence information. We applied the support vector machine and the random forest with the feature set to experimental amino acid substitutions occurring in the E. coli lac repressor and the bacteriophage T4 lysozyme, as well as to annotated amino acid substitutions occurring in a wide range of human proteins. The results showed that the proposed feature set was superior to the existing ones. To explore physicochemical principles behind amino acid substitutions, we designed a simulated annealing bump hunting strategy to automatically extract interpretable rules for amino acid substitutions. We applied the strategy to annotated human amino acid substitutions and successfully extracted several rules that were either consistent with current biological knowledge or provided new insights into amino acid substitutions. When applied to unclassified data, these rules covered a large portion of samples, and most of the covered samples showed good agreement with predictions made by either the support vector machine or the random forest. Conclusion: The prediction methods using the proposed feature set can achieve larger AUC (the area under the ROC curve), smaller BER (the balanced error rate), and larger MCC (the Matthews correlation coefficient) than those using the published feature sets, suggesting that our feature set is superior to the existing ones. The rules extracted by the simulated annealing bump hunting strategy have comparable coverage and accuracy to, but much better interpretability than, those extracted by the patient rule induction method (PRIM), indicating that the strategy is more effective in inducing interpretable rules. |
url |
http://www.biomedcentral.com/1471-2105/7/417 |
work_keys_str_mv |
AT chenting searchingforinterpretablerulesfordiseasemutationsasimulatedannealingbumphuntingstrategy AT sunfengzhu searchingforinterpretablerulesfordiseasemutationsasimulatedannealingbumphuntingstrategy AT yanghua searchingforinterpretablerulesfordiseasemutationsasimulatedannealingbumphuntingstrategy AT jiangrui searchingforinterpretablerulesfordiseasemutationsasimulatedannealingbumphuntingstrategy |
_version_ |
1725379268693721088 |