Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy

Abstract. Background: Understanding how amino acid substitutions affect protein functions is critical for the study of proteins and their implications in diseases. Although methods have been developed for predicting potential effects of amino acid substit...


Bibliographic Details
Main Authors: Chen Ting, Sun Fengzhu, Yang Hua, Jiang Rui
Format: Article
Language: English
Published: BMC 2006-09-01
Series: BMC Bioinformatics
Online Access: http://www.biomedcentral.com/1471-2105/7/417
id doaj-ab50c8571ebc41458dbd0a03220c8f4b
record_format Article
spelling doaj-ab50c8571ebc41458dbd0a03220c8f4b; 2020-11-25T00:17:33Z; eng; BMC; BMC Bioinformatics; 1471-2105; 2006-09-01; 7; 1; 417; 10.1186/1471-2105-7-417; Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy; Chen Ting; Sun Fengzhu; Yang Hua; Jiang Rui; http://www.biomedcentral.com/1471-2105/7/417
collection DOAJ
language English
format Article
sources DOAJ
author Chen Ting
Sun Fengzhu
Yang Hua
Jiang Rui
title Searching for interpretable rules for disease mutations: a simulated annealing bump hunting strategy
publisher BMC
series BMC Bioinformatics
issn 1471-2105
publishDate 2006-09-01
description Abstract
Background: Understanding how amino acid substitutions affect protein functions is critical for the study of proteins and their implications in diseases. Although methods have been developed for predicting potential effects of amino acid substitutions using sequence, three-dimensional structural, and evolutionary properties of proteins, their application is limited by the complexity of the features and the availability of protein structural information. Another limitation is that the prediction results are hard to interpret in terms of physicochemical principles and biological knowledge.
Results: To overcome these limitations, we proposed a novel feature set using physicochemical properties of amino acids, evolutionary profiles of proteins, and protein sequence information. We applied the support vector machine and the random forest with the feature set to experimental amino acid substitutions occurring in the E. coli lac repressor and the bacteriophage T4 lysozyme, as well as to annotated amino acid substitutions occurring in a wide range of human proteins. The results showed that the proposed feature set was superior to the existing ones. To explore physicochemical principles behind amino acid substitutions, we designed a simulated annealing bump hunting strategy to automatically extract interpretable rules for amino acid substitutions. We applied the strategy to annotated human amino acid substitutions and successfully extracted several rules that were either consistent with current biological knowledge or provided new insights into amino acid substitutions. When applied to unclassified data, these rules could cover a large portion of samples, and most of the covered samples showed good agreement with predictions made by either the support vector machine or the random forest.
Conclusion: The prediction methods using the proposed feature set can achieve larger AUC (the area under the ROC curve), smaller BER (the balanced error rate), and larger MCC (the Matthews correlation coefficient) than those using the published feature sets, suggesting that our feature set is superior to the existing ones. The rules extracted by the simulated annealing bump hunting strategy have coverage and accuracy comparable to, but much better interpretability than, those extracted by the patient rule induction method (PRIM), suggesting that the strategy is more effective in inducing interpretable rules.
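The three evaluation measures named in the conclusion (AUC, BER, MCC) have standard textbook definitions, which can be sketched as below. This is a minimal illustration only, not the authors' evaluation code: the function names, the confusion-matrix counts, and the scores in the usage note are hypothetical, and the record does not say how the paper computed these measures in practice.

```python
import math


def balanced_error_rate(tp, fp, tn, fn):
    """BER: the mean of the false-negative rate and the false-positive rate."""
    fnr = fn / (tp + fn)
    fpr = fp / (tn + fp)
    return 0.5 * (fnr + fpr)


def matthews_corrcoef(tp, fp, tn, fn):
    """MCC: correlation between predicted and true binary labels, in [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (pairwise form, O(n*m))."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

With made-up counts such as `tp=40, fp=5, tn=45, fn=10`, `balanced_error_rate` averages the two per-class error rates (0.2 and 0.1), so a classifier cannot score well simply by favoring the larger class. That property is what makes BER and MCC useful on unbalanced sets of deleterious and neutral substitutions.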
url http://www.biomedcentral.com/1471-2105/7/417
_version_ 1725379268693721088