Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods

Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 98 === A mutation may change the stability of a protein structure, which is an extremely important issue in the study of protein structure. An accurate prediction of protein stability free energy change (ΔΔG) helps the protein design process and provides a more reliabl...


Bibliographic Details
Main Authors: Gan-Lin Chen, 陳甘霖
Other Authors: Eric Y. T. Juan
Format: Others
Language: zh-TW
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/62324447373165856494
id ndltd-TW-098NTOU5394004
record_format oai_dc
spelling ndltd-TW-098NTOU5394004 2015-10-13T19:35:32Z http://ndltd.ncl.edu.tw/handle/62324447373165856494 Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods 使用機器學習的方法預測突變蛋白質穩定自由能量變化 Gan-Lin Chen 陳甘霖 Master's National Taiwan Ocean University Department of Computer Science and Engineering 98 Eric Y. T. Juan 阮議聰 2010 thesis 85 zh-TW
collection NDLTD
language zh-TW
format Others
sources NDLTD
description Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 98 === A mutation may change the stability of a protein structure, which makes stability an important issue in the study of protein structure. An accurate prediction of the protein stability free energy change (ΔΔG) helps the protein design process and provides a more reliable reference for the study of protein structure. This work uses machine learning methods to predict ΔΔG from the protein sequence and experimental mutation thermodynamic data sets. It uses four methods to convert a protein sequence into a feature vector, together with a number of machine learning algorithms, including Decision Trees, Support Vector Machines, Nearest Neighbors and Random Forests. Five datasets are adopted from the ProTherm database: four (SEQDB, NewDB982, NewDB667 and NewDB1313) for single point mutations and one (DM180) for double point mutations. The methods used in this work are competitive with state-of-the-art systems in prediction accuracy. For single point mutations, ΔΔG is discriminated between 3 classes: destabilizing, neutral and stabilizing mutations. Using 20-fold cross-validation on the SEQDB dataset, an M-AAwindow-based Random Forests classifier achieves an overall accuracy of 73% and a Matthews correlation coefficient (MCC) of 0.53. An M-AAwindow-based Random Forests classifier tested on the NewDB982, NewDB667 and NewDB1313 datasets reaches overall accuracies of 59%, 64% and 64%, respectively. For double point mutations, ΔΔG is discriminated between 2 classes: destabilizing and stabilizing mutations. Two models based on C4.5 decision trees, one for the first point mutation and one for the second, each discriminate ΔΔG between 12 classes. A K-Nearest Neighbors classifier then combines the outcomes of the individual models to discriminate between destabilizing and stabilizing mutations, with an overall accuracy of 83.3%. The experimental results for both single and double point mutations show that the classifiers based on M-AAwindow perform better for the prediction of ΔΔG.
author2 Eric Y. T. Juan
author_facet Eric Y. T. Juan
Gan-Lin Chen
陳甘霖
author Gan-Lin Chen
陳甘霖
spellingShingle Gan-Lin Chen
陳甘霖
Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods
author_sort Gan-Lin Chen
title Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods
title_sort predicting protein stability free energy change upon mutations using machine learning methods
publishDate 2010
url http://ndltd.ncl.edu.tw/handle/62324447373165856494
work_keys_str_mv AT ganlinchen predictingproteinstabilityfreeenergychangeuponmutationsusingmachinelearningmethods
AT chéngānlín predictingproteinstabilityfreeenergychangeuponmutationsusingmachinelearningmethods
AT ganlinchen shǐyòngjīqìxuéxídefāngfǎyùcètūbiàndànbáizhìwěndìngzìyóunéngliàngbiànhuà
AT chéngānlín shǐyòngjīqìxuéxídefāngfǎyùcètūbiàndànbáizhìwěndìngzìyóunéngliàngbiànhuà
_version_ 1718042226656804864
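
The three-class setup and the evaluation measures named in the abstract can be illustrated with a minimal Python sketch. Everything specific in it is an assumption rather than the thesis's method: the ±0.5 kcal/mol class threshold, the sign convention (negative ΔΔG taken as destabilizing), and the `window_features` encoding, which is a hypothetical stand-in and not the M-AAwindow encoding, whose definition the abstract does not give. MCC is read here as the multiclass Matthews correlation coefficient, the usual expansion of that abbreviation.

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def ddg_class(ddg, threshold=0.5):
    """Map a ΔΔG value (kcal/mol) to one of three labels.

    Assumptions: negative ΔΔG means destabilizing, and the
    threshold of 0.5 is illustrative, not taken from the thesis.
    """
    if ddg < -threshold:
        return "destabilizing"
    if ddg > threshold:
        return "stabilizing"
    return "neutral"


def window_features(sequence, position, mutant, half_width=3):
    """Hypothetical sequence-window encoding (NOT the thesis's M-AAwindow).

    Returns 60 numbers: residue counts in the window around the
    0-based mutated position, then one-hot vectors for the wild-type
    and the mutant residue.
    """
    start = max(0, position - half_width)
    window = sequence[start:position] + sequence[position + 1:position + 1 + half_width]
    counts = Counter(window)
    neighbours = [counts.get(a, 0) for a in AMINO_ACIDS]
    wild = [1 if a == sequence[position] else 0 for a in AMINO_ACIDS]
    mut = [1 if a == mutant else 0 for a in AMINO_ACIDS]
    return neighbours + wild + mut


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multiclass Matthews correlation coefficient (Gorodkin's R_K)."""
    labels = set(y_true) | set(y_pred)
    n = len(y_true)
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    cov_tp = correct * n - sum(true_counts[k] * pred_counts[k] for k in labels)
    cov_tt = n * n - sum(true_counts[k] ** 2 for k in labels)
    cov_pp = n * n - sum(pred_counts[k] ** 2 for k in labels)
    denom = sqrt(cov_tt * cov_pp)
    return cov_tp / denom if denom else 0.0


if __name__ == "__main__":
    # Made-up ΔΔG values and predictions, for illustration only.
    y_true = [ddg_class(v) for v in (-1.2, -0.8, 0.1, 0.9)]
    y_pred = ["destabilizing", "neutral", "neutral", "stabilizing"]
    print(accuracy(y_true, y_pred), multiclass_mcc(y_true, y_pred))
```

The resulting feature vectors and labels could be passed to any off-the-shelf classifier; the choice of model and cross-validation scheme is left out of the sketch.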