Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 98 === A mutation may change the stability of a protein structure, which is an extremely important issue in the study of protein structure. An accurate prediction of protein stability free energy change (ΔΔG) helps the protein design process and provides a more reliabl...
Main Authors: | Gan-Lin Chen, 陳甘霖 |
---|---|
Other Authors: | Eric Y. T. Juan |
Format: | Others |
Language: | zh-TW |
Published: | 2010 |
Online Access: | http://ndltd.ncl.edu.tw/handle/62324447373165856494 |
id |
ndltd-TW-098NTOU5394004 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-TW-098NTOU5394004 2015-10-13T19:35:32Z http://ndltd.ncl.edu.tw/handle/62324447373165856494 Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods 使用機器學習的方法預測突變蛋白質穩定自由能量變化 Gan-Lin Chen 陳甘霖 Master's National Taiwan Ocean University Department of Computer Science and Engineering 98 A mutation may change the stability of a protein structure, which is an extremely important issue in the study of protein structure. An accurate prediction of protein stability free energy change (ΔΔG) helps the protein design process and provides a more reliable reference for the study of protein structure. This work uses machine learning methods to predict ΔΔG from the protein sequence, using experimental thermodynamic data sets on mutations. It uses four methods to convert a protein sequence into a feature vector and several machine learning algorithms, including Decision Trees, Support Vector Machines, Nearest Neighbors and Random Forests. The five datasets, adopted from the ProTherm database, comprise four datasets (SEQDB, NewDB982, NewDB667 and NewDB1313) for single point mutations and one dataset (DM180) for double point mutations. The methods used in this work are competitive with state-of-the-art systems in prediction accuracy. For the prediction of single point mutations, ΔΔG is classified into three classes: destabilizing, neutral and stabilizing mutations. Using 20-fold cross-validation on the SEQDB dataset, an M-AAwindow-based Random Forests classifier achieves an overall accuracy of 73% and a mean value correlation coefficient (MCC) of 0.53. Tested on the NewDB982, NewDB667 and NewDB1313 datasets, an M-AAwindow-based Random Forests classifier achieves overall accuracies of 59%, 64% and 64%, respectively. For the prediction of double point mutations, ΔΔG is classified into two classes: destabilizing and stabilizing mutations. ΔΔG is classified into 12 classes by two models based on C4.5 decision trees, one for the first point mutation and one for the second. Furthermore, a K-Nearest Neighbors classifier combines the outcomes of the individual models to discriminate between destabilizing and stabilizing mutations, achieving an overall accuracy of 83.3%. The experimental results for both single and double point mutations show that classifiers based on M-AAwindow perform better in predicting ΔΔG. Eric Y. T. Juan 阮議聰 2010 學位論文 ; thesis 85 zh-TW |
collection |
NDLTD |
language |
zh-TW |
format |
Others |
sources |
NDLTD |
description |
Master's === National Taiwan Ocean University === Department of Computer Science and Engineering === 98 === A mutation may change the stability of a protein structure, which is an extremely important issue in the study of protein structure. An accurate prediction of protein stability free energy change (ΔΔG) helps the protein design process and provides a more reliable reference for the study of protein structure.
This work uses machine learning methods to predict ΔΔG from the protein sequence, using experimental thermodynamic data sets on mutations. It uses four methods to convert a protein sequence into a feature vector and several machine learning algorithms, including Decision Trees, Support Vector Machines, Nearest Neighbors and Random Forests. The five datasets, adopted from the ProTherm database, comprise four datasets (SEQDB, NewDB982, NewDB667 and NewDB1313) for single point mutations and one dataset (DM180) for double point mutations.
The methods used in this work are competitive with state-of-the-art systems in prediction accuracy. For the prediction of single point mutations, ΔΔG is classified into three classes: destabilizing, neutral and stabilizing mutations. Using 20-fold cross-validation on the SEQDB dataset, an M-AAwindow-based Random Forests classifier achieves an overall accuracy of 73% and a mean value correlation coefficient (MCC) of 0.53. Tested on the NewDB982, NewDB667 and NewDB1313 datasets, an M-AAwindow-based Random Forests classifier achieves overall accuracies of 59%, 64% and 64%, respectively.
For the prediction of double point mutations, ΔΔG is classified into two classes: destabilizing and stabilizing mutations. ΔΔG is classified into 12 classes by two models based on C4.5 decision trees, one for the first point mutation and one for the second. Furthermore, a K-Nearest Neighbors classifier combines the outcomes of the individual models to discriminate between destabilizing and stabilizing mutations, achieving an overall accuracy of 83.3%. The experimental results for both single and double point mutations show that classifiers based on M-AAwindow perform better in predicting ΔΔG.
|
author2 |
Eric Y. T. Juan |
author_facet |
Eric Y. T. Juan; Gan-Lin Chen 陳甘霖 |
author |
Gan-Lin Chen 陳甘霖 |
spellingShingle |
Gan-Lin Chen 陳甘霖 Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods |
author_sort |
Gan-Lin Chen |
title |
Predicting Protein Stability Free Energy Change upon Mutations Using Machine Learning Methods |
title_sort |
predicting protein stability free energy change upon mutations using machine learning methods |
publishDate |
2010 |
url |
http://ndltd.ncl.edu.tw/handle/62324447373165856494 |
work_keys_str_mv |
AT ganlinchen predictingproteinstabilityfreeenergychangeuponmutationsusingmachinelearningmethods AT chéngānlín predictingproteinstabilityfreeenergychangeuponmutationsusingmachinelearningmethods AT ganlinchen shǐyòngjīqìxuéxídefāngfǎyùcètūbiàndànbáizhìwěndìngzìyóunéngliàngbiànhuà AT chéngānlín shǐyòngjīqìxuéxídefāngfǎyùcètūbiàndànbáizhìwěndìngzìyóunéngliàngbiànhuà |
_version_ |
1718042226656804864 |
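
The description field reports an overall accuracy and a mean value correlation coefficient (MCC) for a three-class ΔΔG classifier. As a rough illustration of how such figures could be computed, the Python sketch below labels ΔΔG values and scores a set of predictions. Several things in it are assumptions that the record does not state: the ±1.0 cutoff (`threshold`), the sign convention (negative ΔΔG treated as destabilizing), all function names, and the reading of MCC as the mean of per-class, one-vs-rest Matthews correlation coefficients. It is not the thesis's own code.

```python
from math import sqrt

# Class names taken from the abstract; their ordering here is arbitrary.
CLASSES = ("destabilizing", "neutral", "stabilizing")


def ddg_to_class(ddg, threshold=1.0):
    """Map a ΔΔG value to one of three classes.

    ASSUMPTIONS: the cutoff of 1.0 and the sign convention
    (negative = destabilizing) are illustrative, not from the record.
    """
    if ddg < -threshold:
        return "destabilizing"
    if ddg > threshold:
        return "stabilizing"
    return "neutral"


def overall_accuracy(y_true, y_pred):
    """Fraction of samples whose predicted class equals the true class."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def one_vs_rest_mcc(y_true, y_pred, label):
    """Matthews correlation coefficient for `label` against all other classes."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    tn = sum(t != label and p != label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the coefficient is undefined.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_mcc(y_true, y_pred, labels=CLASSES):
    """Mean of the per-class coefficients (an assumed reading of 'MCC')."""
    return sum(one_vs_rest_mcc(y_true, y_pred, c) for c in labels) / len(labels)
```

In use, measured ΔΔG values would first be passed through `ddg_to_class` to obtain true labels, and a classifier's predicted labels would then be scored with `overall_accuracy` and `mean_mcc`.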