Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins
The goal of this thesis is to develop a machine-learning method for predicting the disulfide-bonding states of cysteine residues in proteins, a sub-problem of the larger, still unsolved problem of protein structure prediction. Improvement in the prediction...
Main Author: | Shukla, Priyank <1984> |
---|---|
Other Authors: | Casadio, Rita |
Format: | Doctoral Thesis |
Language: | en |
Published: | Alma Mater Studiorum - Università di Bologna, 2010 |
Subjects: | INF/01 Informatica |
Online Access: | http://amsdottorato.unibo.it/2588/ |
id |
ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-2588 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-unibo.it-oai-amsdottorato.cib.unibo.it-2588 2014-03-24T16:28:34Z Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins Shukla, Priyank <1984> INF/01 Informatica The goal of this thesis is to develop a machine-learning method for predicting the disulfide-bonding states of cysteine residues in proteins, a sub-problem of the larger, still unsolved problem of protein structure prediction. Better prediction of the disulfide-bonding states of cysteine residues places constraints on the three-dimensional (3D) structure of the protein and thus ultimately helps in predicting the 3D structure of proteins. The results of this work have direct implications for site-directed mutational studies of proteins, protein engineering, and the problem of protein folding. We used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique for our prediction method. Using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs, and considering Eukaryotes and Prokaryotes separately, we reached an accuracy of 94% on a per-cysteine basis for both the Eukaryotic and Prokaryotic datasets, and accuracies of 90% and 93% on a per-protein basis for the Eukaryotic and Prokaryotic datasets, respectively. These accuracies are the highest reached so far by any existing prediction method, so our method outperforms all previously developed approaches and is therefore more reliable. 
The most interesting finding of this thesis is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when ‘profile’ information was given as input to our prediction method. One of the reasons we found for this is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, whereas Prokaryotic bonded examples lack it. Alma Mater Studiorum - Università di Bologna Casadio, Rita 2010-05-05 Doctoral Thesis PeerReviewed application/pdf en http://amsdottorato.unibo.it/2588/ info:eu-repo/semantics/openAccess |
collection |
NDLTD |
language |
en |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
INF/01 Informatica |
spellingShingle |
INF/01 Informatica Shukla, Priyank <1984> Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
description |
The goal of this thesis is to develop a machine-learning method for predicting the disulfide-bonding states of cysteine residues in proteins, a sub-problem of the larger, still unsolved problem of protein structure prediction. Better prediction of the disulfide-bonding states of cysteine residues places constraints on the three-dimensional (3D) structure of the protein and thus ultimately helps in predicting the 3D structure of proteins. The results of this work have direct implications for site-directed mutational studies of proteins, protein engineering, and the problem of protein folding.
We used a combination of an Artificial Neural Network (ANN) and a Hidden Markov Model (HMM), the so-called Hidden Neural Network (HNN), as the machine learning technique for our prediction method. Using different global and local features of proteins (specifically profiles, parity of cysteine residues, average cysteine conservation, correlated mutation, sub-cellular localization, and signal peptide) as inputs, and considering Eukaryotes and Prokaryotes separately, we reached an accuracy of 94% on a per-cysteine basis for both the Eukaryotic and Prokaryotic datasets, and accuracies of 90% and 93% on a per-protein basis for the Eukaryotic and Prokaryotic datasets, respectively. These accuracies are the highest reached so far by any existing prediction method, so our method outperforms all previously developed approaches and is therefore more reliable.
The most interesting finding of this thesis is the difference in prediction performance between Eukaryotes and Prokaryotes at the basic level of input coding, when ‘profile’ information was given as input to our prediction method. One of the reasons we found for this is the difference in the amino acid composition of the local environment of bonded and free cysteine residues in Eukaryotes and Prokaryotes. Eukaryotic bonded cysteine examples have a ‘symmetric-cysteine-rich’ environment, whereas Prokaryotic bonded examples lack it. |
author2 |
Casadio, Rita |
author_facet |
Casadio, Rita Shukla, Priyank <1984> |
author |
Shukla, Priyank <1984> |
author_sort |
Shukla, Priyank <1984> |
title |
Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
title_short |
Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
title_full |
Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
title_fullStr |
Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
title_full_unstemmed |
Machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
title_sort |
machine learning methods for prediction of disulphide bonding states of cysteine residues in proteins |
publisher |
Alma Mater Studiorum - Università di Bologna |
publishDate |
2010 |
url |
http://amsdottorato.unibo.it/2588/ |
work_keys_str_mv |
AT shuklapriyank1984 machinelearningmethodsforpredictionofdisulphidebondingstatesofcysteineresiduesinproteins |
_version_ |
1716654132759101440 |