Summary: | Master's === National Taiwan University === Graduate Institute of Computer Science and Information Engineering === 91 === In this study, we proposed a novel approach for predicting the disulfide connectivity pattern of a protein directly from its amino acid sequence. In protein structure prediction, the conformation space is extremely large. Constraints such as secondary structure information and the solvent accessibility of residues have been applied to reduce the search space and thereby improve prediction accuracy. Disulfide bonds, the covalent linkages between two cysteines, are commonly found in extracellular proteins. Correctly predicting disulfide connectivity can strongly reduce the conformation space and may also be useful in predicting protein tertiary structure.
Our approach combined two steps: 1) a model was trained on the training set to predict the bond potential for every pair of cysteines; 2) for a given protein, the predicted bond potentials were used to find the most likely disulfide connectivity pattern. In step 1, every pair of cysteines in the training set, whether or not it formed a disulfide bond, was fed into a Support Vector Machine to train the bond potential predictor. In step 2, a weighted complete graph was constructed for the target sequence, in which the vertices represented the cysteines and the edge weights represented the corresponding bond potentials. Edmonds’ algorithm was then applied to find the maximum-weight perfect matching, and this matching gave the predicted disulfide connectivity pattern. The approach was validated with four-fold cross-validation on a data set of 452 proteins. It achieved an overall accuracy of 44.53%, which is better than that of previous work. In summary, the proposed method is a promising way to locate disulfide bridges in proteins.
|
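Step 2 of the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: it replaces Edmonds’ algorithm with exhaustive enumeration of perfect matchings, which returns the same maximum-weight matching but is only practical for a handful of cysteines. The function name `best_connectivity`, the cysteine positions, and the bond potential values are all hypothetical, and the potentials are assumed to come from a predictor such as the one trained in step 1.

```python
from typing import Dict, FrozenSet, List, Tuple

Pair = Tuple[int, int]


def best_connectivity(
    cysteines: List[int],
    potential: Dict[FrozenSet[int], float],
) -> Tuple[float, List[Pair]]:
    """Return (total weight, pairs) of the maximum-weight perfect matching.

    Brute-force stand-in for Edmonds' algorithm: the first cysteine is
    paired with each remaining one in turn, and the rest are matched
    recursively. `potential` maps an unordered pair of cysteine positions
    to its (hypothetical) predicted bond potential.
    """
    if len(cysteines) % 2:
        raise ValueError("a perfect matching needs an even number of cysteines")
    if not cysteines:
        return 0.0, []

    first, rest = cysteines[0], cysteines[1:]
    best_score, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        score, pairs = best_connectivity(remaining, potential)
        score += potential[frozenset((first, partner))]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


# Hypothetical example: four cysteines at sequence positions 5, 12, 30, 41.
# Every pair gets a weight, so the graph is complete.
example_potential = {
    frozenset((5, 12)): 0.2,
    frozenset((5, 30)): 0.9,
    frozenset((5, 41)): 0.4,
    frozenset((12, 30)): 0.3,
    frozenset((12, 41)): 0.8,
    frozenset((30, 41)): 0.1,
}
score, pairs = best_connectivity([5, 12, 30, 41], example_potential)
print(pairs)  # [(5, 30), (12, 41)]
```

With four cysteines there are only three possible connectivity patterns, so enumeration is trivial; the number of patterns grows as (2n-1)!! for n bonds, which is why a polynomial-time matching algorithm such as Edmonds’ is the natural choice for larger proteins.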