Clustering and Characterizing Local Protein Structures by an Expectation-Maximization (EM)-Assisted Approach


Bibliographic Details
Main Authors: Ta-tsen Soong, 宋大辰
Other Authors: Chung-ming Chen
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/91604425775207861302
Description
Summary: Master's thesis, National Taiwan University, Institute of Biomedical Engineering, academic year 90 (ROC calendar).

Studying protein structure is important for understanding biochemical functions within the body. One approach to protein structure prediction is to predict the local conformations of a protein and then reassemble these substructures. Previous studies have shown that a small set of local structures can be used to reconstruct or build a protein molecule with high precision. This study follows the same framework and proposes a method for finding recurrent local structures in proteins. The algorithm first applies Expectation-Maximization (EM) clustering to the distance matrices of pentamer (five-residue) fragment structures, which yields a rough partition of the conformation space. In the second stage, the EM clusters are refined with a split-and-merge algorithm. This produces a finite number of clusters and ensures that each one is homogeneous and distinct: each cluster consists of very similar structures and differs from the other clusters. The results show that 41 major representative structures approximate a test set of protein fragments with an error of 0.378 Å. With only 20 structure types, the test set is modeled with an error of 0.44 Å, which is comparable to the performance of a previous method (the oligons [24]). The study also compiles a position-specific frequency map for each cluster. These maps are expected to help uncover sequence-structure relationships in future studies.
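The first clustering stage can be sketched as follows. This is a minimal illustration under stated assumptions and is not the thesis's implementation. Each pentamer is assumed to be given as five Cα coordinates. Its 5×5 distance matrix is flattened into the 10 distinct pairwise distances, and a Gaussian mixture with diagonal covariances is then fitted by EM, starting from a farthest-first initialization. The choice of atoms, the mixture form and the initialization are all assumptions, and the helper names (`distance_vector`, `em_diagonal_gmm`) are hypothetical.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Return the 10 distinct pairwise distances of a 5-residue fragment.

    Assumption (hypothetical input format): `coords` holds five (x, y, z)
    C-alpha positions in Å.
    """
    return [math.dist(coords[i], coords[j])
            for i, j in combinations(range(len(coords)), 2)]


def em_diagonal_gmm(data, k, iters=50, var_floor=1e-3):
    """Fit a k-component diagonal Gaussian mixture by EM; return (means, labels)."""
    n, d = len(data), len(data[0])
    # Farthest-first initialization: start from data[0], then repeatedly add
    # the point farthest from all means chosen so far.
    means = [list(data[0])]
    while len(means) < k:
        far = max(data, key=lambda x: min(math.dist(x, m) for m in means))
        means.append(list(far))
    variances = [[1.0] * d for _ in range(k)]
    weights = [1.0 / k] * k
    resp = []
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point,
        # computed in log space and normalized with the log-sum-exp trick.
        resp = []
        for x in data:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (x[j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            probs = [math.exp(s - top) for s in logp]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate mixture weights, means and per-dimension
        # variances (floored to avoid collapsing onto single points).
        for c in range(k):
            nc = sum(r[c] for r in resp) + 1e-12
            weights[c] = nc / n
            means[c] = [sum(r[c] * x[j] for r, x in zip(resp, data)) / nc
                        for j in range(d)]
            variances[c] = [max(var_floor,
                                sum(r[c] * (x[j] - means[c][j]) ** 2
                                    for r, x in zip(resp, data)) / nc)
                            for j in range(d)]
    # Hard assignment: each fragment goes to its most responsible component.
    labels = [max(range(k), key=lambda c: r[c]) for r in resp]
    return means, labels
```

The features are distance matrices rather than raw coordinates, so they do not depend on each fragment's position or orientation. Fragments therefore need no superposition before clustering.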
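The reported figures (0.378 Å with 41 representatives, 0.44 Å with 20) measure how closely a test set is approximated when each fragment is replaced by a representative structure. The abstract does not name the error metric. This sketch assumes the distance-matrix RMS deviation (dRMS) between a fragment and its closest representative, averaged over the test set. Coordinate RMSD after optimal superposition would be a common alternative. The functions `distance_vector`, `drms` and `approximation_error` are hypothetical.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Hypothetical helper: the 10 pairwise distances of a 5-residue fragment."""
    return [math.dist(coords[i], coords[j])
            for i, j in combinations(range(len(coords)), 2)]


def drms(u, v):
    """Root-mean-square difference between two distance vectors (in Å)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def approximation_error(test_fragments, representatives):
    """Mean dRMS when each test fragment is replaced by its closest representative.

    Assumption: dRMS stands in for the thesis's (unstated) error metric.
    """
    rep_vecs = [distance_vector(r) for r in representatives]
    total = 0.0
    for frag in test_fragments:
        v = distance_vector(frag)
        # Pick the representative that best matches this fragment.
        total += min(drms(v, r) for r in rep_vecs)
    return total / len(test_fragments)
```

dRMS compares distance matrices directly and so needs no superposition step. It is not numerically identical to coordinate RMSD, however, so values from this sketch should not be compared directly with the thesis's figures.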
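A position-specific frequency map for a cluster can be built by counting how often each amino acid occurs at each of the five positions among the fragments assigned to that cluster. The abstract does not define the map's exact form. This sketch assumes relative frequencies of one-letter residue codes, and both the input format and the name `frequency_maps` are hypothetical.

```python
from collections import Counter, defaultdict


def frequency_maps(fragments, window=5):
    """Per-cluster, per-position amino-acid relative frequencies.

    `fragments` is an iterable of (sequence, cluster_label) pairs, where
    `sequence` is a one-letter-code string of length `window`
    (hypothetical input format).
    """
    # One Counter per position for every cluster label seen.
    counts = defaultdict(lambda: [Counter() for _ in range(window)])
    for seq, label in fragments:
        if len(seq) != window:
            raise ValueError(f"expected a {window}-residue fragment, got {seq!r}")
        for pos, aa in enumerate(seq):
            counts[label][pos][aa] += 1
    # Normalize each position's counts into relative frequencies.
    maps = {}
    for label, per_pos in counts.items():
        maps[label] = []
        for counter in per_pos:
            n = sum(counter.values())
            maps[label].append({aa: c / n for aa, c in counter.items()})
    return maps
```

One possible extension is to compare each position's frequencies against background amino-acid frequencies, for example as log-odds scores. This would highlight the residues that are enriched in a given structural class.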