Clustering and Characterizing Local Protein Structures by an Expectation-Maximization (EM)-Assisted Approach

Master's === National Taiwan University === Graduate Institute of Biomedical Engineering === 90 === Studying protein structure is important for understanding biochemical functions within the body. One approach to protein structure prediction is to predict the local conformations of a protein and then reassemble the substructures. Previous studies ha...

Bibliographic Details
Main Authors: Ta-tsen Soong, 宋大辰
Other Authors: Chung-ming Chen
Format: Others
Language: en_US
Published: 2002
Online Access: http://ndltd.ncl.edu.tw/handle/91604425775207861302
id ndltd-TW-090NTU00530021
record_format oai_dc
spelling ndltd-TW-090NTU00530021 2015-10-13T14:41:12Z http://ndltd.ncl.edu.tw/handle/91604425775207861302 Clustering and Characterizing Local Protein Structures by an Expectation-Maximization (EM)-Assisted Approach 蛋白質局部重複性結構之分析-以EM為輔助之群聚演算法 Ta-tsen Soong 宋大辰 Master's National Taiwan University Graduate Institute of Biomedical Engineering 90 Studying protein structure is important for understanding biochemical functions within the body. One approach to protein structure prediction is to predict the local conformations of a protein and then reassemble the substructures. Previous studies have demonstrated that a small set of local structures can be used to reconstruct or build a protein molecule with high precision. Our study follows the same framework and proposes a method for finding recurrent local structures of proteins. The algorithm starts by applying Expectation-Maximization (EM) clustering to the distance matrices of pentamer fragment structures, which yields a rough partition of the conformation space. In the second stage, the EM clusters are subjected to a split-and-merge algorithm to obtain a finite number of clusters and to guarantee the homogeneity and distinctiveness of each one (i.e., each cluster consists of very similar structures and differs from the other clusters). The results show that, with 41 major representative structures, we can approximate a test set of protein fragments with an error of 0.378 Å. With only 20 types of structures, the test set can be modeled at 0.44 Å, which is comparable to the performance of a previous method (i.e., the oligons [24]). This study also compiled a position-specific frequency map for each cluster. The frequency maps will help discover sequence-structure relationships in future studies. Chung-ming Chen Ming-jing Hwang 陳中明 黃明經 2002 thesis 84 en_US
collection NDLTD
language en_US
format Others
sources NDLTD
description Master's === National Taiwan University === Graduate Institute of Biomedical Engineering === 90 === Studying protein structure is important for understanding biochemical functions within the body. One approach to protein structure prediction is to predict the local conformations of a protein and then reassemble the substructures. Previous studies have demonstrated that a small set of local structures can be used to reconstruct or build a protein molecule with high precision. Our study follows the same framework and proposes a method for finding recurrent local structures of proteins. The algorithm starts by applying Expectation-Maximization (EM) clustering to the distance matrices of pentamer fragment structures, which yields a rough partition of the conformation space. In the second stage, the EM clusters are subjected to a split-and-merge algorithm to obtain a finite number of clusters and to guarantee the homogeneity and distinctiveness of each one (i.e., each cluster consists of very similar structures and differs from the other clusters). The results show that, with 41 major representative structures, we can approximate a test set of protein fragments with an error of 0.378 Å. With only 20 types of structures, the test set can be modeled at 0.44 Å, which is comparable to the performance of a previous method (i.e., the oligons [24]). This study also compiled a position-specific frequency map for each cluster. The frequency maps will help discover sequence-structure relationships in future studies.
author2 Chung-ming Chen
author_facet Chung-ming Chen
Ta-tsen Soong
宋大辰
author Ta-tsen Soong
宋大辰
title Clustering and Characterizing Local Protein Structures by an Expectation-Maximization (EM)-Assisted Approach
publishDate 2002
url http://ndltd.ncl.edu.tw/handle/91604425775207861302
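
The first stage described in the abstract (EM clustering on the distance matrices of pentamer fragments) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the thesis: each fragment is reduced to one C-alpha position per residue, the distance matrix is flattened to its 10 upper-triangle values, the mixture uses spherical Gaussians, and the toy coordinates and the names distance_vector, em_spherical, extended and helix are hypothetical. The split-and-merge second stage is not shown.

```python
import math
from itertools import combinations


def distance_vector(coords):
    """Upper triangle of the pairwise distance matrix (10 values for 5 atoms)."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def em_spherical(points, means, n_iter=20, var_floor=1e-6):
    """EM for a mixture of spherical Gaussians (an assumed, simplified model).

    Returns the fitted means and the per-point responsibilities.
    """
    k, d, n = len(means), len(points[0]), len(points)
    means = [list(m) for m in means]
    variances = [1.0] * k
    weights = [1.0 / k] * k
    resp = []
    for _ in range(n_iter):
        # E-step: responsibilities from log densities, normalized stably.
        resp = []
        for x in points:
            logp = []
            for j in range(k):
                sq = sum((xi - mi) ** 2 for xi, mi in zip(x, means[j]))
                logp.append(math.log(weights[j])
                            - 0.5 * d * math.log(2 * math.pi * variances[j])
                            - 0.5 * sq / variances[j])
            top = max(logp)
            unnorm = [math.exp(lp - top) for lp in logp]
            total = sum(unnorm)
            resp.append([u / total for u in unnorm])
        # M-step: re-estimate weight, mean and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = [sum(r[j] * x[i] for r, x in zip(resp, points)) / nj
                        for i in range(d)]
            sq = sum(r[j] * sum((xi - mi) ** 2 for xi, mi in zip(x, means[j]))
                     for r, x in zip(resp, points))
            variances[j] = max(sq / (nj * d), var_floor)
    return means, resp


# Idealized toy fragments (hypothetical data, not from the thesis).
def extended(scale):
    # Five points on a straight line, 3.8 A apart before scaling.
    return [(3.8 * scale * i, 0.0, 0.0) for i in range(5)]


def helix(scale):
    # Ideal alpha-helix trace: radius 2.3 A, 100 degrees and 1.5 A rise per residue.
    return [(2.3 * scale * math.cos(math.radians(100 * i)),
             2.3 * scale * math.sin(math.radians(100 * i)),
             1.5 * scale * i) for i in range(5)]


scales = (0.98, 1.0, 1.02)
points = ([distance_vector(extended(s)) for s in scales]
          + [distance_vector(helix(s)) for s in scales])
means, resp = em_spherical(points, means=[points[0], points[3]])
# Hard label = component with the largest responsibility.
labels = [max(range(len(r)), key=r.__getitem__) for r in resp]
print(labels)
```

Working on distance vectors rather than raw coordinates makes the features invariant to rotation and translation, so fragments need not be superimposed before clustering. In this toy run the two initial means are picked by hand; a real data set would need an initialization strategy and a way to choose the number of components, neither of which the abstract specifies.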