Clustering Analysis of Protein Phosphorylation Sequence by GHSOM

Bibliographic Details
Main Authors: Kuo Sheng Lien, 連國盛
Other Authors: C. H. Chen
Format: Others
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/47624453411019236062
Description
Summary: Master's === Chang Gung University === Department of Information Management === 98 === In the post-genome era, research topics related to proteomics have become more important. Proteins participate in almost all physiological and pathological processes in human beings. Protein phosphorylation is a process carried out by protein kinases that changes the structure and capabilities of proteins. Many researchers use data mining techniques to predict protein phosphorylation sites by applying a classification model. This study instead applies a clustering method to analyze protein phosphorylation sequences that are regulated by a particular kinase. Because protein phosphorylation sequences are categorical data, this study encodes them using the physical-chemical properties of amino acids, binary coding, and unary coding, transforming them into numeric data that traditional clustering methods can handle. GHSOM is then applied to cluster the protein phosphorylation sequences regulated by the kinase groups CK2_group and PKA_group. Two kinds of color representations for amino acids are used to evaluate the performance of the encoding methods and of different GHSOM parameter settings. The results show that the encoding method based on the physical-chemical properties of amino acids performs best.
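The encoding step described in the summary can be illustrated with a minimal sketch. It is not the thesis's implementation: the record does not say which physical-chemical properties were used or how binary and unary coding were defined. The sketch assumes a one-of-20 indicator code as a stand-in for a categorical coding, and the Kyte-Doolittle hydropathy scale as a single example property. The function names and the sample window are hypothetical.

```python
# Sketch: turning a categorical amino acid window into numeric vectors.
# Assumptions (not from the thesis): one-of-20 indicator coding and the
# Kyte-Doolittle hydropathy scale as the only physical-chemical property.

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues, one-letter codes
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}

# Kyte-Doolittle hydropathy values, used purely as an illustrative property.
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def encode_one_of_20(seq):
    """Return a flat list with 20 indicator values per residue."""
    vec = []
    for aa in seq:
        row = [0.0] * len(AMINO_ACIDS)
        row[INDEX[aa]] = 1.0  # mark the position of this residue type
        vec.extend(row)
    return vec


def encode_property(seq, scale=HYDROPATHY):
    """Return one numeric property value per residue."""
    return [scale[aa] for aa in seq]


if __name__ == "__main__":
    window = "LRRASLG"  # hypothetical 7-residue window, not thesis data
    print(len(encode_one_of_20(window)))  # 7 residues x 20 indicators
    print(encode_property(window))
```

Either vector could then be passed to a numeric clustering method. The property-based vector is much shorter and places chemically similar residues close together, which is one plausible reason such an encoding might cluster well, although the record itself does not give the reason.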