Applying Structural Domain Information for Enzyme Reaction Annotation and Protein-Protein Interaction Inference

Bibliographic Details
Main Authors: Huang, Chuan-Ching, 黃筌敬
Other Authors: Tang, Chuan Yi
Format: Others
Language: en_US
Published: 2013
Online Access: http://ndltd.ncl.edu.tw/handle/59906111277978423707
Description
Summary: Doctoral dissertation, National Tsing Hua University, Department of Computer Science, academic year 102. Domains are fundamental building blocks of proteins that perform a variety of functions within living organisms, including catalysis, signal transduction, and transport of nutrients. The majority of proteins are composed of more than two domains that recognize and bind structural units in other proteins through protein-protein interactions. This dissertation exploits the domain composition of proteins to investigate two main topics: “enzyme reaction prediction based on the domain architecture of an enzyme” and “inferring protein-protein interactions (PPIs) from domain-domain interactions (DDIs)”. With the advent of high-throughput genome sequencing techniques in the post-genomic era, the gap between the number of newly sequenced proteins and the number of proteins with characterized functions has widened. Identifying protein functions through manually curated sequence annotation is a challenging task; therefore, automated protein function prediction techniques are necessary. The enzyme nomenclature of the International Union of Biochemistry and Molecular Biology provides a well-defined four-field Enzyme Commission (EC) number for enzyme classification. The first three fields describe the general type of enzymatic reaction, and the last field denotes the substrate specificity of the reaction. Proteins were grouped into two data sets: the 3-numerical-block set and the 4-numerical-block set. Each data set was further divided into single-EC and multiple-EC cases according to whether a protein catalyzes more than one enzymatic reaction. For single-EC cases, the well-known association rule method correctly classified 96% and 91% of entries in the 3-numerical-block set and the 4-numerical-block set, respectively. The proposed enzyme reaction prediction (ERP) method achieved marginally higher accuracies of 99% and 92%, respectively.
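The four-field EC structure and the single-EC/multiple-EC split described above can be made concrete with a small baseline. The Python sketch below is a simplified association-rule classifier, not the dissertation's ERP method. It assumes that each protein is represented by its ordered domain architecture (a tuple of domain labels), treats the whole architecture as the rule antecedent, and keeps rules whose confidence reaches a hypothetical threshold `min_confidence=0.5`. The function names, domain labels (`domA`, ...), and toy data are invented for illustration, and "correctly predicted" is assumed to mean an exact match between the predicted and true EC sets, which the abstract does not specify.

```python
from collections import Counter, defaultdict


def ec_prefix(ec: str, fields: int) -> str:
    """Keep the first `fields` fields of a four-field EC number, e.g. '1.1.1.1' -> '1.1.1'."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected a four-field EC number, got {ec!r}")
    return ".".join(parts[:fields])


def mine_rules(training, fields, min_confidence=0.5):
    """Mine 'domain architecture -> EC' rules from (architecture, EC set) pairs.

    The whole domain architecture is the antecedent (an assumption of this sketch);
    confidence = count(architecture with EC) / count(architecture).
    """
    arch_count = Counter()
    pair_count = Counter()
    for arch, ecs in training:
        arch_count[arch] += 1
        for ec in {ec_prefix(e, fields) for e in ecs}:
            pair_count[(arch, ec)] += 1
    rules = defaultdict(dict)
    for (arch, ec), n in pair_count.items():
        conf = n / arch_count[arch]
        if conf >= min_confidence:
            rules[arch][ec] = conf
    return rules


def predict(rules, arch):
    """Return every EC whose rule fires for this architecture (empty if none)."""
    return set(rules.get(arch, {}))


def accuracy(rules, test, fields):
    """Fraction of entries whose predicted EC set exactly matches the true EC set."""
    hits = sum(predict(rules, arch) == {ec_prefix(e, fields) for e in ecs}
               for arch, ecs in test)
    return hits / len(test)


# Toy data: hypothetical domain labels and invented EC assignments.
training = [
    (("domA",), {"1.1.1.1"}),
    (("domA",), {"1.1.1.1"}),
    (("domA", "domB"), {"2.7.1.1"}),
    (("domC", "domD"), {"3.1.3.1", "4.2.1.1"}),  # a multiple-EC entry
]
test = [
    (("domA",), {"1.1.1.1"}),
    (("domB",), {"2.7.1.1"}),  # architecture never seen in training
    (("domC", "domD"), {"3.1.3.1", "4.2.1.1"}),
]

# Split the test entries into single-EC and multiple-EC cases.
single = [entry for entry in test if len(entry[1]) == 1]
multiple = [entry for entry in test if len(entry[1]) > 1]

# fields=3 mimics the 3-numerical-block set, fields=4 the 4-numerical-block set.
for fields in (3, 4):
    rules = mine_rules(training, fields)
    print(fields, accuracy(rules, single, fields), accuracy(rules, multiple, fields))
```

Run as-is, it prints `3 0.5 1.0` and then `4 0.5 1.0`: the unseen architecture `("domB",)` fires no rule, which illustrates why a pure rule lookup can miss entries. Full association-rule mining (for example, Apriori over individual domains) would also allow rules to fire on partial architectures.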
Predicting multiple enzymatic activities for a single protein is more difficult. For multiple-EC cases, the fractions of entries correctly predicted for the 3-numerical-block set and the 4-numerical-block set were 17% and 8%, respectively, for the association rule method, and 49% and 42%, respectively, for the ERP method. Biological processes are carried out when one protein recognizes and binds certain structural elements in other proteins through PPIs. Therefore, protein functions can be explored through protein interactions at the domain level. Noroviruses cause severe gastroenteritis and foodborne illness worldwide, especially during the winter. No effective vaccine against noroviruses is available because of their variable genome sequences. Infected members of vulnerable populations often require hospitalization and may die. We attempted to construct the protein interaction network at the domain level, with a view to further clinical applications and drug design.
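The domain-level reasoning behind PPI inference can be sketched as follows. This Python example shows the simplest DDI-based rule, assumed here for illustration rather than taken from the dissertation: two proteins are predicted to interact if at least one domain of the first and one domain of the second form a known DDI. The protein names (`P1`, ...), domain labels, and DDI list are hypothetical. In practice, domain assignments and DDIs would come from curated databases, and the dissertation may score or filter candidates differently.

```python
from itertools import product


def infer_ppis(protein_domains, ddis):
    """Predict candidate PPIs from known DDIs.

    Proteins p and q are predicted to interact if some domain of p and some
    domain of q form a known (unordered) domain-domain interaction.
    Returns {(p, q): set of supporting DDIs}.
    """
    known = {frozenset(pair) for pair in ddis}  # DDIs treated as unordered pairs
    names = sorted(protein_domains)
    predicted = {}
    for i, p in enumerate(names):
        for q in names[i:]:  # includes p == q, i.e. homodimer candidates
            support = {
                tuple(sorted((a, b)))
                for a, b in product(protein_domains[p], protein_domains[q])
                if frozenset((a, b)) in known
            }
            if support:
                predicted[(p, q)] = support
    return predicted


# Hypothetical proteins, their domain compositions, and known DDIs.
proteins = {
    "P1": ["domA", "domB"],
    "P2": ["domC"],
    "P3": ["domD"],
}
ddis = [("domB", "domC"), ("domD", "domD")]

print(infer_ppis(proteins, ddis))
```

Each predicted pair is returned together with the DDIs that support it, so a candidate interaction can be traced back to its domain-level evidence. Here `P1` and `P2` are linked through the `domB`/`domC` interaction, and `P3` is a homodimer candidate through the `domD`/`domD` self-interaction.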