Inferring genetic network based on various types of biological data using machine learning algorithms


Bibliographic Details
Main Authors: Cheng-Long Chuang, 莊欽龍
Other Authors: Chung-Ming Chen
Format: Others
Language: en_US
Published: 2010
Online Access: http://ndltd.ncl.edu.tw/handle/96486950789473109001
Description
Summary: Doctoral === National Taiwan University === Graduate Institute of Biomedical Engineering === 98 === In the post-genome era, the analysis of high-throughput data has become a critical requirement in many laboratories. Many computational approaches have been developed to identify genetic or transcriptional interactions that may be used to prevent or disable unwanted states, such as those associated with oncogenesis or other diseases. Inferring genetic and transcriptional interactions from high-throughput data is therefore an essential issue in post-genomic research. In this study, we developed three computational models that extract nonlinear relationships between genes and construct transcriptional regulatory networks and genetic regulatory networks with higher accuracy and greater biological significance. The first method, called PARE, is a pattern recognition approach that infers time-lagged genetic interactions from time-course microarray data. A nonlinear score extracts characteristics of paired gene-expression curves (the first and second derivatives and the enclosed area) to approximate the nonlinear association and dynamics between the curves. This score is then used to identify subclasses of gene pairs with different time lags. Finally, PARE integrates both MGED and existing knowledge via machine learning, and subsequently predicts the other genetic interactions in the subclass. The second method consists of two components: a robust correlation estimator and a nonlinear recurrent model. It was used to simulate the underlying nonlinear regulatory mechanisms in biological organisms without any prior knowledge. The proposed algorithm was applied to infer the regulatory mechanisms of the general network in Saccharomyces cerevisiae and the pulmonary disease pathways in Homo sapiens with interesting outcomes. The third method is a fuzzy-logic approach, called AdaFuzzy, which integrates DNA sequence, microarray and ChIP-chip data to infer transcriptional interactions (TIs). A robust position weight matrix and a feature vector are proposed in AdaFuzzy to search for consensus sequence motifs. AdaFuzzy was also able to classify all predicted TIs into one or more of four promoter architectures. The validated prediction results imply that AdaFuzzy can be applied to uncover TIs in yeast.
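
The curve characteristics named in the summary for the first method (first and second derivatives and the enclosed area of paired gene-expression curves, compared at a time lag) can be illustrated with a minimal sketch. This is not the PARE score itself, whose exact definition is not given in this record. The function name curve_features, the use of correlation to compare derivatives, and the trapezoidal approximation of the enclosed area are all assumptions made for illustration.

```python
import numpy as np

def curve_features(x, y, t, lag=0):
    """Hypothetical helper: simple features for a pair of expression curves.

    x, y : expression values of two genes sampled at times t
    lag  : number of time steps by which y is assumed to trail x
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    if lag > 0:
        # Align x at sample i with y at sample i + lag.
        x, y, t = x[:-lag], y[lag:], t[:-lag]

    # First and second derivatives by finite differences.
    dx, dy = np.gradient(x, t), np.gradient(y, t)
    d2x, d2y = np.gradient(dx, t), np.gradient(dy, t)

    # Area enclosed between the curves, approximated by the trapezoidal
    # rule on |x - y| (assumption: crossings between samples are ignored).
    gap = np.abs(x - y)
    area = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(t)))

    return {
        "first_derivative_corr": float(np.corrcoef(dx, dy)[0, 1]),
        "second_derivative_corr": float(np.corrcoef(d2x, d2y)[0, 1]),
        "enclosed_area": area,
    }

# Usage: y is a copy of x delayed by one sample, so lag=1 aligns them.
base = np.sin(np.linspace(0, 3, 9))
t = np.arange(8.0)
features = curve_features(base[1:], base[:-1], t, lag=1)
```

Scanning several candidate lags and comparing the resulting features is one way such characteristics could separate gene pairs by time lag; how PARE actually combines them is described in the thesis, not here.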