A factor analysis approach to transcription regulatory network reconstruction using gene expression data

Bibliographic Details
Main Authors: Chen, Wei, 陈玮
Other Authors: Hung, YS
Language: English
Published: The University of Hong Kong (Pokfulam, Hong Kong) 2013
Online Access: http://hdl.handle.net/10722/180958
Description
Summary: Reconstruction of the Transcription Regulatory Network (TRN) and Transcription Factor Activity (TFA) from gene expression data is an important problem in systems biology. Various factor analysis methods exist for TRN reconstruction, but most rely on specific assumptions that real biological data do not satisfy. Network Component Analysis (NCA) can handle such limitations and is considered to be one of the most effective methods. A prerequisite for NCA is knowledge of the structure of the TRN. Such structure can be obtained from ChIP-chip or ChIP-seq experiments, which, however, have quite limited applicability. To cope with this difficulty, we resort to a heuristic optimization algorithm such as Particle Swarm Optimization (PSO) to explore the possible structures of the TRN and choose the most plausible one. For the structure estimation problem, we extend classical PSO and propose a novel Probabilistic binary PSO. Furthermore, an improved NCA called FastNCA is adopted to compute the objective function accurately and quickly, which enables PSO to run efficiently. Since heuristic optimization cannot guarantee global convergence, we run PSO multiple times and integrate the results. GCV-LASSO (Generalized Cross Validation - Least Absolute Shrinkage and Selection Operator) is then performed to estimate the TRN. We apply our approach and other factor analysis methods to synthetic data. The results indicate that the proposed PSO-FastNCA-GCV-LASSO algorithm gives better estimates than the other methods. To incorporate more prior information on TRN structure and gene expression dynamics into the linear factor analysis model, and thereby improve the estimation of the TRN and TFAs, a linear Bayesian framework is adopted. Under this unified Bayesian framework, the Bayesian Linear Sparse Factor Analysis Model (BLSFM) and the Bayesian Linear State Space Model (BLSSM) are developed for instantaneous and dynamic TRNs, respectively.
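The structure-scoring step described above can be illustrated with a minimal sketch. It assumes the standard NCA-style linear model E ≈ A P, where E is the genes-by-samples expression matrix, A is the genes-by-TFs connectivity matrix restricted to a candidate binary structure Z, and P holds the TF activities. The fit below uses plain alternating least squares, not FastNCA, and the function name `structure_residual` and its parameters are hypothetical, chosen only for illustration. A search procedure such as binary PSO could call a function like this to compare candidate structures.

```python
import numpy as np


def structure_residual(E, Z, n_iter=50, seed=0):
    """Score a candidate binary structure Z (genes x TFs) against E (genes x samples).

    Illustrative only: fits E ~= A @ P by alternating least squares with A
    restricted to the nonzero pattern of Z, then returns the squared
    Frobenius residual together with the fitted A and P.
    This is NOT FastNCA; it is a simple stand-in for the objective.
    """
    rng = np.random.default_rng(seed)
    E = np.asarray(E, dtype=float)
    Z = np.asarray(Z, dtype=bool)
    n_genes, n_tfs = Z.shape

    # Random start that already respects the zero pattern of Z.
    A = rng.standard_normal((n_genes, n_tfs)) * Z

    for _ in range(n_iter):
        # P-step: least-squares TF activities for the current A.
        P = np.linalg.lstsq(A, E, rcond=None)[0]
        # A-step: each gene is regressed only on the TFs that Z allows.
        for i in range(n_genes):
            idx = np.flatnonzero(Z[i])
            A[i] = 0.0
            if idx.size:
                A[i, idx] = np.linalg.lstsq(P[idx].T, E[i], rcond=None)[0]

    # Final P-step so that P is optimal for the returned A.
    P = np.linalg.lstsq(A, E, rcond=None)[0]
    residual = float(np.linalg.norm(E - A @ P) ** 2)
    return residual, A, P


if __name__ == "__main__":
    # Small synthetic example: 12 genes, 3 TFs, 20 samples (all values made up).
    rng = np.random.default_rng(1)
    Z_true = rng.random((12, 3)) < 0.5
    Z_true[np.arange(12), rng.integers(0, 3, 12)] = True  # every gene has a regulator
    E = (rng.standard_normal((12, 3)) * Z_true) @ rng.standard_normal((3, 20))
    res, A_hat, P_hat = structure_residual(E, Z_true)
    print("residual for the generating structure:", res)
```

A lower residual indicates that the candidate structure explains the expression data better under this simplified model. The fit is not identifiable up to scaling of each TF, so only the residual, not the raw entries of A and P, is comparable across runs.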
Various approaches to incorporating partial and ambiguous prior network structure information in the Bayesian framework are proposed to improve performance in practical applications. Furthermore, we propose a novel mechanism for estimating the hyper-parameters of the prior distributions in our BLSFM and BLSSM models, which can significantly improve the estimation compared to traditional ways of setting hyper-parameters. With this development, reasonably good estimates of TFAs and the TRN can be obtained even without any structure prior on the TRN. Extensive numerical experiments are performed to investigate the developed methods under various settings, with comparison to some existing alternative approaches. It is demonstrated that our hyper-parameter estimation method improves the estimation of TFAs and the TRN in most settings and has superior performance, and that structure priors in general lead to improved estimation performance. Regarding application to real biological data, we execute the PSO-FastNCA-GCV-LASSO algorithm developed in the thesis on E. coli microarray data and obtain sensible estimates of TFAs and the TRN. We apply BLSFM without structure priors on the TRN, and BLSSM both without structure priors and with partial structure priors, to yeast (S. cerevisiae) microarray data and obtain reasonable estimates of TFAs and the TRN.
Version: published or final version
Department: Electrical and Electronic Engineering
Degree: Doctor of Philosophy (Doctoral)