Summary: | Inferring Genetic Regulatory Networks (GRN) from multiple data sources is a fundamental problem in computational biology. Computational models for GRN range from simple Boolean networks to stochastic differential equations. To successfully model GRN, a computational method has to be scalable and capable of integrating different biological data sources effectively and homogeneously. In this thesis, we introduce a novel method to model GRN using Cost-Based Abduction (CBA) and study the relation between CBA and Bayesian inference. CBA is an important AI formalism for reasoning under uncertainty that can integrate different biological data sources effectively. We use three different yeast genome data sources—protein-DNA, protein-protein, and knock-out data—to build a skeleton (unannotated) graph which acts as a theory to build a CBA system. The Least Cost Proof (LCP) for the CBA system fully annotates the skeleton graph to represent the learned GRN. Our results show that CBA is a promising tool in computational biology in general and in GRN modeling in particular because CBA knowledge representation can intrinsically implement the AND/OR logic in GRN while enforcing cis-regulatory logic constraints effectively, allowing the method to operate on a genome-wide scale.Besides allowing us to successfully learn yeast pathways such as the pheromone pathway, our method is scalable enough to analyze the full yeast genome in a single CBA instance, without sub-networking. The scalability power of our method comes from the fact that our CBA model size grows in a quadratic, rather than exponential, manner with respect to data size and path length. We also introduce a new algorithm to convert CBA into an equivalent binary linear program that computes the exact LCP for the CBA system, thus reaching the optimal solution. Our work establishes a framework to solve Bayesian networks using integer linear programming and high order recurrent neural networks through CBA as an intermediate representation.
|