Causal inference and prior integration in bioinformatics using information theory

An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA-sequencing faces several difficulties. The first major issue is the high varia...


Bibliographic Details
Main Author: Olsen, Catharina
Other Authors: Bontempi, Gianluca
Format: Doctoral Thesis
Language: fr
Published: Université libre de Bruxelles 2013
Subjects:
Online Access: http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401
id ndltd-ulb.ac.be-oai-dipot.ulb.ac.be-2013-209401
record_format oai_dc
spelling ndltd-ulb.ac.be-oai-dipot.ulb.ac.be-2013-209401 2018-04-11T17:33:48Z info:eu-repo/semantics/doctoralThesis info:ulb-repo/semantics/doctoralThesis info:ulb-repo/semantics/openurl/vlink-dissertation
Causal inference and prior integration in bioinformatics using information theory
Olsen, Catharina; Bontempi, Gianluca; Lenaerts, Tom; Meyer, Patrick E.; Quackenbush, John; Geurts, Pierre; Haibe-Kains, Benjamin; Jansen, Maarten
Universite Libre de Bruxelles; Université libre de Bruxelles, Faculté des Sciences – Informatique, Bruxelles
2013-10-17 fr
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA-sequencing faces several difficulties. The first major issue is the high variable-to-sample ratio, which is due to a number of factors: a single experiment captures all genes, while the number of experiments is restricted by the experiment’s cost, time and patient cohort size. The second problem is that these data sets typically exhibit high amounts of noise.
Another important problem in bioinformatics is the question of how the inferred networks’ quality can be evaluated. The current best practice is a two-step procedure. In the first step, the highest-scoring interactions are compared to known interactions stored in biological databases. The inferred network passes this quality assessment if there is a large overlap with the known interactions. In this case, a second step is carried out in which unknown but high-scoring, and thus promising, new interactions are validated ‘by hand’ via laboratory experiments. Unfortunately, when prior knowledge is integrated into the inference procedure, this validation becomes biased because the same information is used in both the inference and the validation. Therefore, it would no longer allow an independent validation of the resulting network.
The main contribution of this thesis is a complete computational framework that uses experimental knock-down data in a cross-validation scheme to both infer and validate directed networks. Its components are i) a method that integrates genomic data and prior knowledge to infer directed networks, ii) its implementation in an R/Bioconductor package and iii) a web application to retrieve prior knowledge from PubMed abstracts and biological databases. To infer directed networks from genomic data and prior knowledge, we propose a two-step procedure. First, we adapt the pairwise feature selection strategy mRMR to integrate prior knowledge in order to obtain the network’s skeleton. Then, for the subsequent orientation phase of the algorithm, we extend a criterion based on interaction information to include prior knowledge. The implementation of this method is available both as part of the prior retrieval tool Predictive Networks and as a stand-alone R/Bioconductor package named predictionet.
Furthermore, we propose a fully data-driven quantitative validation of such directed networks using experimental knock-down data. We start by identifying the set of genes that was truly affected by the perturbation experiment. The rationale of our validation procedure is that these truly affected genes should also be among the perturbed gene’s children in the inferred network. Consequently, we can compute a performance score
Informatique générale; Sciences exactes et naturelles; Colon (Anatomy) -- Cancer; Bioinformatics; Information theory; Cancer colorectal; Bio-informatique; Théorie de l'information; bioinformatics; prior integration; causal inference; machine learning
1 v. (xvi, 197 p.)
Doctorat en Sciences
info:eu-repo/semantics/nonPublished local/bictel.ulb.ac.be:ULBetd-10162013-104610 local/ulbcat.ulb.ac.be:994817 http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401 No full-text files
collection NDLTD
language fr
format Doctoral Thesis
sources NDLTD
topic Informatique générale
Sciences exactes et naturelles
Colon (Anatomy) -- Cancer
Bioinformatics
Information theory
Cancer colorectal
Bio-informatique
Théorie de l'information
bioinformatics
prior integration
causal inference
machine learning