Causal inference and prior integration in bioinformatics using information theory
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA-sequencing faces several difficulties. The first major issue is the high varia...
Main Author: | Olsen, Catharina |
---|---|
Other Authors: | Bontempi, Gianluca |
Format: | Doctoral Thesis |
Language: | fr |
Published: | Universite Libre de Bruxelles, 2013 |
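Subjects: | Informatique générale; Sciences exactes et naturelles; Colon (Anatomy) -- Cancer; Bioinformatics; Information theory; Cancer colorectal; Bio-informatique; Théorie de l'information; bioinformatics; prior integration; causal inference; machine learning |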
Online Access: | http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401 |
id |
ndltd-ulb.ac.be-oai-dipot.ulb.ac.be-2013-209401 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-ulb.ac.be-oai-dipot.ulb.ac.be-2013-209401; 2018-04-11T17:33:48Z; info:eu-repo/semantics/doctoralThesis; info:ulb-repo/semantics/doctoralThesis; info:ulb-repo/semantics/openurl/vlink-dissertation; Causal inference and prior integration in bioinformatics using information theory; Olsen, Catharina; Bontempi, Gianluca; Lenaerts, Tom; Meyer, Patrick E.; Quackenbush, John; Geurts, Pierre; Haibe-Kains, Benjamin; Jansen, Maarten; Universite Libre de Bruxelles; Université libre de Bruxelles, Faculté des Sciences – Informatique, Bruxelles; 2013-10-17; fr; 1 v. (xvi, 197 p.); Doctorat en Sciences; info:eu-repo/semantics/nonPublished; local/bictel.ulb.ac.be:ULBetd-10162013-104610; local/ulbcat.ulb.ac.be:994817; http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401; No full-text files |
collection |
NDLTD |
language |
fr |
format |
Doctoral Thesis |
sources |
NDLTD |
topic |
Informatique générale; Sciences exactes et naturelles; Colon (Anatomy) -- Cancer; Bioinformatics; Information theory; Cancer colorectal; Bio-informatique; Théorie de l'information; bioinformatics; prior integration; causal inference; machine learning |
description |
An important problem in bioinformatics is the reconstruction of gene regulatory networks from expression data. The analysis of genomic data stemming from high-throughput technologies such as microarray experiments or RNA sequencing faces several difficulties. The first major issue is the high variable-to-sample ratio, which is due to a number of factors: a single experiment captures all genes, while the number of experiments is restricted by the experiment's cost, time and patient cohort size. The second problem is that these data sets typically exhibit high amounts of noise.
Another important problem in bioinformatics is how the quality of the inferred networks can be evaluated. The current best practice is a two-step procedure. In the first step, the highest-scoring interactions are compared to known interactions stored in biological databases. The inferred network passes this quality assessment if there is a large overlap with the known interactions. In this case, a second step is carried out in which unknown but high-scoring, and thus promising, new interactions are validated 'by hand' via laboratory experiments. Unfortunately, when prior knowledge is integrated into the inference procedure, this validation is biased because the same information is used in both the inference and the validation. It therefore no longer allows an independent validation of the resulting network.
The main contribution of this thesis is a complete computational framework that uses experimental knock-down data in a cross-validation scheme to both infer and validate directed networks. Its components are i) a method that integrates genomic data and prior knowledge to infer directed networks, ii) its implementation in an R/Bioconductor package and iii) a web application to retrieve prior knowledge from PubMed abstracts and biological databases. To infer directed networks from genomic data and prior knowledge, we propose a two-step procedure: first, we adapt the pairwise feature selection strategy mRMR to integrate prior knowledge in order to obtain the network's skeleton. Then, for the subsequent orientation phase of the algorithm, we extend a criterion based on interaction information to include prior knowledge. The implementation of this method is available both as part of the prior retrieval tool Predictive Networks and as a stand-alone R/Bioconductor package named predictionet.
Furthermore, we propose a fully data-driven quantitative validation of such directed networks using experimental knock-down data: we start by identifying the set of genes that was truly affected by the perturbation experiment. The rationale of our validation procedure is that these truly affected genes should also be part of the perturbed gene's childhood, i.e. the genes downstream of it, in the inferred network. Consequently, we can compute a performance score
Doctorate in Sciences (Doctorat en Sciences); info:eu-repo/semantics/nonPublished |
author2 |
Bontempi, Gianluca |
author |
Olsen, Catharina |
title |
Causal inference and prior integration in bioinformatics using information theory |
publisher |
Universite Libre de Bruxelles |
publishDate |
2013 |
url |
http://hdl.handle.net/2013/ULB-DIPOT:oai:dipot.ulb.ac.be:2013/209401 |
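
The abstract states that the orientation phase relies on a criterion based on interaction information. As a rough illustration only, and not the thesis's or the predictionet package's implementation, the following Python sketch computes interaction information on toy discrete data. Everything in it is an assumption made for illustration: the function names are hypothetical, the sign convention I(X;Y) - I(X;Y|Z) is one of two in common use and may differ from the one used in the thesis, the estimator is a plain empirical plug-in, and no prior knowledge is integrated.

```python
from collections import Counter
from itertools import product
from math import log2


def entropy(samples):
    """Empirical Shannon entropy, in bits, of a list of hashable outcomes."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_information(x, y):
    """I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(x) + entropy(y) - entropy(list(zip(x, y)))


def conditional_mutual_information(x, y, z):
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(x, z))) + entropy(list(zip(y, z)))
            - entropy(list(zip(x, y, z))) - entropy(z))


def interaction_information(x, y, z):
    """Assumed convention: I(X;Y) - I(X;Y|Z).

    Under this convention a negative value means X and Y become more
    dependent once Z is known, the pattern expected for X -> Z <- Y.
    """
    return mutual_information(x, y) - conditional_mutual_information(x, y, z)


# Toy collider: X and Y are independent fair bits, Z = X XOR Y.
pairs = list(product([0, 1], repeat=2))
cx = [a for a, _ in pairs]
cy = [b for _, b in pairs]
cz = [a ^ b for a, b in pairs]
collider_score = interaction_information(cx, cy, cz)

# Toy chain: X -> Z -> Y, with Z a copy of X and Y a copy of Z.
hx = [0, 1]
hz = list(hx)
hy = list(hz)
chain_score = interaction_information(hx, hy, hz)

print("collider:", collider_score, "chain:", chain_score)
```

In this toy setting the collider triplet yields a negative score and the chain a positive one, which is the kind of contrast an orientation step can exploit. Real expression data are continuous and noisy, so a practical estimator would need discretization or a different entropy estimate.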