An information theoretic treatment of sequence-to-expression modeling.

Studying a gene's regulatory mechanisms is a tedious process that involves identification of candidate regulators by transcription factor (TF) knockout or over-expression experiments, delineation of enhancers by reporter assays, and demonstration of direct TF influence by site mutagenesis, amon...

Bibliographic Details
Main Authors: Farzaneh Khajouei, Saurabh Sinha
Format: Article
Language: English
Published: Public Library of Science (PLoS) 2018-09-01
Series: PLoS Computational Biology
Online Access: http://europepmc.org/articles/PMC6175532?pdf=render
id doaj-e3cc3ee9d4374f2682304a7f9d6102f0
record_format Article
spelling doaj-e3cc3ee9d4374f2682304a7f9d6102f0; 2020-11-25T01:34:03Z; eng; Public Library of Science (PLoS); PLoS Computational Biology; 1553-734X; 1553-7358; 2018-09-01; 14; 9; e1006459; 10.1371/journal.pcbi.1006459; An information theoretic treatment of sequence-to-expression modeling.; Farzaneh Khajouei; Saurabh Sinha; http://europepmc.org/articles/PMC6175532?pdf=render
collection DOAJ
language English
format Article
sources DOAJ
author Farzaneh Khajouei
Saurabh Sinha
title An information theoretic treatment of sequence-to-expression modeling.
publisher Public Library of Science (PLoS)
series PLoS Computational Biology
issn 1553-734X
1553-7358
publishDate 2018-09-01
description Studying a gene's regulatory mechanisms is a tedious process that involves identification of candidate regulators by transcription factor (TF) knockout or over-expression experiments, delineation of enhancers by reporter assays, and demonstration of direct TF influence by site mutagenesis, among other approaches. Such experiments are often chosen based on the biologist's intuition, from several testable hypotheses. We pursue the goal of making this process systematic by using ideas from information theory to reason about experiments in gene regulation, in the hope of ultimately enabling rigorous experiment design strategies. For this, we make use of a state-of-the-art mathematical model of gene expression, which provides a way to formalize our current knowledge of cis- as well as trans- regulatory mechanisms of a gene. Ambiguities in such knowledge can be expressed as uncertainties in the model, which we capture formally by building an ensemble of plausible models that fit the existing data and defining a probability distribution over the ensemble. We then characterize the impact of a new experiment on our understanding of the gene's regulation based on how the ensemble of plausible models and its probability distribution changes when challenged with results from that experiment. This allows us to assess the 'value' of the experiment retroactively as the reduction in entropy of the distribution (information gain) resulting from the experiment's results. We fully formalize this novel approach to reasoning about gene regulation experiments and use it to evaluate a variety of perturbation experiments on two developmental genes of D. melanogaster. We also provide objective and 'biologist-friendly' descriptions of the information gained from each such experiment. The rigorously defined information theoretic approaches presented here can be used in the future to formulate systematic strategies for experiment design pertaining to studies of gene regulatory mechanisms.
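
The idea in the description of scoring an experiment by the reduction in entropy of a distribution over plausible models can be illustrated with a small sketch. This is not the authors' implementation: the four-model ensemble, the uniform prior, the likelihood values, and the Bayes-style reweighting in `update` are all assumptions made here for illustration only.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def update(prior, likelihoods):
    """Reweight each model by how well it explains the new result.

    Assumption: a plain Bayes update; the article may weight models differently.
    """
    weighted = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("no model in the ensemble is compatible with the result")
    return [w / total for w in weighted]


def information_gain(prior, likelihoods):
    """Entropy before the experiment minus entropy after it."""
    posterior = update(prior, likelihoods)
    return entropy(prior) - entropy(posterior)


# Hypothetical ensemble: four models, equally plausible given existing data.
prior = [0.25, 0.25, 0.25, 0.25]
# Hypothetical experiment whose result is incompatible with two of the models.
likelihoods = [1.0, 1.0, 0.0, 0.0]

gain = information_gain(prior, likelihoods)
print(f"information gain: {gain} bits")
```

In this toy case the experiment halves the set of plausible models, so the entropy drops from 2 bits to 1 bit. An experiment whose result every model predicts equally well would leave the distribution unchanged and score zero.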
url http://europepmc.org/articles/PMC6175532?pdf=render