Delineation of chromatin states and transcription factor binding in mouse and tools for large-scale data integration


Bibliographic Details
Main Author: van der Velde, Arjan Geert
Other Authors: Weng, Zhiping
Language: en_US
Published: 2019
Online Access: https://hdl.handle.net/2144/37979
Description
Summary: The goal of the ENCODE project has been to characterize regulatory elements in the human genome, such as regions bound by transcription factors (TFs), regions of open chromatin, and regions with altered histone modifications. The ENCODE consortium has performed a large number of whole-genome experiments to measure TF binding, chromatin accessibility, gene expression, and histone modifications across a multitude of cell types and conditions in both human and mouse. In this dissertation, I first describe the analysis of numerous datasets comprising 66 epigenomes, together with chromatin accessibility and expression data, across twelve tissues and seven time points during mouse embryonic development. We defined chromatin states using histone modification data and performed integrative analysis on the states. We observed that histone mark signals at enhancers and promoters change in coordination with gene expression. We detected evolutionarily conserved bivalent promoters that selectively silence ~3,400 genes, including hundreds of TFs regulating embryonic development. Second, I present a supervised method to predict TF binding across cell types, with features based on DNA sequence and patterns in DNase I cleavage data. We found that features based on sequence and DNase read counts can outperform other features as well as state-of-the-art methods. I also describe our contribution to the ENCODE TF Binding DREAM challenge, for which we developed a method using multiscale features and Extreme Boosting. Third, I describe methods, tools, and computational infrastructure that we have developed to handle large amounts of experimental data and metadata. These tools are fundamental to the selection and integration of large experimental datasets and are at the core of the pipelines described in this dissertation. Finally, I present the protein docking server I developed, as well as algorithms and routines for post-processing predictions and protein structures.
Collectively, this body of work encompasses computational approaches to the analyses of chromatin states, gene regulation, and the integration of large experimental datasets.