Computational epigenomics : gene regulation, comparative methodologies, and epigenetic patterns

Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-su...

Bibliographic Details
Main Author: Yen, Angela
Other Authors: Manolis Kellis.
Format: Others
Language: English
Published: Massachusetts Institute of Technology 2016
Subjects: Electrical Engineering and Computer Science.
Online Access: http://hdl.handle.net/1721.1/105953
id ndltd-MIT-oai-dspace.mit.edu-1721.1-105953
record_format oai_dc
collection NDLTD
language English
format Others
sources NDLTD
topic Electrical Engineering and Computer Science.
description Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 203-225). === One of the fundamental aims of biology is to determine what lies at the root of differences across individuals, species, diseases, and cell types. Furthermore, the sequencing of genomes has revolutionized the ways in which scientists can investigate biological processes and disease pathways; new genome-wide, high-throughput experiments require computer scientists with a biological understanding to analyze and interpret the data to improve our understanding of the life sciences. This provides us with a key opportunity to use computational techniques for new biological discoveries. While genetic variation plays an important role in influencing phenotype, sequence alone cannot account for all differences: for example, different types of cells in an individual have varying functions and attributes, but identical genetic makeup. This highlights the importance of studying epigenetic changes, which are dynamic chemical changes to and around the DNA. While the DNA of every cell in an individual is the same, the epigenetic context for that DNA varies from cell to cell. In this way, these epigenetic differences play a crucial role in gene regulation, with epigenetic changes both causing and recording regulatory mechanisms. In this thesis, we combine the power of computational, statistical, and data science approaches with the new wave of epigenetic data at a genome-wide level in a number of ways. 
First, in chapter 2, we demonstrate the importance of computational analysis at an epigenomic level by identifying an epigenomic signature of the olfactory receptor gene family that gives insight into the mechanism behind monogenic gene regulation. Next, in chapter 3, we explain our development of ChromDiff, a novel statistical and information-theoretic computational methodology to identify chromatin state differences in groups of samples. In our methodology, we use correction for external covariates to isolate the relevant signal, and as a result, we find that our method outperforms existing computational methods, with further validation through randomized simulations. In chapter 4, we apply our methodology to characteristics including sex, developmental age, and tissue type, and we unveil relevant chromatin states and genes that distinguish the groups of epigenomes, with further validation of our results through differential expression analysis and gene set enrichment. In chapter 5, we show the power of integrative analysis through the combination of DNA methylation data with chromatin state profiles, cell types, sample groups, experimental technologies, and histone mark data to reveal insightful epigenetic patterns and relationships. Finally, in chapter 6, we identify "hidden" or "unknown" covariates in epigenomic data by using agnostic principal component analysis on our samples to discover similarities between our known covariates and the identified components. In summation, our research highlights the importance of both algorithm development and method application for epigenomic questions, reaffirming the importance of interdisciplinary research that brings together cutting-edge techniques in computer science with appropriate biological hypotheses and data. 
While questions and analysis must be carefully paired in an informed manner to produce meaningful, interpretable, and believable results in computational biology, our work here provides a sampling of the vast potential for scientific discovery at the intersection of the fields of computer science and biology. === by Angela Yen. === Ph. D.
author2 Manolis Kellis.
author Yen, Angela
author_sort Yen, Angela
title Computational epigenomics : gene regulation, comparative methodologies, and epigenetic patterns
publisher Massachusetts Institute of Technology
publishDate 2016
url http://hdl.handle.net/1721.1/105953
spelling ndltd-MIT-oai-dspace.mit.edu-1721.1-105953 2019-05-02T15:54:18Z Computational epigenomics : gene regulation, comparative methodologies, and epigenetic patterns Yen, Angela Manolis Kellis. Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. 2016-12-22T15:16:16Z 2016-12-22T15:16:16Z 2016 2016 Thesis http://hdl.handle.net/1721.1/105953 965386185 eng M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 225 pages application/pdf Massachusetts Institute of Technology