Computational epigenomics : gene regulation, comparative methodologies, and epigenetic patterns
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-su...
Main Author: | Yen, Angela |
---|---|
Other Authors: | Manolis Kellis |
Format: | Others |
Language: | English |
Published: | Massachusetts Institute of Technology, 2016 |
Subjects: | Electrical Engineering and Computer Science. |
Online Access: | http://hdl.handle.net/1721.1/105953 |
id |
ndltd-MIT-oai-dspace.mit.edu-1721.1-105953 |
---|---|
record_format |
oai_dc |
collection |
NDLTD |
language |
English |
format |
Others |
sources |
NDLTD |
topic |
Electrical Engineering and Computer Science. |
description |
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016. === This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. === Cataloged from student-submitted PDF version of thesis. === Includes bibliographical references (pages 203-225). === One of the fundamental aims of biology is to determine what lies at the root of differences across individuals, species, diseases, and cell types. The sequencing of genomes has revolutionized the ways in which scientists can investigate biological processes and disease pathways. New genome-wide, high-throughput experiments require computer scientists with biological understanding to analyze and interpret the resulting data and so improve our understanding of the life sciences. This provides a key opportunity to use computational techniques for new biological discoveries. While genetic variation plays an important role in influencing phenotype, sequence alone cannot account for all differences. For example, the different cell types in an individual have distinct functions and attributes but an identical genetic makeup. This highlights the importance of studying epigenetic changes, which are dynamic chemical modifications to and around the DNA. While the DNA of every cell in an individual is the same, the epigenetic context of that DNA varies from cell to cell. In this way, epigenetic differences play a crucial role in gene regulation, with epigenetic changes both causing and recording regulatory mechanisms. In this thesis, we combine computational, statistical, and data science approaches with the new wave of genome-wide epigenetic data in a number of ways.
First, in chapter 2, we demonstrate the importance of computational analysis at the epigenomic level by identifying an epigenomic signature of the olfactory receptor gene family that gives insight into the mechanism behind monogenic gene regulation. Next, in chapter 3, we describe our development of ChromDiff, a novel statistical and information-theoretic computational methodology to identify chromatin state differences between groups of samples. Our methodology corrects for external covariates to isolate the relevant signal. As a result, we find that it outperforms existing computational methods, and we further validate it through randomized simulations. In chapter 4, we apply our methodology to characteristics including sex, developmental age, and tissue type. We unveil relevant chromatin states and genes that distinguish the groups of epigenomes and further validate our results through differential expression analysis and gene set enrichment. In chapter 5, we show the power of integrative analysis by combining DNA methylation data with chromatin state profiles, cell types, sample groups, experimental technologies, and histone mark data to reveal insightful epigenetic patterns and relationships. Finally, in chapter 6, we identify "hidden" or "unknown" covariates in epigenomic data by applying agnostic principal component analysis to our samples and discovering similarities between our known covariates and the identified components. In summary, our research highlights the importance of both algorithm development and method application for epigenomic questions. It reaffirms the value of interdisciplinary research that brings together cutting-edge techniques in computer science with appropriate biological hypotheses and data.
While questions and analyses must be carefully paired in an informed manner to produce meaningful, interpretable, and believable results in computational biology, our work here provides a sampling of the vast potential for scientific discovery at the intersection of computer science and biology. === by Angela Yen. === Ph. D. |
author2 |
Manolis Kellis. |
author |
Yen, Angela |
title |
Computational epigenomics : gene regulation, comparative methodologies, and epigenetic patterns |
publisher |
Massachusetts Institute of Technology |
publishDate |
2016 |
url |
http://hdl.handle.net/1721.1/105953 |
datestamp |
2019-05-02T15:54:18Z |
contributor |
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science. |
date |
2016-12-22T15:16:16Z |
type |
Thesis |
identifier |
965386185 |
rights |
M.I.T. theses are protected by copyright. They may be viewed from this source for any purpose, but reproduction or distribution in any format is prohibited without written permission. See provided URL for inquiries about permission. http://dspace.mit.edu/handle/1721.1/7582 |
extent |
225 pages (application/pdf) |