Structured clustering representations and methods

Bibliographic Details
Main Author: Heilbut, Adrian Mark
Language: en_US
Published: 2016
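Subjects: Bioinformatics; Timeseries; Clustering; Gene expression; Structured clustering representations; Huntington's disease; Parkinson's disease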
Online Access: https://hdl.handle.net/2144/17054
id ndltd-bu.edu-oai-open.bu.edu-2144-17054
record_format oai_dc
spelling ndltd-bu.edu-oai-open.bu.edu-2144-17054 2019-05-01T03:11:10Z
2016-07-14T15:40:23Z 2016-07-14T15:40:23Z 2016 2016-06-21T19:35:28Z
Thesis/Dissertation
https://hdl.handle.net/2144/17054
en_US
Attribution-NonCommercial-NoDerivatives 4.0 International
http://creativecommons.org/licenses/by-nc-nd/4.0/
collection NDLTD
language en_US
sources NDLTD
topic Bioinformatics
Timeseries
Clustering
Gene expression
Structured clustering representations
Huntington's disease
Parkinson's disease
description Rather than designing focused experiments to test individual hypotheses, scientists now commonly acquire measurements using massively parallel techniques for post hoc interrogation. The resulting data is both high-dimensional and structured, in that observed variables are grouped and ordered into related subspaces, reflecting both natural physical organization and factorial experimental designs. Such structure encodes critical constraints and clues to interpretation, but typical unsupervised learning methods assume exchangeability and fail to account adequately for the structure of data in a flexible and interpretable way. In this thesis, I develop computational methods for exploratory analysis of structured high-dimensional data, and apply them to study gene expression regulation in Parkinson’s disease (PD) and Huntington’s disease (HD).
BOMBASTIC (Block-Organized, Model-Based, Tree-Indexed Clustering) is a methodology to cluster and visualize data organized in pre-specified subspaces by combining independent clusterings of blocks into hierarchies. BOMBASTIC provides a formal specification of the block-clustering problem and a modular implementation that facilitates integration, visualization, and comparison of diverse datasets and rapid exploration of alternative analyses.
These tools, along with standard methods, were applied to study gene expression in mouse models of neurodegenerative diseases, in collaboration with Dr. Myriam Heiman and Dr. Robert Fenster. In PD, I analyzed cell-type-specific expression following levodopa treatment to study mechanisms underlying levodopa-induced dyskinesia (LID). I identified likely regulators of the transcriptional changes leading to LID and implicated signaling pathways amenable to pharmacological modulation (Heiman, Heilbut et al., 2014).
In HD, I analyzed multiple mouse models (Kuhn, 2007), cell-type-specific profiles of medium spiny neurons (Fenster, 2011), and an RNA-Seq dataset profiling multiple tissue types over time and across an mHTT allelic series (CHDI, 2015). I found evidence suggesting that altered activity of the PRC2 complex significantly contributes to the transcriptional dysregulation observed in striatal neurons in HD.
author Heilbut, Adrian Mark
title Structured clustering representations and methods
publishDate 2016
url https://hdl.handle.net/2144/17054