De novo Population Discovery from Complex Biological Datasets
Main Author: | Venkatasubramanian, Meenakshi |
---|---|
Language: | English |
Published: | University of Cincinnati / OhioLINK, 2019 |
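Subjects: | Computer Science; Clustering; Alternative Splicing; Bioinformatics; Non-Negative Matrix Factorization; Data Mining; Community Detection |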
Online Access: | http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047 |
id |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1563873297599047 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1563873297599047 2021-08-16T05:10:40Z De novo Population Discovery from Complex Biological Datasets Venkatasubramanian, Meenakshi Computer Science Clustering Alternative Splicing Bioinformatics Non-Negative Matrix Factorization Data Mining Community Detection Over the past decade, numerous clustering approaches have been developed and applied to gene expression studies for the unsupervised detection of sub-populations that inform disease prognosis, treatment and mechanism. For example, in diverse cancers, the identification of novel patient subtypes from gene expression can highlight new therapeutic pathways and cooperating mutations. In addition to the measurement of transcriptional activity from genes, modern high-throughput sequencing technologies enable the sensitive detection of higher-resolution features including alternative splicing, RNA-editing and chromatin modifications. The detection of such features presents a number of computational challenges, due in large part to the sparse nature of the data, its high dimensionality (hundreds of thousands of features) and the presence of both broad and exceedingly rare molecular/genetic subtypes that overlap. In this dissertation, I describe the development of a series of novel methodologies that address these computational challenges and aim to uncover the hidden heterogeneity within complex molecular datasets. The first of these algorithms, splice-ICGS, provides an automated and accurate solution for the detection of complex, overlapping splicing-defined subtypes from large bulk RNA-sequencing datasets. This solution required the introduction of several key innovations, including new methods for sparse matrix filtering, correlation-based feature prioritization, iterative sparse-NMF analysis and a new strategy for multi-label classification. I demonstrate the improved performance of this approach in multiple clinical cancer datasets, with an emphasis on leukemia.
To improve our understanding of the causal nature of such known and novel splicing subtypes, I have further developed several downstream analysis tools (Bridger, RBP-Finder) that predict causal regulators from splicing subtypes in an automated manner. These unsupervised approaches were further adapted to solve a distinct problem in the field of single-cell RNA-sequencing analysis: improved unsupervised detection of common and rare cell populations from ultra-large studies of hundreds of thousands of cells. With these new algorithms in hand, the genomics research community will be presented with novel opportunities for therapeutic target identification, patient classification from splicing data and the delineation of novel cell populations in healthy tissues and disease. 2019-10-01 English text University of Cincinnati / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047 unrestricted This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center. |
collection |
NDLTD |
language |
English |
sources |
NDLTD |
topic |
Computer Science Clustering Alternative Splicing Bioinformatics Non-Negative Matrix Factorization Data Mining Community Detection |
spellingShingle |
Computer Science Clustering Alternative Splicing Bioinformatics Non-Negative Matrix Factorization Data Mining Community Detection Venkatasubramanian, Meenakshi De novo Population Discovery from Complex Biological Datasets |
author |
Venkatasubramanian, Meenakshi |
author_facet |
Venkatasubramanian, Meenakshi |
author_sort |
Venkatasubramanian, Meenakshi |
title |
De novo Population Discovery from Complex Biological Datasets |
title_short |
De novo Population Discovery from Complex Biological Datasets |
title_full |
De novo Population Discovery from Complex Biological Datasets |
title_fullStr |
De novo Population Discovery from Complex Biological Datasets |
title_full_unstemmed |
De novo Population Discovery from Complex Biological Datasets |
title_sort |
de novo population discovery from complex biological datasets |
publisher |
University of Cincinnati / OhioLINK |
publishDate |
2019 |
url |
http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047 |
work_keys_str_mv |
AT venkatasubramanianmeenakshi denovopopulationdiscoveryfromcomplexbiologicaldatasets |
_version_ |
1719460076373147648 |