id ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1563873297599047
record_format oai_dc
spelling ndltd-OhioLink-oai-etd.ohiolink.edu-ucin1563873297599047 2021-08-16T05:10:40Z De novo Population Discovery from Complex Biological Datasets Venkatasubramanian, Meenakshi Computer Science Clustering Alternative Splicing Bioinformatics Non-Negative Matrix Factorization Data Mining Community Detection Over the past decade, numerous clustering approaches have been developed and applied to gene expression studies for the unsupervised detection of sub-populations that inform disease prognosis, treatment and mechanism. For example, in diverse cancers, the identification of novel patient subtypes from gene expression can highlight novel therapeutic pathways and cooperating mutations. In addition to the measurement of transcriptional activity from genes, modern high-throughput sequencing technologies enable the sensitive detection of higher-resolution features including alternative splicing, RNA-editing and chromatin modifications. The detection of such features presents a number of computational challenges, due in large part to the sparse nature of the data, its high dimensionality (hundreds of thousands of features) and the presence of both broad and exceedingly rare molecular/genetic subtypes that overlap. In this dissertation, I describe the development of a series of novel methodologies that address these computational challenges and aim to uncover the hidden heterogeneity within complex molecular datasets. The first of these algorithms, splice-ICGS, provides an automated and accurate solution for the detection of complex overlapping splicing-defined subtypes from large bulk RNA-sequencing datasets. Our solution required the introduction of several key innovations, including new methods for sparse matrix filtering, correlation-based feature prioritization, iterative sparse-NMF analysis and a new strategy for multi-label classification. I demonstrate the improved performance of this approach in multiple clinical cancer datasets with an emphasis on leukemia.
To improve our understanding of the causal nature of such known and novel splicing subtypes, I have further developed several downstream analysis tools (Bridger, RBP-Finder) that can predict causal regulators from splicing subtypes in an automated manner. These unsupervised approaches were further adapted to solve a distinct problem in the field of single-cell RNA-sequencing analysis: the improved unsupervised detection of common and rare cell populations from ultra-large studies of hundreds of thousands of cells. With these new algorithms in hand, the genomics research community will be presented with novel opportunities for therapeutic target identification, patient classification from splicing data and the delineation of novel cell populations in healthy tissues and disease. 2019-10-01 English text University of Cincinnati / OhioLINK http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047 unrestricted This thesis or dissertation is protected by copyright: some rights reserved. It is licensed for use under a Creative Commons license. Specific terms and permissions are available from this document's record in the OhioLINK ETD Center.
collection NDLTD
language English
sources NDLTD
topic Computer Science
Clustering
Alternative Splicing
Bioinformatics
Non-Negative Matrix Factorization
Data Mining
Community Detection
spellingShingle Computer Science
Clustering
Alternative Splicing
Bioinformatics
Non-Negative Matrix Factorization
Data Mining
Community Detection
Venkatasubramanian, Meenakshi
De novo Population Discovery from Complex Biological Datasets
author Venkatasubramanian, Meenakshi
author_facet Venkatasubramanian, Meenakshi
author_sort Venkatasubramanian, Meenakshi
title De novo Population Discovery from Complex Biological Datasets
title_short De novo Population Discovery from Complex Biological Datasets
title_full De novo Population Discovery from Complex Biological Datasets
title_fullStr De novo Population Discovery from Complex Biological Datasets
title_full_unstemmed De novo Population Discovery from Complex Biological Datasets
title_sort de novo population discovery from complex biological datasets
publisher University of Cincinnati / OhioLINK
publishDate 2019
url http://rave.ohiolink.edu/etdc/view?acc_num=ucin1563873297599047
work_keys_str_mv AT venkatasubramanianmeenakshi denovopopulationdiscoveryfromcomplexbiologicaldatasets
_version_ 1719460076373147648