Bayesian Modeling of Complex High-Dimensional Data

With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional complex data in different forms, such as medical images, genomics measurements. However, acquisition of more data does not automatically lead to better knowledge discovery. One needs effici...

Full description

Bibliographic Details
Main Author: Huo, Shuning
Other Authors: Statistics
Format: Others
Published: Virginia Tech 2020
Subjects:
Online Access:http://hdl.handle.net/10919/101037
id ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-101037
record_format oai_dc
spelling ndltd-VTETD-oai-vtechworks.lib.vt.edu-10919-1010372020-12-10T05:33:41Z Bayesian Modeling of Complex High-Dimensional Data Huo, Shuning Statistics Zhu, Hongxiao Deng, Xinwei Kim, Inyoung Gramacy, Robert B. Variational Inference Bayesian Variable Selection Functional Mixed Model Parallel Computing Bayesian Hierarchical Clustering Dirichlet Diffusion Tree With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional complex data in different forms, such as medical images, genomics measurements. However, acquisition of more data does not automatically lead to better knowledge discovery. One needs efficient and reliable analytical tools to extract useful information from complex datasets. The main objective of this dissertation is to develop innovative Bayesian methodologies to enable effective and efficient knowledge discovery from complex high-dimensional data. It contains two parts—the development of computationally efficient functional mixed models and the modeling of data heterogeneity via Dirichlet Diffusion Tree. The first part focuses on tackling the computational bottleneck in Bayesian functional mixed models. We propose a computational framework called variational functional mixed model (VFMM). This new method facilitates efficient data compression and high-performance computing in basis space. We also propose a new multiple testing procedure in basis space, which can be used to detect significant local regions. The effectiveness of the proposed model is demonstrated through two datasets, a mass spectrometry dataset in a cancer study and a neuroimaging dataset in an Alzheimer's disease study. The second part is about modeling data heterogeneity by using Dirichlet Diffusion Trees. We propose a Bayesian latent tree model that incorporates covariates of subjects to characterize the heterogeneity and uncover the latent tree structure underlying data. This innovative model may reveal the hierarchical evolution process through branch structures and estimate systematic differences between groups of samples. We demonstrate the effectiveness of the model through the simulation study and a brain tumor real data. Doctor of Philosophy With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquisition of such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts—the development of an ultra-fast functional mixed model and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part focuses on developing approximate Bayesian methods in functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of proposed method—a mass spectrometry dataset in a cancer study and a neuroimaging dataset in an Alzheimer's disease study. The second part focuses on modeling data heterogeneity via Dirichlet Diffusion Trees. The method helps uncover the underlying hierarchical tree structures and estimate systematic differences between the group of samples. We demonstrate the effectiveness of the method through the brain tumor imaging data. 2020-12-08T09:00:20Z 2020-12-08T09:00:20Z 2020-12-07 Dissertation vt_gsexam:28197 http://hdl.handle.net/10919/101037 In Copyright http://rightsstatements.org/vocab/InC/1.0/ ETD application/pdf Virginia Tech
collection NDLTD
format Others
sources NDLTD
topic Variational Inference
Bayesian Variable Selection
Functional Mixed Model
Parallel Computing
Bayesian Hierarchical Clustering
Dirichlet Diffusion Tree
spellingShingle Variational Inference
Bayesian Variable Selection
Functional Mixed Model
Parallel Computing
Bayesian Hierarchical Clustering
Dirichlet Diffusion Tree
Huo, Shuning
Bayesian Modeling of Complex High-Dimensional Data
description With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional complex data in different forms, such as medical images, genomics measurements. However, acquisition of more data does not automatically lead to better knowledge discovery. One needs efficient and reliable analytical tools to extract useful information from complex datasets. The main objective of this dissertation is to develop innovative Bayesian methodologies to enable effective and efficient knowledge discovery from complex high-dimensional data. It contains two parts—the development of computationally efficient functional mixed models and the modeling of data heterogeneity via Dirichlet Diffusion Tree. The first part focuses on tackling the computational bottleneck in Bayesian functional mixed models. We propose a computational framework called variational functional mixed model (VFMM). This new method facilitates efficient data compression and high-performance computing in basis space. We also propose a new multiple testing procedure in basis space, which can be used to detect significant local regions. The effectiveness of the proposed model is demonstrated through two datasets, a mass spectrometry dataset in a cancer study and a neuroimaging dataset in an Alzheimer's disease study. The second part is about modeling data heterogeneity by using Dirichlet Diffusion Trees. We propose a Bayesian latent tree model that incorporates covariates of subjects to characterize the heterogeneity and uncover the latent tree structure underlying data. This innovative model may reveal the hierarchical evolution process through branch structures and estimate systematic differences between groups of samples. We demonstrate the effectiveness of the model through the simulation study and a brain tumor real data. === Doctor of Philosophy === With the rapid development of modern high-throughput technologies, scientists can now collect high-dimensional data in different forms, such as engineering signals, medical images, and genomics measurements. However, acquisition of such data does not automatically lead to efficient knowledge discovery. The main objective of this dissertation is to develop novel Bayesian methods to extract useful knowledge from complex high-dimensional data. It has two parts—the development of an ultra-fast functional mixed model and the modeling of data heterogeneity via Dirichlet Diffusion Trees. The first part focuses on developing approximate Bayesian methods in functional mixed models to estimate parameters and detect significant regions. Two datasets demonstrate the effectiveness of proposed method—a mass spectrometry dataset in a cancer study and a neuroimaging dataset in an Alzheimer's disease study. The second part focuses on modeling data heterogeneity via Dirichlet Diffusion Trees. The method helps uncover the underlying hierarchical tree structures and estimate systematic differences between the group of samples. We demonstrate the effectiveness of the method through the brain tumor imaging data.
author2 Statistics
author_facet Statistics
Huo, Shuning
author Huo, Shuning
author_sort Huo, Shuning
title Bayesian Modeling of Complex High-Dimensional Data
title_short Bayesian Modeling of Complex High-Dimensional Data
title_full Bayesian Modeling of Complex High-Dimensional Data
title_fullStr Bayesian Modeling of Complex High-Dimensional Data
title_full_unstemmed Bayesian Modeling of Complex High-Dimensional Data
title_sort bayesian modeling of complex high-dimensional data
publisher Virginia Tech
publishDate 2020
url http://hdl.handle.net/10919/101037
work_keys_str_mv AT huoshuning bayesianmodelingofcomplexhighdimensionaldata
_version_ 1719370077598384128