Statistical solutions for multiple networks
Networks are quickly becoming one of the most common data types across diverse disciplines from the biological to the social sciences. Consequently, the study of networks as data objects is fundamental to developing statistical methodology for answering complex scientific questions. In this disserta...
Main Author: | Josephs, Nathaniel |
---|---|
Other Authors: | Kolaczyk, Eric D. |
Language: | en_US |
Published: | 2021 |
Subjects: | Statistics |
Online Access: | https://hdl.handle.net/2144/43216 |
id |
ndltd-bu.edu-oai-open.bu.edu-2144-43216 |
---|---|
record_format |
oai_dc |
spelling |
ndltd-bu.edu-oai-open.bu.edu-2144-43216 2021-10-28T05:01:25Z Statistical solutions for multiple networks Josephs, Nathaniel Kolaczyk, Eric D. Statistics 2022-10-25T00:00:00Z 2021-10-26T12:55:00Z 2021 2021-10-26T01:02:53Z Thesis/Dissertation https://hdl.handle.net/2144/43216 0000-0001-5658-6934 en_US Attribution 4.0 International http://creativecommons.org/licenses/by/4.0/ |
collection |
NDLTD |
language |
en_US |
sources |
NDLTD |
topic |
Statistics |
spellingShingle |
Statistics Josephs, Nathaniel Statistical solutions for multiple networks |
description |
Networks are quickly becoming one of the most common data types across diverse disciplines from the biological to the social sciences. Consequently, the study of networks as data objects is fundamental to developing statistical methodology for answering complex scientific questions. In this dissertation, we provide statistical solutions to three tasks related to multiple networks.
We first consider the task of prediction given a collection of observed networks. In particular, we provide a Bayesian approach to performing classification, anomaly detection, and survival analysis with network inputs. Our methodology is based on encoding networks as pairwise differences in the kernel of a Gaussian process prior, and we are motivated by the goal of predicting preterm delivery using individual microbiome networks.
We next consider the task of exploring reaction space in high-throughput chemistry, where the inputs to a reaction are two or more molecules. Our goal is to create a workflow that facilitates quick, low-cost, and effective analysis of reactions. In order to operationalize this goal, we develop a statistical approach that breaks the analysis into several steps based on four unique challenges that we identify. Each of these challenges requires careful consideration in creating our analysis plan. For instance, to address the fact that reactions are run on multiwell plates, we formulate our proposal as a constrained optimization problem; then, we leverage the underlying structure by realizing a plate as a bipartite graph, which allows us to reformulate the problem as a maximal edge biclique problem. These solutions are necessary to optimally navigate a large reaction space given limited resources, which is critical in the application of reaction chemistry, for example, to drug discovery.
The final task we consider is the recovery of a network given a sample of noisy unlabeled copies of the network. Toward this end, we make a connection between the noisy network literature and the correlated Erdős–Rényi graph model, which allows us to employ results from graph matching. Research on multiple unlabeled networks has otherwise been underdeveloped but is emerging in areas such as differential privacy and anonymized networks, as well as measurement error in network construction. |
author2 |
Kolaczyk, Eric D. |
author_facet |
Kolaczyk, Eric D. Josephs, Nathaniel |
author |
Josephs, Nathaniel |
author_sort |
Josephs, Nathaniel |
title |
Statistical solutions for multiple networks |
title_short |
Statistical solutions for multiple networks |
title_full |
Statistical solutions for multiple networks |
title_fullStr |
Statistical solutions for multiple networks |
title_full_unstemmed |
Statistical solutions for multiple networks |
title_sort |
statistical solutions for multiple networks |
publishDate |
2021 |
url |
https://hdl.handle.net/2144/43216 |
work_keys_str_mv |
AT josephsnathaniel statisticalsolutionsformultiplenetworks |
_version_ |
1719491496934113280 |
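
The plate-design step in the description, which realizes a plate as a bipartite graph and looks for an edge biclique, can be illustrated with a small sketch. It assumes a hypothetical encoding that is not taken from the dissertation: one reactant type indexes the left side of the graph, the other indexes the right side, and an edge marks a candidate reaction. The function name `max_edge_biclique` and the toy data are illustrative only. The search is exhaustive and suitable only for tiny inputs, since the maximum edge biclique problem is NP-hard in general.

```python
from itertools import combinations


def max_edge_biclique(adj):
    """Return (edge_count, rows, cols) of a biclique with the most edges.

    adj maps each left-side vertex to the set of right-side vertices it is
    adjacent to. Exhaustive search over left-side subsets: exponential time,
    intended for toy inputs only.
    """
    left = sorted(adj)
    best = (0, (), frozenset())
    for k in range(1, len(left) + 1):
        for rows in combinations(left, k):
            # For a fixed row set, the largest compatible column set is the
            # common neighbourhood of those rows.
            cols = frozenset.intersection(*(frozenset(adj[r]) for r in rows))
            edges = len(rows) * len(cols)
            if edges > best[0]:
                best = (edges, rows, cols)
    return best


# Hypothetical candidate reactions (assumed data, for illustration only).
candidates = {
    "A": {1, 2, 3},
    "B": {1, 2, 3},
    "C": {3, 4},
}
edges, rows, cols = max_edge_biclique(candidates)
print(edges, rows, sorted(cols))  # prints: 6 ('A', 'B') [1, 2, 3]
```

Under this assumed encoding, the returned row and column sets describe a fully filled rectangular block of wells, which is why a biclique is a natural object when reactions are laid out on a multiwell plate.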