Statistical solutions for multiple networks

Bibliographic Details
Main Author: Josephs, Nathaniel
Other Authors: Kolaczyk, Eric D.
Language: en_US
Published: 2021
Subjects: Statistics
Online Access: https://hdl.handle.net/2144/43216
Record ID: ndltd-bu.edu-oai-open.bu.edu-2144-43216
Record format: oai_dc
Type: Thesis/Dissertation
ORCID: 0000-0001-5658-6934
Rights: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Record dates: 2021-10-26T01:02:53Z; 2021-10-26T12:55:00Z; 2021-10-28T05:01:25Z; 2022-10-25T00:00:00Z
Collection: NDLTD

Description

Networks are quickly becoming one of the most common data types across diverse disciplines, from the biological to the social sciences. Consequently, the study of networks as data objects is fundamental to developing statistical methodology for answering complex scientific questions. In this dissertation, we provide statistical solutions to three tasks related to multiple networks.

We first consider the task of prediction given a collection of observed networks. In particular, we provide a Bayesian approach to performing classification, anomaly detection, and survival analysis with network inputs. Our methodology is based on encoding networks as pairwise differences in the kernel of a Gaussian process prior, and we are motivated by the goal of predicting preterm delivery using individual microbiome networks.

We next consider the task of exploring reaction space in high-throughput chemistry, where the inputs to a reaction are two or more molecules. Our goal is to create a workflow that facilitates quick, low-cost, and effective analysis of reactions. To operationalize this goal, we develop a statistical approach that breaks the analysis into several steps based on four unique challenges that we identify, each of which requires careful consideration in creating our analysis plan. For instance, to address the fact that reactions are run on multiwell plates, we formulate our proposal as a constrained optimization problem; we then leverage the underlying structure by realizing a plate as a bipartite graph, which allows us to reformulate the problem as a maximum edge biclique problem. These solutions are necessary to optimally navigate a large reaction space given limited resources, which is critical in applications of reaction chemistry such as drug discovery.

The final task we consider is the recovery of a network given a sample of noisy unlabeled copies of the network. Toward this end, we make a connection between the noisy network literature and the correlated Erdős–Rényi graph model, which allows us to employ results from graph matching. Research on multiple unlabeled networks has otherwise been underdeveloped but is emerging in areas such as differential privacy and anonymized networks, as well as measurement error in network construction.
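
The first task encodes networks through pairwise differences inside a Gaussian process kernel. This record does not give the exact kernel, so what follows is only an illustrative sketch under assumptions of our own: the difference between two networks on a shared node set is taken to be the Frobenius norm of the difference of their adjacency matrices, that distance is plugged into a squared-exponential kernel, and ±1 class labels are handled with plain GP regression rather than a proper classification likelihood. The helper names (`random_graph`, `network_kernel`, `gp_posterior_mean`) and all parameter values are hypothetical.

```python
import numpy as np

def random_graph(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi graph G(n, p)."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)

def network_kernel(graphs_a, graphs_b, length_scale=3.0, variance=1.0):
    """Squared-exponential kernel on pairwise network differences.

    Assumption: the difference between two networks on the same node set is
    the Frobenius norm of the difference of their adjacency matrices.
    """
    dist = np.array([[np.linalg.norm(a - b, ord="fro") for b in graphs_b]
                     for a in graphs_a])
    return variance * np.exp(-0.5 * (dist / length_scale) ** 2)

def gp_posterior_mean(train_graphs, labels, test_graphs, noise=0.1):
    """GP regression posterior mean K_* (K + noise * I)^{-1} y.

    Simplification: +/-1 labels are regressed directly instead of using a
    classification likelihood.
    """
    k_train = network_kernel(train_graphs, train_graphs)
    k_cross = network_kernel(test_graphs, train_graphs)
    weights = np.linalg.solve(k_train + noise * np.eye(len(train_graphs)), labels)
    return k_cross @ weights

# Toy data: sparse networks labeled -1, dense networks labeled +1.
rng = np.random.default_rng(0)
train = [random_graph(10, 0.05, rng) for _ in range(5)]
train += [random_graph(10, 0.9, rng) for _ in range(5)]
labels = np.array([-1.0] * 5 + [1.0] * 5)
tests = [random_graph(10, 0.05, rng), random_graph(10, 0.9, rng)]
scores = gp_posterior_mean(train, labels, tests)  # sign gives the predicted class
```

Because this kernel is a squared-exponential function of a Euclidean distance between flattened adjacency matrices, it is guaranteed to be positive semi-definite; many other graph distances lack that property, which is worth checking before swapping one in.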
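
The plate-layout step is reduced to a maximum edge biclique problem on a bipartite graph. How the dissertation builds that graph is not stated in this record; purely as an assumption for illustration, take one vertex set to be candidate reactants for plate rows, the other to be candidate reactants for plate columns, and an edge to mark a pairing worth running, so that a biclique is a rectangular block of wells in which every row-column pairing is wanted. The problem is NP-hard in general, so this exhaustive search (with a hypothetical `max_edge_biclique` helper and made-up reactant labels) only suits tiny instances and is not the dissertation's solver.

```python
from itertools import combinations

def max_edge_biclique(left, right, edges):
    """Exhaustive maximum edge biclique search for a small bipartite graph.

    For a fixed subset of left vertices, the largest compatible right-hand set
    is their common neighborhood, so enumerating left subsets is enough.
    Returns (number of edges, left vertices, right vertices).
    """
    neighbors = {u: {v for (a, v) in edges if a == u and v in right} for u in left}
    best = (0, (), ())
    for k in range(1, len(left) + 1):
        for rows in combinations(left, k):
            cols = set.intersection(*(neighbors[u] for u in rows))
            size = len(rows) * len(cols)
            if size > best[0]:
                best = (size, rows, tuple(sorted(cols)))
    return best

# Hypothetical plate: row reactants A1..A3, column reactants B1..B3.
row_reactants = ["A1", "A2", "A3"]
col_reactants = ["B1", "B2", "B3"]
wanted = {("A1", "B1"), ("A1", "B2"), ("A1", "B3"),
          ("A2", "B1"), ("A2", "B2"), ("A3", "B2"), ("A3", "B3")}
best = max_edge_biclique(row_reactants, col_reactants, wanted)
# best == (4, ("A1", "A2"), ("B1", "B2")): a 2 x 2 block of wells
```

Enumerating only one side keeps the search at 2^|left| subsets; putting the smaller reactant pool on the left keeps that as cheap as possible.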
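
For the last task, one common construction of correlated Erdős–Rényi copies starts from a parent graph, flips each vertex pair independently with a small probability, and hides the vertex labels behind a random permutation. The sketch below assumes that construction, aligns every copy to the first one by brute-force graph matching (feasible only for a handful of vertices), and takes an edgewise majority vote. The function names and parameters are hypothetical, and the recovered network is identified only up to relabeling, here in the first copy's labels.

```python
import itertools
import numpy as np

def correlated_copy(parent, flip_prob, rng):
    """Noisy unlabeled copy: flip each vertex pair independently, then shuffle labels."""
    n = parent.shape[0]
    flips = np.triu(rng.random((n, n)) < flip_prob, k=1)
    flips = flips | flips.T
    noisy = np.where(flips, 1 - parent, parent)
    perm = rng.permutation(n)
    return noisy[np.ix_(perm, perm)]

def match(reference, other):
    """Brute-force graph matching: relabel `other` to agree with `reference`
    on as many adjacency entries as possible (only feasible for small n)."""
    n = reference.shape[0]
    best, best_score = None, -1
    for perm in itertools.permutations(range(n)):
        candidate = other[np.ix_(perm, perm)]
        score = int((candidate == reference).sum())
        if score > best_score:
            best, best_score = candidate, score
    return best

def recover_network(copies):
    """Align every copy to the first one, then take an edgewise majority vote."""
    reference = copies[0]
    aligned = [reference] + [match(reference, c) for c in copies[1:]]
    return (np.mean(aligned, axis=0) > 0.5).astype(int)

rng = np.random.default_rng(1)
n = 6
upper = np.triu(rng.random((n, n)) < 0.4, k=1)
parent = (upper | upper.T).astype(int)
copies = [correlated_copy(parent, 0.05, rng) for _ in range(5)]
estimate = recover_network(copies)  # labeled like copies[0], not like parent
```

An odd number of copies keeps the majority vote from tying. Exhaustive matching costs n! relabelings per copy, so anything beyond toy sizes would need a scalable matcher; which one the dissertation relies on is not stated in this record.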