Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications

Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research issue in machine learning.


Bibliographic Details
Main Authors: Hoseung Song, Jayaraman J. Thiagarajan, Bhavya Kailkhura
Format: Article
Language: English
Published: Frontiers Media S.A. 2021-05-01
Series: Frontiers in Artificial Intelligence
Online Access: https://www.frontiersin.org/articles/10.3389/frai.2021.589632/full
Author Affiliations: Hoseung Song, Department of Statistics, University of California, Davis, CA, United States; Jayaraman J. Thiagarajan, Lawrence Livermore National Laboratory, Livermore, CA, United States; Bhavya Kailkhura, Lawrence Livermore National Laboratory, Livermore, CA, United States
Volume: 4
Article Number: 589632
DOI: 10.3389/frai.2021.589632
Source: DOAJ
ISSN: 2624-8212
Description: Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research issue in machine learning. Though several existing efforts have focused on image/video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph-structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical approach based on two-sample testing for shift detection in large-scale graph-structured data. Our approach is flexible in that it is suitable for both undirected and directed graphs and eliminates the need for equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings using real-world datasets, characterized by directed graphs and a large number of nodes.
Subjects: graph learning, dataset shift, safety, two-sample testing, random graph models
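
Illustrative sketch (not the authors' method): the record describes shift detection as a two-sample test on graph-structured data that works for directed graphs and unequal sample sizes. The general idea can be sketched with a plain permutation test, assuming each graph is summarized by its edge density. That statistic, the function names (edge_density, permutation_test, random_digraph), and the simulated random graphs are all assumptions made for illustration; the test proposed in the article is not reproduced here.

```python
import random


def edge_density(adj):
    """Fraction of possible directed edges present (self-loops ignored)."""
    n = len(adj)
    edges = sum(adj[i][j] for i in range(n) for j in range(n) if i != j)
    return edges / (n * (n - 1))


def permutation_test(sample_a, sample_b, n_perm=2000, seed=0):
    """Permutation p-value for a difference in mean edge density.

    The two samples may have different sizes.
    """
    rng = random.Random(seed)
    stats = [edge_density(g) for g in sample_a + sample_b]
    n_a = len(sample_a)

    def gap(values):
        a, b = values[:n_a], values[n_a:]
        return abs(sum(a) / len(a) - sum(b) / len(b))

    observed = gap(stats)
    hits = 0
    for _ in range(n_perm):
        shuffled = stats[:]          # relabel graphs at random
        rng.shuffle(shuffled)
        if gap(shuffled) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def random_digraph(n, p, rng):
    """Directed random graph as an adjacency matrix with edge probability p."""
    return [[1 if i != j and rng.random() < p else 0 for j in range(n)]
            for i in range(n)]


# Hypothetical data: 30 reference graphs and 12 denser "shifted" graphs.
rng = random.Random(1)
reference = [random_digraph(20, 0.10, rng) for _ in range(30)]
shifted = [random_digraph(20, 0.25, rng) for _ in range(12)]
p_value = permutation_test(reference, shifted)
print(f"p-value: {p_value:.4f}")
```

A small p-value here suggests that the two samples of graphs were not drawn from the same distribution. A single summary statistic such as edge density only detects shifts that change that statistic, which is one reason purpose-built graph two-sample tests exist.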