Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications
Dataset shift refers to the problem where the input data distribution may change over time (e.g., between training and test stages). Since this can be a critical bottleneck in several safety-critical applications such as healthcare, drug-discovery, etc., dataset shift detection has become an importa...
Main Authors: | Hoseung Song, Jayaraman J. Thiagarajan, Bhavya Kailkhura |
---|---|
Format: | Article |
Language: | English |
Published: | Frontiers Media S.A., 2021-05-01 |
Series: | Frontiers in Artificial Intelligence |
Subjects: | |
Online Access: | https://www.frontiersin.org/articles/10.3389/frai.2021.589632/full |
id | doaj-4f1b055e2cac4543a1f888652ce1dcbe |
---|---|
record_format | Article |
spelling | doaj-4f1b055e2cac4543a1f888652ce1dcbe (2021-06-10T10:55:36Z); eng; Frontiers Media S.A.; Frontiers in Artificial Intelligence, ISSN 2624-8212; 2021-05-01; vol. 4; DOI 10.3389/frai.2021.589632; article 589632; Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications; Hoseung Song (Department of Statistics, University of California, Davis, CA, United States), Jayaraman J. Thiagarajan (Lawrence Livermore National Laboratory, Livermore, CA, United States), Bhavya Kailkhura (Lawrence Livermore National Laboratory, Livermore, CA, United States); https://www.frontiersin.org/articles/10.3389/frai.2021.589632/full; graph learning, dataset shift, safety, two-sample testing, random graph models |
collection | DOAJ |
language | English |
format | Article |
sources | DOAJ |
author | Hoseung Song, Jayaraman J. Thiagarajan, Bhavya Kailkhura |
title | Preventing Failures by Dataset Shift Detection in Safety-Critical Graph Applications |
publisher | Frontiers Media S.A. |
series | Frontiers in Artificial Intelligence |
issn | 2624-8212 |
publishDate | 2021-05-01 |
description | Dataset shift refers to the problem where the input data distribution may change over time (e.g., between the training and test stages). Because this can be a critical bottleneck in safety-critical applications such as healthcare and drug discovery, dataset shift detection has become an important research issue in machine learning. Although several existing efforts have focused on image and video data, applications with graph-structured data have not received sufficient attention. Therefore, in this paper, we investigate the problem of detecting shifts in graph-structured data through the lens of statistical hypothesis testing. Specifically, we propose a practical approach based on two-sample testing for shift detection in large-scale graph-structured data. Our approach is flexible: it is suitable for both undirected and directed graphs and does not require equal sample sizes. Using empirical studies, we demonstrate the effectiveness of the proposed test in detecting dataset shifts. We also corroborate these findings on real-world datasets characterized by directed graphs with a large number of nodes. |
topic | graph learning; dataset shift; safety; two-sample testing; random graph models |
url | https://www.frontiersin.org/articles/10.3389/frai.2021.589632/full |
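The description frames shift detection as a two-sample hypothesis test between a reference sample of graphs and a new sample. The record does not state the paper's test statistic, so the following is only a minimal, generic sketch in Python and not the authors' method: a permutation test whose statistic is the squared Frobenius distance between the two samples' mean adjacency matrices. It assumes every graph is an n × n 0/1 adjacency matrix over the same, consistently labeled node set. Because the matrices need not be symmetric, directed graphs are handled, and because each mean is taken over its own sample, the two samples may differ in size. All function names (`flatten`, `mean_vector`, `shift_statistic`, `permutation_test`, `random_digraph`) are hypothetical.

```python
import random
from typing import List, Sequence

# Hypothetical representation: an n x n 0/1 adjacency matrix. It is not
# required to be symmetric, so directed graphs are allowed.
Graph = List[List[int]]


def flatten(g: Graph) -> List[int]:
    """Row-major flattening of an adjacency matrix."""
    return [v for row in g for v in row]


def mean_vector(flats: Sequence[List[int]]) -> List[float]:
    """Entry-wise mean of flattened adjacency matrices (the sample's mean graph)."""
    m = len(flats)
    return [sum(col) / m for col in zip(*flats)]


def shift_statistic(flat_x: Sequence[List[int]], flat_y: Sequence[List[int]]) -> float:
    """Squared Frobenius distance between the two samples' mean adjacency matrices."""
    mx, my = mean_vector(flat_x), mean_vector(flat_y)
    return sum((a - b) ** 2 for a, b in zip(mx, my))


def permutation_test(x: Sequence[Graph], y: Sequence[Graph],
                     num_permutations: int = 200, seed: int = 0) -> float:
    """Permutation p-value for H0: x and y come from the same graph distribution.

    The samples may have different sizes; graphs are reshuffled between the
    two groups while the group sizes len(x) and len(y) stay fixed.
    """
    flat_x = [flatten(g) for g in x]
    flat_y = [flatten(g) for g in y]
    observed = shift_statistic(flat_x, flat_y)
    pooled = flat_x + flat_y
    m = len(flat_x)
    rng = random.Random(seed)
    exceed = 0
    for _ in range(num_permutations):
        rng.shuffle(pooled)  # random reassignment of graphs to the two groups
        if shift_statistic(pooled[:m], pooled[m:]) >= observed:
            exceed += 1
    # The +1 terms count the observed split, so the p-value is never exactly 0.
    return (exceed + 1) / (num_permutations + 1)


def random_digraph(n: int, p: float, rng: random.Random) -> Graph:
    """Directed Erdős–Rényi graph: each ordered pair (i, j), i != j, is an edge with probability p."""
    return [[int(i != j and rng.random() < p) for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    rng = random.Random(42)
    reference = [random_digraph(15, 0.10, rng) for _ in range(20)]
    shifted = [random_digraph(15, 0.20, rng) for _ in range(30)]  # unequal sample size
    print(permutation_test(reference, shifted))  # a small p-value suggests a shift
```

A small p-value indicates that the two samples are unlikely to share one graph distribution. A permutation test is used here because it calibrates the statistic under the null hypothesis without needing a closed-form null distribution; for graphs with many nodes, the pure-Python loops would need to be replaced with array operations.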